{"/PATCHSET_LEVEL":[{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"86b666002ea9c6a5a4cc1abf0be68dfe0a9e928a","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":7,"id":"f8a976ef_f0c45726","updated":"2021-10-13 00:26:02.000000000","message":"I\u0027ve got a diff locally to add a\n\n _handle_last_primary()\n\nhelper that does the HEAD and maybe raises SsyncAbortDataFile, but tests need some working over before I\u0027m ready to submit.\n\nDo any tests explicitly exercise using a last_primary value?","commit_id":"82ce65538ade3a466d886ce1c79914a522d0b17e"}],"swift/obj/reconstructor.py":[{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"5dec81069d2eec5ee4a93829fd97fa19d56c47db","unresolved":true,"context_lines":[{"line_number":574,"context_line":"            # we want to wait for all initial responses to come back as"},{"line_number":575,"context_line":"            # if the missing frag is on the old primary we can save rebuilding"},{"line_number":576,"context_line":"            # anything"},{"line_number":577,"context_line":"            wait_for_all \u003d True"},{"line_number":578,"context_line":"        # primary_node_count is the maximum number of nodes to consume in a"},{"line_number":579,"context_line":"        # normal rebuild attempt when there is no quarantine candidate,"},{"line_number":580,"context_line":"        # including the node to which we are rebuilding"}],"source_content_type":"text/x-python","patch_set":3,"id":"5da6b1f4_d41c7ab1","line":577,"updated":"2021-06-04 23:51:11.000000000","message":"I wonder if it\u0027d be better for us to just issue a HEAD to the last_part_node before doing anything. As is, we\u0027re still issuing a GET despite not being interested in the body, yeah?\n\nI think it would avoid both the need to switch between wait-for-all/wait-for-quorum and the bucketing changes -- we can just go ahead and wait for the HEAD; on 2xx with a newer or equal timestamp we bail, anything else (including timeouts and socket errors) we try to rebuild.\n\nAnd *then* it starts to feel like a pattern we could repeat for the replicator, at least when using ssync ;-)","commit_id":"e3fb14a33f2d59b8549ab6f42065e3436b22b1d2"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"e913ca98cccf4b6eb7f3591c184ec3422d3fe767","unresolved":true,"context_lines":[{"line_number":574,"context_line":"            # we want to wait for all initial responses to come back as"},{"line_number":575,"context_line":"            # if the missing frag is on the old primary we can save rebuilding"},{"line_number":576,"context_line":"            # anything"},{"line_number":577,"context_line":"            wait_for_all \u003d True"},{"line_number":578,"context_line":"        # primary_node_count is the maximum number of nodes to consume in a"},{"line_number":579,"context_line":"        # normal rebuild attempt when there is no quarantine candidate,"},{"line_number":580,"context_line":"        # including the node to which we are rebuilding"}],"source_content_type":"text/x-python","patch_set":3,"id":"a3d61b74_b8aa60fc","line":577,"in_reply_to":"5da6b1f4_d41c7ab1","updated":"2021-06-07 00:33:28.000000000","message":"What a great idea!! 
[Line 468] Tim Burke, patch set 7, 2021-10-13 (unresolved)

  468 |         etag = resp.headers.get('X-Object-Sysmeta-Ec-Etag')
  469 |         if not etag:
  470 |             self.logger.warning(
  471 |                 'Invalid resp from %s, frag index %s (missing Etag)',

Should we also be checking etags?

[Line 493, on "duplicated EC frags"] Tim Burke, patch set 7, 2021-10-13 (unresolved)

  492 |         if resp_frag_index == fi_to_rebuild:
  493 |             # With duplicated EC frags it's not unreasonable to find the
  494 |             # very fragment we're trying to rebuild exists on another primary
  495 |             # node or if we have access to the last primary node and it's
  496 |             # still there. In these cases mark it as found so we can deal

I'm a little worried about these -- we may find the frag index (0 through
ndata+nparity-1) we want to rebuild, but not the actual *node* index... and
I don't think that node index X will do the fast thing when it wants to
sync with node index (X mod (ndata + nparity)). Indeed, now it may not end
up syncing at all.
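To put numbers on the worry above: with frag duplication there are
duplication_factor * (ndata + nparity) node indexes but only
ndata + nparity unique frag indexes, so the same frag index lives on
several distinct nodes. A toy illustration (the 4+2, x2 parameters are
made up):

    # a 4+2 EC scheme duplicated twice: 6 unique frag indexes spread
    # across 12 node indexes
    ndata, nparity, duplication_factor = 4, 2, 2
    unique_frags = ndata + nparity

    for node_index in range(unique_frags * duplication_factor):
        print('node index %2d holds frag index %d'
              % (node_index, node_index % unique_frags))

Node indexes 1 and 7 both hold frag index 1, so matching on frag index
alone can't tell us whether the node that will later want to sync with us
is the one that actually has the data.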
[Line 583, on "last_part_node"] Tim Burke, patch set 7, 2021-10-13 (unresolved)

  580 |             resp = self._get_response(last_part_node, policy, partition, path,
  581 |                                       headers, 'HEAD')
  582 |             bucket = self._handle_fragment_response(
  583 |                 last_part_node, policy, partition, fi_to_rebuild, path,
  584 |                 buckets, error_responses, resp)

(If we keep this) I'm pretty sure we want `node` here, not
`last_part_node`. Otherwise, I see logging like

    Found existing frag #1 at 127.0.0.1:6010/sdb5/9/AUTH_test/c/o policy#2 while rebuilding to 127.0.0.1:6010/sdb5/9/AUTH_test/c/o policy#2

which seems like some nonsense.

[Line 586] Tim Burke, patch set 7, 2021-10-13 (unresolved)

  585 |             if bucket and bucket.found_missing and \
  586 |                     bucket.is_useful(policy, local_timestamp):
  587 |                 return bucket
  588 |             # nothing useful, let's clear the buckets up again
  589 |             buckets.clear()

This feels weird -- bucket.is_useful() seems awkward for this, especially
since we've only got the one response (so we don't expect the len() check
to ever work out) and the only other way to get a True out *also* checks
bucket.found_missing.

Really, I'm not sure that response buckets are a good way to think about
this at all -- we're firing off a single request and looking for one of
two things:

* the last primary is up and has the frag -- we should pass over this data
  file to give the last primary a chance to sync
* last primary doesn't have the frag, or is unavailable -- proceed with
  reconstruction

I think it'd be better to be more explicit about that here.
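A sketch of what "more explicit" might look like: branch directly on the
single HEAD response instead of round-tripping it through
`_handle_fragment_response()` and the buckets. This reuses the
hypothetical `should_leave_to_last_primary()` predicate sketched earlier
and elides the surrounding method body:

    resp = self._get_response(last_part_node, policy, partition, path,
                              headers, 'HEAD')
    if should_leave_to_last_primary(resp, local_timestamp):
        # outcome 1: the last primary is up and has the frag -- pass over
        # this data file and give it the chance to sync
        ...
    else:
        # outcome 2: no frag there, or the node is unavailable -- carry
        # on with normal reconstruction
        ...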
[Line 589] Tim Burke, patch set 7, 2021-10-13 (unresolved)

  588 |             # nothing useful, let's clear the buckets up again
  589 |             buckets.clear()

This definitely seems like a red flag indicating that buckets and
_handle_fragment_response aren't the right way to think about this.

[Line 695] Tim Burke, patch set 7, 2021-10-13 (unresolved)

  692 |         if useful_bucket:
  693 |             if useful_bucket.found_missing:
  694 |                 self.logger.increment('found.last_node')
  695 |                 raise SsyncAbortDataFile('Letting handoff deal with it')

Seems weird that we pull a bucket-of-one out of _make_fragment_requests
just so we can raise here -- why not increment the logger and raise closer
to where we detect the last primary having data, like up around L587?
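Folding the two suggestions together, the `_handle_last_primary()` helper
mentioned in the patchset-level comment might look roughly like this. A
sketch only: `SsyncAbortDataFile` and the 'found.last_node' stat come from
the patch, while the predicate and the helper's exact signature are
assumptions carried over from the earlier sketches:

    def _handle_last_primary(self, last_part_node, policy, partition,
                             path, headers, local_timestamp):
        # one HEAD up front; increment and raise right where we detect
        # the last primary having the data, instead of pulling a
        # bucket-of-one back out of _make_fragment_requests()
        resp = self._get_response(last_part_node, policy, partition, path,
                                  headers, 'HEAD')
        if should_leave_to_last_primary(resp, local_timestamp):
            self.logger.increment('found.last_node')
            raise SsyncAbortDataFile('Letting handoff deal with it')
        # otherwise fall through: caller proceeds with a normal rebuild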
