)]}'
{"/COMMIT_MSG":[{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"19f913b3e918cae4738da50909d04b7d3f30e0f0","unresolved":true,"context_lines":[{"line_number":9,"context_line":"Previously, queue-clean-up would use the same timestamp/offset as the"},{"line_number":10,"context_line":"tombstone written down in the old policy to clean up now-moved data."},{"line_number":11,"context_line":"This could lead to bad behaviors, as the reconciler-written tombstone"},{"line_number":12,"context_line":"could itself be enqueued to be reconciled, at the same timestamp. If"},{"line_number":13,"context_line":"that happened, replication would never bring the DB replicas to"},{"line_number":14,"context_line":"consistency, causing the reconciler to get different answers for"},{"line_number":15,"context_line":"whether there was work to do."}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"1a1a35bd_35078f7a","line":12,"range":{"start_line":12,"start_character":0,"end_line":12,"end_character":64},"updated":"2024-11-06 14:33:20.000000000","message":"IIUC this would *not* be enqueued i.e. the attempt to merge into the queue container would fail, because existing (the pop-queue tombstone) trumps new (the re-queued tombstone-to-be-moved) in the ``ContainerBroker.merge_items``.\n\nOr maybe there is opportunity for a race* between the pop-queue ts and the re-queued ts and each replica can end up in a different state that is never resolved consistently? \n\n(*If the replicator visits in the period between the reconciler writing the old-policy-index tombstone and then writing the pop-queue tombstone.)\n\nEither way, not good.","commit_id":"0505bbe407a394127c216696eb1cf999c050363b"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"40e6a964afcf6885958156d35ecd99e372499cfa","unresolved":true,"context_lines":[{"line_number":9,"context_line":"Previously, queue-clean-up would use the same timestamp/offset as the"},{"line_number":10,"context_line":"tombstone written down in the old policy to clean up now-moved data."},{"line_number":11,"context_line":"This could lead to bad behaviors, as the reconciler-written tombstone"},{"line_number":12,"context_line":"could itself be enqueued to be reconciled, at the same timestamp. If"},{"line_number":13,"context_line":"that happened, replication would never bring the DB replicas to"},{"line_number":14,"context_line":"consistency, causing the reconciler to get different answers for"},{"line_number":15,"context_line":"whether there was work to do."}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"80bded2b_480a0dd6","line":12,"range":{"start_line":12,"start_character":0,"end_line":12,"end_character":64},"in_reply_to":"1a1a35bd_35078f7a","updated":"2024-11-06 18:14:49.000000000","message":"\u003e Or maybe there is opportunity for a race* between the pop-queue ts and the re-queued ts and each replica can end up in a different state that is never resolved consistently?\n\nThat somehow seems to be the case. Maybe there\u0027s something to do with an update sitting in the `.pending` file? Regardless, it\u0027s not great to have multiple (different) things trying to be written to the same place (whether that\u0027s an object server or a container db) with the same timestamp.","commit_id":"0505bbe407a394127c216696eb1cf999c050363b"},{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"19f913b3e918cae4738da50909d04b7d3f30e0f0","unresolved":true,"context_lines":[{"line_number":21,"context_line":""},{"line_number":22,"context_line":"- data/ts in old policy at t0 (as well as queue entry time to move it)"},{"line_number":23,"context_line":"- tombstone in old policy at t0_1"},{"line_number":24,"context_line":"- queue clean-up time at t0_2"},{"line_number":25,"context_line":"- moved data/ts in new policy at t0_3"},{"line_number":26,"context_line":""},{"line_number":27,"context_line":"Closes-Bug: #2028175"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"092fa053_fc02a6a5","line":24,"updated":"2024-11-06 14:33:20.000000000","message":"ok, so queue cleanup now trumps any attempt to re-queue the tombstone in the old policy that the reconciler just wrote, which doesn\u0027t need to be reconciled.","commit_id":"0505bbe407a394127c216696eb1cf999c050363b"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"40e6a964afcf6885958156d35ecd99e372499cfa","unresolved":true,"context_lines":[{"line_number":21,"context_line":""},{"line_number":22,"context_line":"- data/ts in old policy at t0 (as well as queue entry time to move it)"},{"line_number":23,"context_line":"- tombstone in old policy at t0_1"},{"line_number":24,"context_line":"- queue clean-up time at t0_2"},{"line_number":25,"context_line":"- moved data/ts in new policy at t0_3"},{"line_number":26,"context_line":""},{"line_number":27,"context_line":"Closes-Bug: #2028175"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"56c01993_0bcb6542","line":24,"in_reply_to":"092fa053_fc02a6a5","updated":"2024-11-06 18:14:49.000000000","message":"FWIW I\u0027ve got this nagging feeling that a double-movement can still cause issues:\n\n- there\u0027s data/ts in policy A at t0\n- we find that it should get moved to policy B, and make a queue entry for A@t0 to move it\n- reconciler acts on queue entry:\n  - tombstone in A@t0_1\n  - queue clean up for policy A@t0_2\n  - moved data/ts in policy B@t0_3\n- but not everything gets all updates!\n  - at least one container server gets record for ts in A@t0_1\n  - some queue DBs miss the clean-up A@t0_2\n  - *no* container server gets record for data/ts in B@t0_3 (yet)\n- we discover that **actually** everything should be in policy C\n- tombstone in A@t0_1 gets a queue entry (for policy A, also at t0_1) to move it\n- reconciler acts on queue entry\n  - tombstone in A@t0_2\n  - queue clean up for A@t0_3\n  - moved data/ts in C@t0_4\n- container servers finally find out about data/ts in B@t0_3\n- queue entry gets made for B@t0_3 to move it\n- reconciler acts on queue entry\n  - sees that t0_3 \u003c\u003d t0_4 so *doesn\u0027t move **anything***\n  - throw tombstone in B@t0_4\n  - clean queue entry for policy B@t0_5\n\nbut resolving *that* is out of scope for this change.","commit_id":"0505bbe407a394127c216696eb1cf999c050363b"},{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"19f913b3e918cae4738da50909d04b7d3f30e0f0","unresolved":true,"context_lines":[{"line_number":22,"context_line":"- data/ts in old policy at t0 (as well as queue entry time to move it)"},{"line_number":23,"context_line":"- tombstone in old policy at t0_1"},{"line_number":24,"context_line":"- queue clean-up time at t0_2"},{"line_number":25,"context_line":"- moved data/ts in new policy at t0_3"},{"line_number":26,"context_line":""},{"line_number":27,"context_line":"Closes-Bug: #2028175"},{"line_number":28,"context_line":"Change-Id: Ib0dda0d338f48336d18d3d817a0c5994e201042e"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"11699f2b_2915a075","line":25,"updated":"2024-11-06 14:33:20.000000000","message":"this necessarily is ahead of the queue clean up because it might reasonably get re-queued","commit_id":"0505bbe407a394127c216696eb1cf999c050363b"}],"/PATCHSET_LEVEL":[{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"bcf7bdf74adb9225d9681a9fd2fc0a2358e12e80","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"197c2955_f2f2ce44","updated":"2024-11-04 16:42:36.000000000","message":"Ran the failing test a bunch over the weekend. Before this patch, I saw ~3% failures over the course of 1,000 runs; with this patch, no failures over 3,000+ runs.","commit_id":"0505bbe407a394127c216696eb1cf999c050363b"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"17f58c3dbd3a26b1dbbfea8c9517952a178ecddf","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"d61cf6ab_c4caac33","updated":"2024-11-11 04:32:10.000000000","message":"Just a question inline.. although I\u0027m probably just missing something obvious :)","commit_id":"3a43242e792898d1c51b5db05d79de9801f2d042"},{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"e265d695c9d3c417125087c8bd5372500b72107f","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"8e0a34a9_cb98e1e6","updated":"2024-11-06 14:34:36.000000000","message":"Pushed some more test assertions.\n\nThis change makes sense to me. Will be great to have the flakey probe test fixed.","commit_id":"3a43242e792898d1c51b5db05d79de9801f2d042"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"40e6a964afcf6885958156d35ecd99e372499cfa","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"627cf1a1_0078464d","updated":"2024-11-06 18:14:49.000000000","message":"Thanks for the new assertions!","commit_id":"3a43242e792898d1c51b5db05d79de9801f2d042"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"ee80f9e15276bfbbb8959b1be8f91894a0c99b3f","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"57342bf7_e6cd134d","updated":"2024-11-13 04:37:46.000000000","message":"Yeah fair enough. This is defintely better then what we did have, and does address the issue we\u0027ve been seeing in the gate.\n\nAny further discussions around changes to things like leaving TS\u0027s can come later if it actaully makes sense. Let\u0027s not hold this up anymore.","commit_id":"3a43242e792898d1c51b5db05d79de9801f2d042"}],"swift/container/reconciler.py":[{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"19f913b3e918cae4738da50909d04b7d3f30e0f0","unresolved":true,"context_lines":[{"line_number":435,"context_line":"        an object was manually re-enqued."},{"line_number":436,"context_line":"        \"\"\""},{"line_number":437,"context_line":"        q_path \u003d \u0027/%s/%s/%s\u0027 % (MISPLACED_OBJECTS_ACCOUNT, container, obj)"},{"line_number":438,"context_line":"        x_timestamp \u003d slightly_later_timestamp(max(q_record, q_ts), offset\u003d2)"},{"line_number":439,"context_line":"        self.stats_log(\u0027pop_queue\u0027, \u0027remove %r (%f) from the queue (%s)\u0027,"},{"line_number":440,"context_line":"                       q_path, q_ts, x_timestamp)"},{"line_number":441,"context_line":"        headers \u003d {\u0027X-Timestamp\u0027: x_timestamp}"}],"source_content_type":"text/x-python","patch_set":1,"id":"326b09c6_1d55e132","line":438,"range":{"start_line":438,"start_character":22,"end_line":438,"end_character":46},"updated":"2024-11-06 14:33:20.000000000","message":"I\u0027m not sure this helper is so useful now that only 1 of 4 calls is using the default offset - it might be more helpful to see the offset explicitly at each call site","commit_id":"0505bbe407a394127c216696eb1cf999c050363b"},{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"19f913b3e918cae4738da50909d04b7d3f30e0f0","unresolved":true,"context_lines":[{"line_number":435,"context_line":"        an object was manually re-enqued."},{"line_number":436,"context_line":"        \"\"\""},{"line_number":437,"context_line":"        q_path \u003d \u0027/%s/%s/%s\u0027 % (MISPLACED_OBJECTS_ACCOUNT, container, obj)"},{"line_number":438,"context_line":"        x_timestamp \u003d slightly_later_timestamp(max(q_record, q_ts), offset\u003d2)"},{"line_number":439,"context_line":"        self.stats_log(\u0027pop_queue\u0027, \u0027remove %r (%f) from the queue (%s)\u0027,"},{"line_number":440,"context_line":"                       q_path, q_ts, x_timestamp)"},{"line_number":441,"context_line":"        headers \u003d {\u0027X-Timestamp\u0027: x_timestamp}"}],"source_content_type":"text/x-python","patch_set":1,"id":"173d86a8_27a63506","line":438,"range":{"start_line":438,"start_character":68,"end_line":438,"end_character":76},"updated":"2024-11-06 14:33:20.000000000","message":"this change doesn\u0027t seem to be covered in test_reconciler.py unit tests (obvs it is covered by the intermittent probe test!)","commit_id":"0505bbe407a394127c216696eb1cf999c050363b"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"17f58c3dbd3a26b1dbbfea8c9517952a178ecddf","unresolved":true,"context_lines":[{"line_number":549,"context_line":"                           \u0027is newer than queue (%f)\u0027, path, dest_ts,"},{"line_number":550,"context_line":"                           container_policy_index, q_ts)"},{"line_number":551,"context_line":"            return self.throw_tombstones(account, container, obj, q_ts,"},{"line_number":552,"context_line":"                                         q_policy_index, path)"},{"line_number":553,"context_line":""},{"line_number":554,"context_line":"        # object is misplaced"},{"line_number":555,"context_line":"        self.stats_log(\u0027misplaced_object\u0027, \u0027%r (%f) in policy_index %s \u0027"}],"source_content_type":"text/x-python","patch_set":2,"id":"dc07f0be_a88a6aa4","line":552,"updated":"2024-11-11 04:32:10.000000000","message":"So is the point that a re-enqueued processed ts from old policy (t0-1 in your example) stops getting re-enqueued because it\u0027ll be blocked being enqueuded by the the t0-2 deleted queued (which is in the deleted state) in the merge_items in the container?\n\nIf it isn\u0027t, then whats stopping a 2nd or 3rd enqueued TS from being \u003c t0-3, but causing the 551 throw_tombstone call from just writing a new TS as offset (2 and then offset 3, until it wins)? I guess I need to look closer at the enqueing code. However, I wonder if in this part of the code, if the  `dest_ts \u003e\u003d q_ts` and the source is TS (q_op is a delete), why dont we just NOT write the TS with an incremented TS back to the source policy.. as it\u0027s already a TS, so we should be good to leave it and then dequeue the queue entry? And breaking the cycle.","commit_id":"3a43242e792898d1c51b5db05d79de9801f2d042"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"f5908dd2911507d5883d0ea33b21f52ee0d46295","unresolved":true,"context_lines":[{"line_number":549,"context_line":"                           \u0027is newer than queue (%f)\u0027, path, dest_ts,"},{"line_number":550,"context_line":"                           container_policy_index, q_ts)"},{"line_number":551,"context_line":"            return self.throw_tombstones(account, container, obj, q_ts,"},{"line_number":552,"context_line":"                                         q_policy_index, path)"},{"line_number":553,"context_line":""},{"line_number":554,"context_line":"        # object is misplaced"},{"line_number":555,"context_line":"        self.stats_log(\u0027misplaced_object\u0027, \u0027%r (%f) in policy_index %s \u0027"}],"source_content_type":"text/x-python","patch_set":2,"id":"718e7779_1f69b9d6","line":552,"in_reply_to":"dc07f0be_a88a6aa4","updated":"2024-11-13 00:42:20.000000000","message":"There\u0027s a reason I explicitly called out that my hypothetical was out of scope 😜\n\n- I\u0027ve never actually *seen* this failure mode\n- I don\u0027t feel like I\u0027ve got a good way to express my hypothetical in a probe test\n- I want to address the failure mode that I *have* seen; namely, having two containers with conflicting (and unmergeable) views of the reconciler queue.\n\nMaybe we could try to do something clever here? Drop the `acceptable_statuses` kwarg, catch the 404 and ignore it. But I\u0027d prefer to do that as a separate patch, *after* we\u0027ve stabilized the gate more.","commit_id":"3a43242e792898d1c51b5db05d79de9801f2d042"}]}
