{"/COMMIT_MSG":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"e185d56013ff56b5b2993fbb8c132f00b242e5a4","unresolved":true,"context_lines":[{"line_number":9,"context_line":"We\u0027ve seen this happen a couple of times in production where a root"},{"line_number":10,"context_line":"container suddenly thinks it\u0027s unsharded as it\u0027s own_shard_range is"},{"line_number":11,"context_line":"reset. The only way this can happen is if a new replica or handoff node"},{"line_number":12,"context_line":"places a container and a new own_shard_range is created and replicated"},{"line_number":13,"context_line":"to older sharded primaries."},{"line_number":14,"context_line":""},{"line_number":15,"context_line":"In the sharding life cycle, only never sharded containers can have an"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":2,"id":"2dd70024_7b14f4e1","line":12,"updated":"2021-09-10 15:12:37.000000000","message":"yeah but how does *that* happen - when would an unsharded db replica decide to create a new own_shard_range?\n\nin all my testing with rebalanced parts and handoffs the unsharded fresh db always picks up shard ranges from its peers during replication","commit_id":"728b75a02e1a47b21b697542f0be6468a25100cf"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"804036bf76fa35409678dda234b8a8a5816009b8","unresolved":true,"context_lines":[{"line_number":9,"context_line":"We\u0027ve seen this happen a couple of times in production where a root"},{"line_number":10,"context_line":"container suddenly thinks it\u0027s unsharded as it\u0027s own_shard_range is"},{"line_number":11,"context_line":"reset. 
The only way this can happen is if a new replica or handoff node"},{"line_number":12,"context_line":"places a container and a new own_shard_range is created and replicated"},{"line_number":13,"context_line":"to older sharded primaries."},{"line_number":14,"context_line":""},{"line_number":15,"context_line":"In the sharding life cycle, only never sharded containers can have an"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":2,"id":"1c43bab8_e4eef8ee","line":12,"in_reply_to":"1828eb97_430be99b","updated":"2021-09-13 07:35:42.000000000","message":"sigh, I\u0027ve been playing with a probe test that is trying to hit different scenarios where we \"might\" pull a default own shard range:\n\n1. Handoff node gets a new container DB on PUT. Replication gets the obj count, sharder triggers a default own_shard_range which has a newer TS. \n  - Turns out this doesn\u0027t happen because replication gets all shard ranges including the own_shard_range of the other root primaries.\n\n2. Rebalance happens, there is a new primary, it gets a PUT and enough objects before it replicates with any others.\n  - This might cause a shard candidate but won\u0027t trigger an own_shard_range generation unless sharding is enabled.. in which case it\u0027ll get a new epoch anyway, not None. \n\nProblem is, we don\u0027t tend to write back a default own_shard_range in many places, except if it\u0027s enabling sharding. We don\u0027t even write back an own_shard_range much (in the root that is). And we can\u0027t get the sharder to trigger anywhere it does without getting object counts big enough, and if it does then it\u0027ll get them from either the other primaries or will shard with a new epoch.\n\nIt could be in s-m-s-r enable code, but likewise to get there it needs to first be a sharding candidate.\n\nI\u0027m going to keep probing. This time I will look closer at where we are using the sharding state checks and more own_shard_range paths. 
Then instead of trying to find a situation where a \"new\" container could infect existing ones, see if I can find a place or way we can fail to return an own_shard_range so it\u0027ll be defaulted.\n\nSorry for the brain dump status update, just trying to find the edge case.","commit_id":"728b75a02e1a47b21b697542f0be6468a25100cf"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"3cedf186790de80a9c9f495210c1ed1a1b859160","unresolved":true,"context_lines":[{"line_number":9,"context_line":"We\u0027ve seen this happen a couple of times in production where a root"},{"line_number":10,"context_line":"container suddenly thinks it\u0027s unsharded as it\u0027s own_shard_range is"},{"line_number":11,"context_line":"reset. The only way this can happen is if a new replica or handoff node"},{"line_number":12,"context_line":"places a container and a new own_shard_range is created and replicated"},{"line_number":13,"context_line":"to older sharded primaries."},{"line_number":14,"context_line":""},{"line_number":15,"context_line":"In the sharding life cycle, only never sharded containers can have an"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":2,"id":"1828eb97_430be99b","line":12,"in_reply_to":"2dd70024_7b14f4e1","updated":"2021-09-13 01:04:40.000000000","message":"We\u0027re dealing with own_shard_ranges. When we see these reset roots they\u0027re unsharded because the epoch is missing from the own_shard_range; they still have all the shards. Because as you said they always share all shards. \n\nBut after thinking some more, it could also possibly happen in s-m-s-r, maybe if someone tries to run it on a handoff that isn\u0027t actually sharded. i.e. do the recon sharding candidates strictly only dump primaries or _anything_ (i.e. handoffs)? 
\n\nI\u0027ll have a play.","commit_id":"728b75a02e1a47b21b697542f0be6468a25100cf"}],"swift/container/backend.py":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"0bb4eba67c7980331da5160807f38964f2a3bfdf","unresolved":true,"context_lines":[{"line_number":273,"context_line":"    if existing[\u0027epoch\u0027] and shard_data[\u0027epoch\u0027] is None:"},{"line_number":274,"context_line":"        return False"},{"line_number":275,"context_line":"    if existing[\u0027epoch\u0027] is None and shard_data[\u0027epoch\u0027]:"},{"line_number":276,"context_line":"        return True"},{"line_number":277,"context_line":"    if existing[\u0027timestamp\u0027] \u003c shard_data[\u0027timestamp\u0027]:"},{"line_number":278,"context_line":"        # note that currently we do not roll forward any meta or state from"},{"line_number":279,"context_line":"        # an item that was created at older time, newer created time trumps"}],"source_content_type":"text/x-python","patch_set":3,"id":"5f1f2165_84f1cf60","line":276,"updated":"2021-09-17 18:18:44.000000000","message":"this all sounds reasonable; but it\u0027s not obviously a problem\n\nIf the issue is own_shard_range *only* why are we way down here in merge_shards which affects *everything*?\n\nIs there a more obvious/direct/targeted way to prevent this for just own_shard_range?  
Is it ever reasonable to allow any shard_range (not just osr) to merge without an epoch (obviously we have lots of tests doing exactly that; but we can fix the tests)","commit_id":"a4ce09e719fce4980cd23be898ad92e4de7673e3"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"10f47439d028ce8ee8e557ac8b06464f6350db34","unresolved":true,"context_lines":[{"line_number":273,"context_line":"    if existing[\u0027epoch\u0027] and shard_data[\u0027epoch\u0027] is None:"},{"line_number":274,"context_line":"        return False"},{"line_number":275,"context_line":"    if existing[\u0027epoch\u0027] is None and shard_data[\u0027epoch\u0027]:"},{"line_number":276,"context_line":"        return True"},{"line_number":277,"context_line":"    if existing[\u0027timestamp\u0027] \u003c shard_data[\u0027timestamp\u0027]:"},{"line_number":278,"context_line":"        # note that currently we do not roll forward any meta or state from"},{"line_number":279,"context_line":"        # an item that was created at older time, newer created time trumps"}],"source_content_type":"text/x-python","patch_set":4,"id":"e431a2e2_1bc959cb","line":276,"updated":"2021-09-20 16:35:22.000000000","message":"I\u0027ve got this nagging feeling like we should also have a\n\n if existing[\u0027epoch\u0027] and shard_data[\u0027epoch\u0027] and \\\n         existing[\u0027epoch\u0027] !\u003d shard_data[\u0027epoch\u0027]:\n     return shard_data[\u0027epoch\u0027] \u003e existing[\u0027epoch\u0027]","commit_id":"2985b511930197f2e00b1360e1af3f219b11ad24"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"10f47439d028ce8ee8e557ac8b06464f6350db34","unresolved":true,"context_lines":[{"line_number":277,"context_line":"    if existing[\u0027timestamp\u0027] \u003c shard_data[\u0027timestamp\u0027]:"},{"line_number":278,"context_line":"        # note that currently we do not roll 
forward any meta or state from"},{"line_number":279,"context_line":"        # an item that was created at older time, newer created time trumps"},{"line_number":280,"context_line":"        shard_data[\u0027reported\u0027] \u003d 0  # reset the latch"},{"line_number":281,"context_line":"        return True"},{"line_number":282,"context_line":"    elif existing[\u0027timestamp\u0027] \u003e shard_data[\u0027timestamp\u0027]:"},{"line_number":283,"context_line":"        return False"}],"source_content_type":"text/x-python","patch_set":4,"id":"c485b0d5_2ecb851b","line":280,"updated":"2021-09-20 16:35:22.000000000","message":"I\u0027m trying to remember why I felt the need to add this here, and whether it makes sense for us to reset it with any of the epoch differences, too.","commit_id":"2985b511930197f2e00b1360e1af3f219b11ad24"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"e83653b48a29828566cc0ad19ad917bf10fc0f67","unresolved":true,"context_lines":[{"line_number":277,"context_line":"    if existing[\u0027timestamp\u0027] \u003c shard_data[\u0027timestamp\u0027]:"},{"line_number":278,"context_line":"        # note that currently we do not roll forward any meta or state from"},{"line_number":279,"context_line":"        # an item that was created at older time, newer created time trumps"},{"line_number":280,"context_line":"        shard_data[\u0027reported\u0027] \u003d 0  # reset the latch"},{"line_number":281,"context_line":"        return True"},{"line_number":282,"context_line":"    elif existing[\u0027timestamp\u0027] \u003e shard_data[\u0027timestamp\u0027]:"},{"line_number":283,"context_line":"        return False"}],"source_content_type":"text/x-python","patch_set":4,"id":"7ae8bb7e_0d3c9ea5","line":280,"in_reply_to":"c485b0d5_2ecb851b","updated":"2021-09-20 23:01:26.000000000","message":"yeah, it\u0027s a good question. 
I guess with newer shard data we\u0027d return the new data, but do we need to update the root to report data on that or could we assume that the other node we received it from already did that. I guess we do it here for timestamp so doesn\u0027t hurt to do it for epoch too.. worst case we report stats again.","commit_id":"2985b511930197f2e00b1360e1af3f219b11ad24"}],"test/probe/test_sharder.py":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"ddc662faa6d3004c72d39004b47d5397943ff674","unresolved":true,"context_lines":[{"line_number":3418,"context_line":"        reset_osr \u003d new_primary_broker.get_own_shard_range()"},{"line_number":3419,"context_line":"        self.assertIsNone(reset_osr.epoch)"},{"line_number":3420,"context_line":"        self.assertEqual(reset_osr.state, ShardRange.ACTIVE)"},{"line_number":3421,"context_line":"        new_primary_broker.merge_shard_ranges(reset_osr)"},{"line_number":3422,"context_line":""},{"line_number":3423,"context_line":"        # now let\u0027s replicate with the old primaries"},{"line_number":3424,"context_line":"        self.replicators.once()"}],"source_content_type":"text/x-python","patch_set":3,"id":"74ff6bc3_396a1e31","line":3421,"updated":"2021-09-16 21:20:41.000000000","message":"maybe we could break this test up so we can see each scenario independently - obviously we want to keep the other stuff working; but I\u0027m most interested in just the *difference* of how swift behaves with *this* change and without","commit_id":"a4ce09e719fce4980cd23be898ad92e4de7673e3"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"f6819eb2025d246cafc057e484ee5f1bc7ec9161","unresolved":true,"context_lines":[{"line_number":3418,"context_line":"        reset_osr \u003d new_primary_broker.get_own_shard_range()"},{"line_number":3419,"context_line":"        
self.assertIsNone(reset_osr.epoch)"},{"line_number":3420,"context_line":"        self.assertEqual(reset_osr.state, ShardRange.ACTIVE)"},{"line_number":3421,"context_line":"        new_primary_broker.merge_shard_ranges(reset_osr)"},{"line_number":3422,"context_line":""},{"line_number":3423,"context_line":"        # now let\u0027s replicate with the old primaries"},{"line_number":3424,"context_line":"        self.replicators.once()"}],"source_content_type":"text/x-python","patch_set":3,"id":"3d5210a8_298c5726","line":3421,"in_reply_to":"74ff6bc3_396a1e31","updated":"2021-09-17 04:02:40.000000000","message":"Yup good call 😊","commit_id":"a4ce09e719fce4980cd23be898ad92e4de7673e3"}],"test/unit/container/test_backend.py":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"0bb4eba67c7980331da5160807f38964f2a3bfdf","unresolved":true,"context_lines":[{"line_number":4989,"context_line":"            self.assertIsNone(own_sr.epoch)"},{"line_number":4990,"context_line":"            broker.merge_shard_ranges(own_sr)"},{"line_number":4991,"context_line":"            self.assertEqual(dict(own_sr),"},{"line_number":4992,"context_line":"                             dict(broker.get_own_shard_range(no_default\u003dTrue)))"},{"line_number":4993,"context_line":""},{"line_number":4994,"context_line":"            # Update own shard range"},{"line_number":4995,"context_line":"            own_sr.update_state(ShardRange.SHARDED, ts[0])"}],"source_content_type":"text/x-python","patch_set":3,"id":"4fb73d0d_e1d01d10","line":4992,"updated":"2021-09-17 18:18:44.000000000","message":"why would we have a non-default osr without an epoch!?  Is this reasonable?  
Isn\u0027t this the problem?","commit_id":"a4ce09e719fce4980cd23be898ad92e4de7673e3"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"046b89d54f5159683f7896d52c523063b306ba4b","unresolved":true,"context_lines":[{"line_number":4989,"context_line":"            self.assertIsNone(own_sr.epoch)"},{"line_number":4990,"context_line":"            broker.merge_shard_ranges(own_sr)"},{"line_number":4991,"context_line":"            self.assertEqual(dict(own_sr),"},{"line_number":4992,"context_line":"                             dict(broker.get_own_shard_range(no_default\u003dTrue)))"},{"line_number":4993,"context_line":""},{"line_number":4994,"context_line":"            # Update own shard range"},{"line_number":4995,"context_line":"            own_sr.update_state(ShardRange.SHARDED, ts[0])"}],"source_content_type":"text/x-python","patch_set":3,"id":"8a716d4e_434d51b0","line":4992,"in_reply_to":"4fb73d0d_e1d01d10","updated":"2021-09-19 23:53:41.000000000","message":"no_default\u003dTrue just means either return an existing OSR or None, don\u0027t always return an osr (be it an existing one or a new (default) one).\n\nSo the own SR we put in has no epoch, we\u0027re pulling it out again and making sure it\u0027s still the same.\n\nAs to the question, why would we have a non-default one without an epoch? An epoch is added when it\u0027s time to shard. So shards will have an osr without an epoch. But when a shard shards an epoch will be added. This allows us to know when it was marked as sharding. If the epoch doesn\u0027t match the db epoch (in the filename) then it\u0027s also considered unsharded. 
(https://github.com/openstack/swift/blob/master/swift/container/backend.py#L408)\nSo we can\u0027t just always set an epoch either 😞","commit_id":"a4ce09e719fce4980cd23be898ad92e4de7673e3"}],"test/unit/container/test_sharder.py":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"0bb4eba67c7980331da5160807f38964f2a3bfdf","unresolved":true,"context_lines":[{"line_number":4807,"context_line":"            own_sr.epoch \u003d epoch"},{"line_number":4808,"context_line":"            with mock.patch(\u0027swift.container.backend.merge_shards\u0027,"},{"line_number":4809,"context_line":"                            return_value\u003dTrue):"},{"line_number":4810,"context_line":"                broker.merge_shard_ranges([own_sr])"},{"line_number":4811,"context_line":"            with self._mock_sharder() as sharder:"},{"line_number":4812,"context_line":"                with mock_timestamp_now() as now:"},{"line_number":4813,"context_line":"                    sharder._process_broker(broker, node, 99)"}],"source_content_type":"text/x-python","patch_set":3,"id":"8dc727ee_a043e0a7","line":4810,"updated":"2021-09-17 18:18:44.000000000","message":"this was a huge smell to me - do we want to allow an osr w/o an epoch to merge ever?!","commit_id":"a4ce09e719fce4980cd23be898ad92e4de7673e3"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"046b89d54f5159683f7896d52c523063b306ba4b","unresolved":true,"context_lines":[{"line_number":4807,"context_line":"            own_sr.epoch \u003d epoch"},{"line_number":4808,"context_line":"            with mock.patch(\u0027swift.container.backend.merge_shards\u0027,"},{"line_number":4809,"context_line":"                            return_value\u003dTrue):"},{"line_number":4810,"context_line":"                broker.merge_shard_ranges([own_sr])"},{"line_number":4811,"context_line":"       
     with self._mock_sharder() as sharder:"},{"line_number":4812,"context_line":"                with mock_timestamp_now() as now:"},{"line_number":4813,"context_line":"                    sharder._process_broker(broker, node, 99)"}],"source_content_type":"text/x-python","patch_set":3,"id":"04b78644_0943cdc3","line":4810,"in_reply_to":"8dc727ee_a043e0a7","updated":"2021-09-19 23:53:41.000000000","message":"yeah, most shard osrs won\u0027t have an epoch unless they\u0027re sharding, then they\u0027ll shard and delete themselves.\n\nI know.. the whole epoch thing can get a little confusing.","commit_id":"2985b511930197f2e00b1360e1af3f219b11ad24"}]}
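The precedence the reviewers are debating in the `swift/container/backend.py` comments can be sketched as a standalone function: epoch presence is checked first, then Tim Burke's suggested tie-break on differing non-None epochs, then created-timestamp ordering with the `reported` latch reset. This is a review-thread sketch, not Swift's actual `swift.container.backend.merge_shards` (which also merges meta on equal timestamps, elided here); the dict keys mirror the quoted context lines and the epoch tie-break is the proposal from the comments.

```python
def merge_shards(shard_data, existing):
    """Decide whether incoming shard range data should replace an
    existing row, per the precedence discussed in review."""
    if existing is None:
        return True
    # An epoch-less row must never clobber a row that has an epoch...
    if existing['epoch'] and shard_data['epoch'] is None:
        return False
    # ...while gaining an epoch always wins.
    if existing['epoch'] is None and shard_data['epoch']:
        return True
    # Suggested addition from the thread: when both rows carry an
    # epoch and they differ, the newer epoch wins.
    if existing['epoch'] and shard_data['epoch'] and \
            existing['epoch'] != shard_data['epoch']:
        return shard_data['epoch'] > existing['epoch']
    # Otherwise fall back to created-timestamp ordering; a newer row
    # resets the 'reported' latch so stats get re-reported to the root.
    if existing['timestamp'] < shard_data['timestamp']:
        shard_data['reported'] = 0  # reset the latch
        return True
    return False
```

The point of putting the epoch checks ahead of the timestamp check is exactly the bug under discussion: a freshly placed replica's default, epoch-less own_shard_range carries a newer timestamp, so timestamp-first ordering would let it reset an already-sharded root.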
