{"/PATCHSET_LEVEL":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"067ccf7b0c067d3b007cd7552d68519247039b7f","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"5fca6e24_1e17d768","updated":"2021-12-01 13:55:41.000000000","message":"If we\u0027re going to add error limiting to the updater I think it\u0027d be worth trying pretty hard to standardize the interface with the existing implemenation of error limiting in the proxy:\n\nhttps://github.com/openstack/swift/blob/master/swift/proxy/server.py#L685\n\nit\u0027s seems to me the goals should be pretty similar; but right now this seems to be trying to do something quite different\n\nalso rather than any specific device; i wonder if our problem is a bottleneck with a specific container (maybe the root? or a shard?)","commit_id":"e64b94f2ba86222c473bbbed3d5d8a155cb5f781"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"1384358427d84b4b47a301412a102009b8ef663e","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"db057af2_9069135f","updated":"2021-12-06 22:23:15.000000000","message":"Here\u0027s an updater that uses the proxies error limiting. Also has the option of using memcache for global look up: https://review.opendev.org/c/openstack/swift/+/820486\n\nIn the process of changing the refactored error limiting from a mixin to a class, so fixing that up now.","commit_id":"62fe75a16212e356f35715ecc4b97d8ffcd21dd5"}],"swift/obj/updater.py":[{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"738a22606cc21b1c5ee22419f4b3b02c643c552e","unresolved":true,"context_lines":[{"line_number":390,"context_line":""},{"line_number":391,"context_line":"            next_update_time, _, limiting_node \u003d max("},{"line_number":392,"context_line":"                (self.next_update[node[\u0027id\u0027]], i, node)"},{"line_number":393,"context_line":"                for i, node in enumerate(nodes_to_update))"},{"line_number":394,"context_line":"            if time.time() \u003c next_update_time:"},{"line_number":395,"context_line":"                self.stats.failures +\u003d 1"},{"line_number":396,"context_line":"                self.logger.increment(\u0027skips\u0027)"}],"source_content_type":"text/x-python","patch_set":1,"id":"4a23b729_7c79f2e5","line":393,"updated":"2021-12-01 01:16:29.000000000","message":"I could see an argument that we should still try updating the nodes that aren\u0027t overloaded, instead of skipping whenever *any* server is overloaded. 
Comments on swift/obj/updater.py
--------------------------------

Tim Burke (patch set 1, line 393, 2021-12-01, unresolved):

    391             next_update_time, _, limiting_node = max(
    392                 (self.next_update[node['id']], i, node)
    393                 for i, node in enumerate(nodes_to_update))
    394             if time.time() < next_update_time:
    395                 self.stats.failures += 1
    396                 self.logger.increment('skips')

I could see an argument that we should still try updating the nodes that aren't overloaded, instead of skipping whenever *any* server is overloaded. This was expedient, though; the logic is going to get more convoluted when trying to decide whether to increment successes vs. skips.
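A quick sketch of the alternative floated here, as a drop-in at the spot above: defer only the nodes whose deadline hasn't passed and update the rest. Variable names come from the snippet; the success/skip bookkeeping the comment worries about is deliberately elided:

    now = time.time()
    ready = [node for node in nodes_to_update
             if now >= self.next_update[node['id']]]
    deferred = [node for node in nodes_to_update
                if now < self.next_update[node['id']]]
    for node in deferred:
        # Only the overloaded nodes get skipped; whether these should
        # also count toward stats.failures is the convoluted part.
        self.logger.increment('skips')
    # ... proceed to update the nodes in `ready` ...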
Tim Burke (patch set 2, line 479, 2021-12-01, unresolved):

    476                 was skipped, or False if the update failed;
    477             ``node_id`` is the id of the node updated; and
    478             ``redirect`` is either None or a tuple of
    479                 (a path, a timestamp string).
    480         """

Apparently the docs build doesn't like these indents.

Tim Burke (patch set 2, line 482, on ``node['id']``, 2021-12-01, unresolved):

    481
    482         if time.time() < self.next_update[node['id']]:
    483             self.logger.debug(
    484                 'Update skipped for %(obj)s '
    485                 '(waiting on %(ip)s:%(port)s/%(device)s)',

I wonder a little if this should be keyed off node['ip'] instead -- I don't know that you even *can* run container-servers with servers_per_port, so I think one overloaded/bad disk can spoil the bunch...

Clay Gerrard (patch set 2, in reply, 2021-12-01):

I think you could ask the same questions about the proxy:

https://github.com/openstack/swift/blob/master/swift/proxy/server.py#L635

... we may want to use different keys for object vs. container servers.

Matthew Oliver (patch set 2, in reply, 2021-12-02):

Error-limited nodes seem to be keyed to a device. Could we pull the error limiting out of the proxy and move it somewhere like utils? Seeing as the "keys" themselves are tied to a node/device, what if we push these up into memcache rather than keeping them just in memory, so the whole cluster can learn as a whole? Or rather, only push to memcache once we hit an error_limited limit; the count can still happen in the proxy's memory.

Imagine if a proxy knew a device was being bad, so it threw that into memcache, and all the other proxies and updaters would know to hold off for a while.

Matthew Oliver (patch set 2, follow-up, 2021-12-02):

Something a little like: https://paste.opendev.org/show/811386/

The updater can just inherit the mixin and it'll get all the pieces of error-limited nodes (like the proxy).

But that diff still needs a way of getting the memcache client to it -- either passing in the req/env, or maybe just getting the mixin to look for /etc/swift/memcache.conf.

I guess the downside to this is that if a rogue bad proxy had issues, it would start error limiting good nodes... but in order to do that it'll need to be able to speak to the memcache ring.
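Building on the ErrorLimiter sketch above, the memcache idea might look something like the following. The key scheme, class, and exact client calls are assumptions (presumably something like Swift's MemcacheRing would serve as the client):

    ERROR_LIMIT_KEY = 'error-limit/%(ip)s:%(port)s/%(device)s'  # hypothetical key scheme

    class GlobalErrorLimiter(ErrorLimiter):
        """Sketch: count errors locally, but once a node trips the
        limit, publish that fact to memcache so every proxy and
        updater in the cluster backs off together."""

        def __init__(self, memcache_client, **kwargs):
            super(GlobalErrorLimiter, self).__init__(**kwargs)
            self.memcache = memcache_client

        def error_occurred(self, node):
            super(GlobalErrorLimiter, self).error_occurred(node)
            count, _last = self._errors[self.node_key(node)]
            if count > self.suppression_limit:
                # Publish only after crossing the limit; the counting
                # itself stays in local memory, per the comment above.
                self.memcache.set(ERROR_LIMIT_KEY % node, True,
                                  time=self.suppression_interval)

        def is_limited(self, node):
            if self.memcache.get(ERROR_LIMIT_KEY % node):
                return True  # another daemon already flagged this node
            return super(GlobalErrorLimiter, self).is_limited(node)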
Clay Gerrard (patch set 2, line 522, 2021-12-01, unresolved):

    519         except (Exception, Timeout):
    520             self.logger.exception(_('ERROR with remote server '
    521                                     '%(ip)s:%(port)s/%(device)s'), node)
    522             status = 499
    523         finally:
    524             elapsed = time.time() - start
    525             self.logger.timing('updater.timing.status.%s' % status,

It seems to me that if we're going to add error limiting, this is the place we'd do it.
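Hooked in where this comment suggests, it could be as small as one call at the point the 499 is assigned. A standalone sketch, where the error_limiter object and the do_request callable are hypothetical stand-ins (logger.timing is the StatsD-aware logger method already used above):

    import time

    def send_update(node, do_request, error_limiter, logger):
        """Hypothetical helper showing the suggested hook point."""
        status = None
        start = time.time()
        try:
            # Stand-in for the HTTP request to the container server.
            status = do_request(node)
        except Exception:
            logger.exception('ERROR with remote server '
                             '%(ip)s:%(port)s/%(device)s', node)
            status = 499
            error_limiter.error_occurred(node)  # the suggested hook
        finally:
            elapsed = time.time() - start
            logger.timing('updater.timing.status.%s' % status, elapsed * 1000)
        return status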
Tim Burke (patch set 2, line 530, 2021-12-01, unresolved):

    527             self.logger.timing(node_timing_string, elapsed * 1000)
    528             if self.max_objects_per_second:
    529                 expected_time_per_request = (float(self.concurrency) /
    530                                              self.max_objects_per_second)
    531                 self.next_update[node['id']] = max(
    532                     self.next_update[node['id']],
    533                     time.time() + max(elapsed - expected_time_per_request, 0))
    534         return False, node['id'], redirect

This is a very hand-wave-y calculation -- it may need some constant factor applied (1.5x? 2x?), or just make it its own tunable.

Clay Gerrard (patch set 2, in reply, 2021-12-01):

I don't see why they'd be related at all.

Tim Burke (patch set 2, line 532, 2021-12-01, unresolved):

Just because we got *one* snappy response doesn't make everything OK again.

Tim Burke (patch set 2, line 533, 2021-12-01, unresolved):

If you squint a little, this looks a lot like our other rate limiting stuff.
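The pacing math in question, pulled into a standalone function with the suggested constant factor exposed as its own tunable. The slowdown_factor knob is hypothetical, not an existing option:

    import time

    def pace_node(next_update, node_id, elapsed,
                  concurrency, max_objects_per_second,
                  slowdown_factor=1.0):
        """Sketch of the back-off above, with the hand-wave-y constant
        made an explicit knob (slowdown_factor is hypothetical)."""
        expected_time_per_request = float(concurrency) / max_objects_per_second
        delay = max(elapsed - expected_time_per_request, 0) * slowdown_factor
        # Taking the max with the existing deadline means one snappy
        # response does not erase an earlier, longer back-off.
        next_update[node_id] = max(next_update[node_id], time.time() + delay)

Structurally this is the same shape as a leaky-bucket-style limiter: each slow response pushes the node's next-allowed time further out, and fast responses simply leave the existing deadline in place.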
