)]}'
{"/PATCHSET_LEVEL":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"7fecf8326f6429e9ce000202e24e422e457124c1","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"8ff03c83_f929b564","updated":"2025-07-09 18:01:50.000000000","message":"FWIW I\u0027m most interesting in getting timing from the reconstructor.\n\nI\u0027m trying to keep the number of debug messages to a minimum, and also I had some deadlocks when logging from w/i a threadpool.  So all the timing for a single trace.log comes out in one line:\n\n```\nobject-reconstructor-6010: reconstructor._get_hashes took 5.08 ms#012_get_hashes.consolidate_hashes took 2.31 ms#012_get_hashes.do_listdir took 0.01 ms#012_get_hashes.rehash.hash_8cc.list_suffix_dir took 0.02 ms#012_get_hashes.rehash.hash_8cc num_hsh_paths: 1#012_get_hashes.rehash.hash_8cc.cleanup_ondisk_files took 0.06 ms#012_get_hashes.rehash.hash_8cc.update_suffix_hashes took 0.02 ms#012_get_hashes.rehash.hash_8cc.rmdir took 0.01 ms#012_get_hashes.rehash.hash_8cc took 0.15 ms#012_get_hashes.rehash took 0.16 ms#012_get_hashes hashed: 1#012_get_hashes.lock took 0.04 ms#012_get_hashes.reread_hashes took 0.03 ms#012_get_hashes.write_hashes took 1.12 ms#012_get_hashes took 3.92 ms\n```\n\nwhich makes them easy to parse:\n\n```\nobject-reconstructor-6010: reconstructor._get_hashes took 5.08 ms\n_get_hashes.consolidate_hashes took 2.31 ms\n_get_hashes.do_listdir took 0.01 ms\n_get_hashes.rehash.hash_8cc.list_suffix_dir took 0.02 ms\n_get_hashes.rehash.hash_8cc num_hsh_paths: 1\n_get_hashes.rehash.hash_8cc.cleanup_ondisk_files took 0.06 ms\n_get_hashes.rehash.hash_8cc.update_suffix_hashes took 0.02 ms\n_get_hashes.rehash.hash_8cc.rmdir took 0.01 ms\n_get_hashes.rehash.hash_8cc took 0.15 ms\n_get_hashes.rehash took 0.16 ms\n_get_hashes hashed: 1\n_get_hashes.lock took 0.04 ms\n_get_hashes.reread_hashes took 0.03 ms\n_get_hashes.write_hashes took 1.12 ms\n_get_hashes took 3.92 ms\n```\n\nI think the nested/accumulation trace/span pattern is perfect for the kind of analysis we need to do in prod - I\u0027m happy to carry this and run a single foreground reconstructor on a single partition with debug logging in prod.  Once we have our problem debugged/fixed we can go back and consider what sort of trace primitives/patterns we need to polish for merging to master.","commit_id":"521481f2c9d678cd91650e143cf4f21c919daaed"},{"author":{"_account_id":34930,"name":"Jianjian Huo","email":"jhuo@nvidia.com","username":"jhuo"},"change_message_id":"3c2524da4693b818c4d21d3b8839e3495834d213","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"36515f63_e735c587","updated":"2025-07-09 18:51:22.000000000","message":"Other than the concern that open/close structures are invasive and very distracting, we also need to think ahead how we are going to consolidate the new trace class added in this patch with the coming otel traces, they serve very similar purpose, whereas the slow request logging decorator is just supplemental to otel.","commit_id":"521481f2c9d678cd91650e143cf4f21c919daaed"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"b5cb5775dc9d29ed337894e39f55b37e33790d51","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"01003135_740a8199","updated":"2025-07-09 19:23:35.000000000","message":"very heavily inspired by otel for sure!  To be clear I don\u0027t want to merge this - I just need to debug prod - the simplest thing to review/carry and throw away is all I\u0027m going for.","commit_id":"521481f2c9d678cd91650e143cf4f21c919daaed"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"ec1636d3cfac675a7622dd08488d7bb0d3d163ea","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"b1403291_c404c879","updated":"2025-07-14 01:46:40.000000000","message":"Cool, I might have a look at how this could also be applied in the tracing branch, When we could turn on tracing for REPLICATE requests if we ever need to check things once it finally lands.\n\nHowever, the current implementation of the request tracing is storing the tracer and trace context in the request env not the logger, because it\u0027s per request. So in my tracing chain, get_hashes would have to grow the ability to pass in a request or request env to get the right tracing context. :hmm:\n\nBUT, I thought the point of this, at least if we wanted to carry it so you could get some data for your debugging journey in prod, is the abiltiy to turn this on or off.. this seems likes it\u0027s always ON","commit_id":"2f1dbb0e5aaa2cfbebc88656aa06fa326b73c24f"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"fc722a771976560af283c10f65d56b299605aacc","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"847b5a81_3bf2acac","updated":"2025-07-14 07:28:32.000000000","message":"Here is a very rough scetch out of how it might look in otel via my chain: https://review.opendev.org/c/openstack/swift/+/954900","commit_id":"2f1dbb0e5aaa2cfbebc88656aa06fa326b73c24f"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"f6b32b8a23178813993a924c71380c56ec77dd51","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"fb8bc1e6_7f86341f","updated":"2025-07-14 16:48:07.000000000","message":"I\u0027m going to add tracing to cleanup_ondisk_files too - it does a listdir and unlinks","commit_id":"2f1dbb0e5aaa2cfbebc88656aa06fa326b73c24f"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"f6b32b8a23178813993a924c71380c56ec77dd51","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"9b282197_59a1d338","in_reply_to":"b1403291_c404c879","updated":"2025-07-14 16:48:07.000000000","message":"\u003e in the request env not the logger, because it\u0027s per request\nthe thread local in this patch on the logger is *also* \"per request\" ... or at least per thread.  I\u0027d like not having to update every function signature we might ever want to trace, but when we use a threadpool (or spawn a new greenthread) those do need to pass ctx through if we use a threadlocal\n\n\u003e context in the request env not the logger\n\nMaybe if we have to plumb it everywhere we can call the \"trace\" (or \"env\") kwarg a \"ctx\" so that it\u0027s more generic and future proof to new plumbing...\n\n\u003e this seems likes it\u0027s always ON\n\nit\u0027s never \"on\" - it only ever outputs trace info to logging when you turn on DEBUG (which we\u0027d never do except in a single partition limited foreground reconstructor)","commit_id":"2f1dbb0e5aaa2cfbebc88656aa06fa326b73c24f"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"7f350d1893b2e8cad29fa6cfc77438121a4525ca","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"8054f5ea_911c231c","updated":"2025-07-14 20:27:55.000000000","message":"thanks for the protip on cleanup - it *should* just be the one listdir per call, but if there *is* a cleanup slowing us down it sure would have sucked to think it was the unlinks on obsoletes only to find out there\u0027s a bunch of slow errors trying to delete empty dirs!?","commit_id":"8c1a22aceeedabe07223665a1c1dac1001ac70e4"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"9bae7e30a69fb610d20b25c19f269d72f828cb24","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":6,"id":"487ba08a_f3ee3a87","updated":"2025-07-28 15:48:36.000000000","message":"thanks for thinking this one through Al!","commit_id":"809fc8df1030873788deb2dbc2762be578c65c7b"},{"author":{"_account_id":30073,"name":"Brendan Shephard","email":"bshephar@bne-home.net","username":"bshephar"},"change_message_id":"36915101a999455caf739af6cae2c0fcde67ca86","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":10,"id":"aec748db_360d86c9","updated":"2025-08-07 03:26:03.000000000","message":"That context manager implementation is quite nice, it makes it quite clean.","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"b4ccb2124071c345d152f508f6555af2db33adad","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":10,"id":"253e7619_449009c2","updated":"2025-09-23 22:59:51.000000000","message":"apparently we hit that PathNotDir error a lot during relinker rehash.","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"fd98e04c7b5fb94869568097efd9ba72145fe817","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":10,"id":"9d98b933_d8ee4106","updated":"2025-09-23 17:27:55.000000000","message":"despite our best efforts with context managers this has managed to cause *another* memory leak in the relinker this time; and even with `EVENTLET_THREADPOOL_SIZE\u003d0`!?\n\nI don\u0027t know why it\u0027s so difficult to manage the life-cycle of these objects - but passing them around to instrumented methods as an explicit paramater has got to be better than loosing days to debugging memory leaks.","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"}],"swift/obj/diskfile.py":[{"author":{"_account_id":34930,"name":"Jianjian Huo","email":"jhuo@nvidia.com","username":"jhuo"},"change_message_id":"3c2524da4693b818c4d21d3b8839e3495834d213","unresolved":true,"context_lines":[{"line_number":736,"context_line":"            or LogTraceContext(name, logger))"},{"line_number":737,"context_line":"        return logger._cls_thread_local.trace"},{"line_number":738,"context_line":""},{"line_number":739,"context_line":"    def open_span(self, span_name, summary_key\u003dNone):"},{"line_number":740,"context_line":"        self.spans.append({"},{"line_number":741,"context_line":"            \u0027name\u0027: span_name,"},{"line_number":742,"context_line":"            \u0027start_time\u0027: time.time(),"}],"source_content_type":"text/x-python","patch_set":1,"id":"4de7dd80_142bd841","line":739,"updated":"2025-07-09 18:51:22.000000000","message":"this is very similar to what otel implements: https://review.opendev.org/c/openstack/swift/+/953977/4/swift/obj/diskfile.py","commit_id":"521481f2c9d678cd91650e143cf4f21c919daaed"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"b5cb5775dc9d29ed337894e39f55b37e33790d51","unresolved":true,"context_lines":[{"line_number":736,"context_line":"            or LogTraceContext(name, logger))"},{"line_number":737,"context_line":"        return logger._cls_thread_local.trace"},{"line_number":738,"context_line":""},{"line_number":739,"context_line":"    def open_span(self, span_name, summary_key\u003dNone):"},{"line_number":740,"context_line":"        self.spans.append({"},{"line_number":741,"context_line":"            \u0027name\u0027: span_name,"},{"line_number":742,"context_line":"            \u0027start_time\u0027: time.time(),"}],"source_content_type":"text/x-python","patch_set":1,"id":"2077f940_c207159a","line":739,"in_reply_to":"4de7dd80_142bd841","updated":"2025-07-09 19:23:35.000000000","message":"yes!  I\u0027d looked at otel before.  If it\u0027s ever accidently looked like I had a good idea it was actually stolen from somewhere.\n\nTho, did you mean to link to a patch or comment that referenced/used otel?\n\nhttps://opentelemetry.io/docs/languages/python/instrumentation/","commit_id":"521481f2c9d678cd91650e143cf4f21c919daaed"},{"author":{"_account_id":34930,"name":"Jianjian Huo","email":"jhuo@nvidia.com","username":"jhuo"},"change_message_id":"de0c59b4e3365aa1e6c3fb62ebfa0b39538cdadf","unresolved":true,"context_lines":[{"line_number":1473,"context_line":"        rehash_span.close_span()"},{"line_number":1474,"context_line":"        trace.message(f\"hashed: {hashed}\")"},{"line_number":1475,"context_line":"        if modified:"},{"line_number":1476,"context_line":"            lock_span \u003d trace.open_span(\u0027lock\u0027)"},{"line_number":1477,"context_line":"            with lock_path(partition_path):"},{"line_number":1478,"context_line":"                lock_span.close_span()"},{"line_number":1479,"context_line":"                reread_span \u003d trace.open_span(\u0027reread_hashes\u0027)"}],"source_content_type":"text/x-python","patch_set":1,"id":"d4252814_2bba99f8","line":1476,"updated":"2025-07-09 06:07:53.000000000","message":"IMHO, I feel the usages of those trace calls are kind of invasive and they break up the “flow” of the function for readers. The function logic is now interleaved with instrumentation which looks distracting, and I find it harder to follow the actual control flow with them.","commit_id":"521481f2c9d678cd91650e143cf4f21c919daaed"},{"author":{"_account_id":34930,"name":"Jianjian Huo","email":"jhuo@nvidia.com","username":"jhuo"},"change_message_id":"3c2524da4693b818c4d21d3b8839e3495834d213","unresolved":true,"context_lines":[{"line_number":1473,"context_line":"        rehash_span.close_span()"},{"line_number":1474,"context_line":"        trace.message(f\"hashed: {hashed}\")"},{"line_number":1475,"context_line":"        if modified:"},{"line_number":1476,"context_line":"            lock_span \u003d trace.open_span(\u0027lock\u0027)"},{"line_number":1477,"context_line":"            with lock_path(partition_path):"},{"line_number":1478,"context_line":"                lock_span.close_span()"},{"line_number":1479,"context_line":"                reread_span \u003d trace.open_span(\u0027reread_hashes\u0027)"}],"source_content_type":"text/x-python","patch_set":1,"id":"88938243_f099b394","line":1476,"in_reply_to":"2362ca36_4112411d","updated":"2025-07-09 18:51:22.000000000","message":"\u003e\u003e 1) only do timing per-method using decorators (i.e. refactor code before you can instrument it)\n\nOne of the important goals when I implemented the breakdown timing decorator is try to keep the original function flow intact, as the example shown in the diskfile.py changes: https://review.opendev.org/c/openstack/swift/+/953977/4/swift/obj/diskfile.py\n\nI can try adding the breakdown timing decorator on the reconstructor, let\u0027s see what does it look.","commit_id":"521481f2c9d678cd91650e143cf4f21c919daaed"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"b5cb5775dc9d29ed337894e39f55b37e33790d51","unresolved":true,"context_lines":[{"line_number":1473,"context_line":"        rehash_span.close_span()"},{"line_number":1474,"context_line":"        trace.message(f\"hashed: {hashed}\")"},{"line_number":1475,"context_line":"        if modified:"},{"line_number":1476,"context_line":"            lock_span \u003d trace.open_span(\u0027lock\u0027)"},{"line_number":1477,"context_line":"            with lock_path(partition_path):"},{"line_number":1478,"context_line":"                lock_span.close_span()"},{"line_number":1479,"context_line":"                reread_span \u003d trace.open_span(\u0027reread_hashes\u0027)"}],"source_content_type":"text/x-python","patch_set":1,"id":"8ca9ce8b_b89bde5c","line":1476,"in_reply_to":"88938243_f099b394","updated":"2025-07-09 19:23:35.000000000","message":"\u003e important goals [...] is try to keep the original function flow intact\n\nI\u0027m sure that explains some of the difference in design then, I don\u0027t share this goal for this patch.  From my perspective this is a throw away change - I just need to get the fine grained, easy to parse, detailed information out of the reconstructor RIGHT NOW to debug the problem.  Once we\u0027ve fixed get_hashes I don\u0027t care if this merges so I don\u0027t really expect it to be in my way when maintaining the \"function flow\" in the future after I\u0027ve rewritten it.\n\nRegardless... I don\u0027t see the difference in the linked change except that I was more thorough with what I instrumented.\n\nI actually prefer:\n\n```\nconsolidate_trace \u003d trace.open_span(\u0027conslidate_hashes\u0027)\n...\nconsolidate_trace.close()\n```\n\ncompared to\n\n```\ntiming_start \u003d record_slow_timing(\n            timing_breakdown, \u0027consolidate_hashes\u0027, timing_start)\n...\n        timing_start \u003d record_slow_timing(\n            timing_breakdown, \u0027listdir\u0027, timing_start)\n...\n            record_slow_timing(\n                timing_breakdown, \u0027partition_lock_read\u0027, timing_start)\n```\n\nLike does the `conslidate_hashes` really go all the way until `listdir`?  What if it\u0027s invalid - then `conslidate_hashes` stops at `hashes_pkl_read`???  Maybe I\u0027m misunderstanding how great implicit close is  - but for me it seems like it will be harder to reason about the results.","commit_id":"521481f2c9d678cd91650e143cf4f21c919daaed"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"7fecf8326f6429e9ce000202e24e422e457124c1","unresolved":true,"context_lines":[{"line_number":1473,"context_line":"        rehash_span.close_span()"},{"line_number":1474,"context_line":"        trace.message(f\"hashed: {hashed}\")"},{"line_number":1475,"context_line":"        if modified:"},{"line_number":1476,"context_line":"            lock_span \u003d trace.open_span(\u0027lock\u0027)"},{"line_number":1477,"context_line":"            with lock_path(partition_path):"},{"line_number":1478,"context_line":"                lock_span.close_span()"},{"line_number":1479,"context_line":"                reread_span \u003d trace.open_span(\u0027reread_hashes\u0027)"}],"source_content_type":"text/x-python","patch_set":1,"id":"2362ca36_4112411d","line":1476,"in_reply_to":"d4252814_2bba99f8","updated":"2025-07-09 18:01:50.000000000","message":"perfectly reasonable observation - what\u0027s the alternative?\n\n1) only do timing per-method using decorators (i.e. refactor code before you can instrument it)\n2) only have a \"mark next spot\" helper where the `open_span`/`__enter__`/`start` is implicitly the `close_span`/`__exit__`/`end` of the previous span.\n\nI really like the explicit open/close structure of spans because they naturally lead to nesting spans - but when you want to add timing all at one level it might be fine to support alternatively a:\n\n```\ntrace \u003d ...\n# do something fast\ntrace.mark(\u0027point1\u0027)\n# do something slow\ntrace.mark(\u0027point2\u0027)\n```\n\nbut if your \"do something slow\" calls another method - and you end up wanting to instrument *that* method as well - I think it\u0027s better if you had done\n\n\n```\nwith trace.open_span(\u0027call_method\u0027):\n    method()\n```\n\nso that later in method any tracing you add will all come out as `call_method.sub-span`","commit_id":"521481f2c9d678cd91650e143cf4f21c919daaed"},{"author":{"_account_id":34930,"name":"Jianjian Huo","email":"jhuo@nvidia.com","username":"jhuo"},"change_message_id":"7951596d28c7640fcab975d49d18796fe60f4099","unresolved":true,"context_lines":[{"line_number":1218,"context_line":""},{"line_number":1219,"context_line":"        return results"},{"line_number":1220,"context_line":""},{"line_number":1221,"context_line":"    def cleanup_ondisk_files(self, hsh_path, **kwargs):"},{"line_number":1222,"context_line":"        \"\"\""},{"line_number":1223,"context_line":"        Clean up on-disk files that are obsolete and gather the set of valid"},{"line_number":1224,"context_line":"        on-disk files for an object."}],"source_content_type":"text/x-python","patch_set":2,"id":"e07149fb_70679389","line":1221,"updated":"2025-07-14 17:53:12.000000000","message":"need to add four traces within this function, see four debug log lines at: https://review.opendev.org/c/openstack/swift/+/953977/7/swift/obj/diskfile.py#1099","commit_id":"2f1dbb0e5aaa2cfbebc88656aa06fa326b73c24f"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"7f350d1893b2e8cad29fa6cfc77438121a4525ca","unresolved":false,"context_lines":[{"line_number":1218,"context_line":""},{"line_number":1219,"context_line":"        return results"},{"line_number":1220,"context_line":""},{"line_number":1221,"context_line":"    def cleanup_ondisk_files(self, hsh_path, **kwargs):"},{"line_number":1222,"context_line":"        \"\"\""},{"line_number":1223,"context_line":"        Clean up on-disk files that are obsolete and gather the set of valid"},{"line_number":1224,"context_line":"        on-disk files for an object."}],"source_content_type":"text/x-python","patch_set":2,"id":"93dbbd72_a6e193ef","line":1221,"in_reply_to":"e07149fb_70679389","updated":"2025-07-14 20:27:55.000000000","message":"Done","commit_id":"2f1dbb0e5aaa2cfbebc88656aa06fa326b73c24f"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"9bae7e30a69fb610d20b25c19f269d72f828cb24","unresolved":true,"context_lines":[{"line_number":737,"context_line":"        \"\"\""},{"line_number":738,"context_line":"        logger._cls_thread_local.trace \u003d ("},{"line_number":739,"context_line":"            trace"},{"line_number":740,"context_line":"            or getattr(logger._cls_thread_local, \u0027trace\u0027, None)"},{"line_number":741,"context_line":"            or LogTraceContext(name, logger))"},{"line_number":742,"context_line":"        return logger._cls_thread_local.trace"},{"line_number":743,"context_line":""}],"source_content_type":"text/x-python","patch_set":5,"id":"3125ed0f_4ea7a871","line":740,"updated":"2025-07-28 15:48:36.000000000","message":"if this was set w/i the a tpool.execute - it\u0027s likely the thread-id used in the thread-local index is the a long-lived thread thread from the thread-pool (or at least that long lived thread\u0027s main green-thread id) - regardless it\u0027s probably \"recycling\" a pre-existing ephemeral trace object and adding messages to it instead of creating a new one in the statement below and then \"over-writing\" it.","commit_id":"2fc3d2cd4bfa1101f8fc977dbaa4da6ed631a0f6"},{"author":{"_account_id":30073,"name":"Brendan Shephard","email":"bshephar@bne-home.net","username":"bshephar"},"change_message_id":"36915101a999455caf739af6cae2c0fcde67ca86","unresolved":true,"context_lines":[{"line_number":747,"context_line":"        \"\"\""},{"line_number":748,"context_line":"        if getattr(logger._cls_thread_local, \u0027trace\u0027, None):"},{"line_number":749,"context_line":"            return logger._cls_thread_local.trace"},{"line_number":750,"context_line":"        else:"},{"line_number":751,"context_line":"            return LogTraceContext(name, logger)"},{"line_number":752,"context_line":""},{"line_number":753,"context_line":"    def open_span(self, span_name, summary_key\u003dNone):"}],"source_content_type":"text/x-python","patch_set":10,"id":"748cbbcd_3eee15bd","line":750,"updated":"2025-08-07 03:26:03.000000000","message":"nit: The `else` is unnecessary in this case","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"b4ccb2124071c345d152f508f6555af2db33adad","unresolved":true,"context_lines":[{"line_number":1541,"context_line":"                    suffix_span \u003d trace.open_span(f\u0027hash_{suffix}\u0027)"},{"line_number":1542,"context_line":"                    hashes[suffix] \u003d self._hash_suffix("},{"line_number":1543,"context_line":"                        suffix_dir, policy\u003dpolicy)"},{"line_number":1544,"context_line":"                    suffix_span.close_span()"},{"line_number":1545,"context_line":"                    hashed +\u003d 1"},{"line_number":1546,"context_line":"                except PathNotDir:"},{"line_number":1547,"context_line":"                    del hashes[suffix]"}],"source_content_type":"text/x-python","patch_set":10,"id":"bf70690b_763801f0","line":1544,"updated":"2025-09-23 22:59:51.000000000","message":"this is wrong too!  we have to close_span even in the PathNotDir/OSError paths!","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"99159c6ca31ee9f8274fef5b8509b5128fab03dc","unresolved":false,"context_lines":[{"line_number":1541,"context_line":"                    suffix_span \u003d trace.open_span(f\u0027hash_{suffix}\u0027)"},{"line_number":1542,"context_line":"                    hashes[suffix] \u003d self._hash_suffix("},{"line_number":1543,"context_line":"                        suffix_dir, policy\u003dpolicy)"},{"line_number":1544,"context_line":"                    suffix_span.close_span()"},{"line_number":1545,"context_line":"                    hashed +\u003d 1"},{"line_number":1546,"context_line":"                except PathNotDir:"},{"line_number":1547,"context_line":"                    del hashes[suffix]"}],"source_content_type":"text/x-python","patch_set":10,"id":"efc0309c_8d452fca","line":1544,"in_reply_to":"bf70690b_763801f0","updated":"2025-09-24 19:07:31.000000000","message":"Done","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"fd98e04c7b5fb94869568097efd9ba72145fe817","unresolved":true,"context_lines":[{"line_number":1553,"context_line":"        if modified:"},{"line_number":1554,"context_line":"            lock_span \u003d trace.open_span(\u0027lock\u0027)"},{"line_number":1555,"context_line":"            with lock_path(partition_path):"},{"line_number":1556,"context_line":"                lock_span.close_span()"},{"line_number":1557,"context_line":"                reread_span \u003d trace.open_span(\u0027reread_hashes\u0027)"},{"line_number":1558,"context_line":"                if read_hashes(partition_path) \u003d\u003d orig_hashes:"},{"line_number":1559,"context_line":"                    reread_span.close_span()"}],"source_content_type":"text/x-python","patch_set":10,"id":"8ef8bcff_74b0a289","line":1556,"updated":"2025-09-23 17:27:55.000000000","message":"there should probably be a path for closing this on the timeout.","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"99159c6ca31ee9f8274fef5b8509b5128fab03dc","unresolved":false,"context_lines":[{"line_number":1553,"context_line":"        if modified:"},{"line_number":1554,"context_line":"            lock_span \u003d trace.open_span(\u0027lock\u0027)"},{"line_number":1555,"context_line":"            with lock_path(partition_path):"},{"line_number":1556,"context_line":"                lock_span.close_span()"},{"line_number":1557,"context_line":"                reread_span \u003d trace.open_span(\u0027reread_hashes\u0027)"},{"line_number":1558,"context_line":"                if read_hashes(partition_path) \u003d\u003d orig_hashes:"},{"line_number":1559,"context_line":"                    reread_span.close_span()"}],"source_content_type":"text/x-python","patch_set":10,"id":"a998a479_f35f6bad","line":1556,"in_reply_to":"8ef8bcff_74b0a289","updated":"2025-09-24 19:07:31.000000000","message":"Done","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"fd98e04c7b5fb94869568097efd9ba72145fe817","unresolved":true,"context_lines":[{"line_number":1556,"context_line":"                lock_span.close_span()"},{"line_number":1557,"context_line":"                reread_span \u003d trace.open_span(\u0027reread_hashes\u0027)"},{"line_number":1558,"context_line":"                if read_hashes(partition_path) \u003d\u003d orig_hashes:"},{"line_number":1559,"context_line":"                    reread_span.close_span()"},{"line_number":1560,"context_line":"                    write_span \u003d trace.open_span(\u0027write_hashes\u0027)"},{"line_number":1561,"context_line":"                    write_hashes(partition_path, hashes)"},{"line_number":1562,"context_line":"                    write_span.close_span()"}],"source_content_type":"text/x-python","patch_set":10,"id":"ec51bb2c_1ad54898","line":1559,"updated":"2025-09-23 17:27:55.000000000","message":"this can\u0027t be correct - how can we only close this span *conditionally*","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"99159c6ca31ee9f8274fef5b8509b5128fab03dc","unresolved":false,"context_lines":[{"line_number":1556,"context_line":"                lock_span.close_span()"},{"line_number":1557,"context_line":"                reread_span \u003d trace.open_span(\u0027reread_hashes\u0027)"},{"line_number":1558,"context_line":"                if read_hashes(partition_path) \u003d\u003d orig_hashes:"},{"line_number":1559,"context_line":"                    reread_span.close_span()"},{"line_number":1560,"context_line":"                    write_span \u003d trace.open_span(\u0027write_hashes\u0027)"},{"line_number":1561,"context_line":"                    write_hashes(partition_path, hashes)"},{"line_number":1562,"context_line":"                    write_span.close_span()"}],"source_content_type":"text/x-python","patch_set":10,"id":"420d2be9_32284d1d","line":1559,"in_reply_to":"ec51bb2c_1ad54898","updated":"2025-09-24 19:07:31.000000000","message":"Done","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"99159c6ca31ee9f8274fef5b8509b5128fab03dc","unresolved":true,"context_lines":[{"line_number":1358,"context_line":"            try:"},{"line_number":1359,"context_line":"                with trace.open_span(\u0027cleanup_ondisk_files\u0027):"},{"line_number":1360,"context_line":"                    ondisk_info \u003d self.cleanup_ondisk_files("},{"line_number":1361,"context_line":"                        hsh_path, policy\u003dpolicy)"},{"line_number":1362,"context_line":"            except OSError as err:"},{"line_number":1363,"context_line":"                partition_path \u003d dirname(path)"},{"line_number":1364,"context_line":"                objects_path \u003d dirname(partition_path)"}],"source_content_type":"text/x-python","patch_set":11,"id":"3e70a308_bdfc6743","line":1361,"updated":"2025-09-24 19:07:31.000000000","message":"cursor found this one for me!","commit_id":"d6ba532d3c8f614207934e734ade92b665c638b5"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"99159c6ca31ee9f8274fef5b8509b5128fab03dc","unresolved":true,"context_lines":[{"line_number":1562,"context_line":"                            write_hashes(partition_path, hashes)"},{"line_number":1563,"context_line":"                        return hashed, hashes"},{"line_number":1564,"context_line":"            finally:"},{"line_number":1565,"context_line":"                if not closed_lock_span:"},{"line_number":1566,"context_line":"                    lock_span.close_span()"},{"line_number":1567,"context_line":"            with trace.open_span(\u0027rehash_again\u0027):"},{"line_number":1568,"context_line":"                return self.__get_hashes(device, partition, policy,"}],"source_content_type":"text/x-python","patch_set":11,"id":"98a8975a_c1de4ef4","line":1565,"updated":"2025-09-24 19:07:31.000000000","message":"it would be nice if open_span returned an object that wrapped the trace that handled this state tracking for you: `if not self.closed: self._trace.pop_span()`\n\nI think locks specifically will want a kind of trace context spans \"time to enter context\" on \"successfully got lock\" but the whole \"time spent trying to get in the lock\" on \"failed to get lock\"","commit_id":"d6ba532d3c8f614207934e734ade92b665c638b5"}],"swift/obj/reconstructor.py":[{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"0d1aa72a9ac19a9b933108a18c004092d4072c01","unresolved":true,"context_lines":[{"line_number":836,"context_line":"        hashed, suffix_hashes \u003d tpool.execute("},{"line_number":837,"context_line":"            df_mgr._get_hashes, device, partition, policy,"},{"line_number":838,"context_line":"            recalculate\u003drecalculate, do_listdir\u003ddo_listdir, trace\u003dtrace)"},{"line_number":839,"context_line":"        trace.log(self.logger)"},{"line_number":840,"context_line":"        # hashed is always an int, we count it in stats but don\u0027t return it"},{"line_number":841,"context_line":"        self.logger.update_stats(\u0027suffix.hashes\u0027, hashed)"},{"line_number":842,"context_line":"        return suffix_hashes"}],"source_content_type":"text/x-python","patch_set":6,"id":"cb333ee7_9a0e67a8","line":839,"updated":"2025-07-28 11:51:02.000000000","message":"right, this would always clear the trace","commit_id":"809fc8df1030873788deb2dbc2762be578c65c7b"},{"author":{"_account_id":30073,"name":"Brendan Shephard","email":"bshephar@bne-home.net","username":"bshephar"},"change_message_id":"36915101a999455caf739af6cae2c0fcde67ca86","unresolved":true,"context_lines":[{"line_number":831,"context_line":"    def _get_hashes(self, device, partition, policy, recalculate\u003dNone,"},{"line_number":832,"context_line":"                    do_listdir\u003dFalse):"},{"line_number":833,"context_line":"        df_mgr \u003d self._df_router[policy]"},{"line_number":834,"context_line":"        # don\u0027t pass logger, we\u0027ll log at the end outside of the tpool"},{"line_number":835,"context_line":"        trace \u003d LogTraceContext(\u0027reconstructor._get_hashes\u0027)"},{"line_number":836,"context_line":"        hashed, suffix_hashes \u003d tpool.execute("},{"line_number":837,"context_line":"            df_mgr._get_hashes, device, partition, policy,"}],"source_content_type":"text/x-python","patch_set":10,"id":"b7b45252_02d4f56b","line":834,"updated":"2025-08-07 03:26:03.000000000","message":"Right, this correlates to your comment in diskfile.py:802-803?\n```\n        N.B. I would not recommend a bunch of logging from inside a tpool, that\n        be the way to deadlocks and grimlins.\n```\n\nBecause you would end up with multiple threads trying to write to the logger and it would be a mess.","commit_id":"9689669b47b92d1c3c3f19365276dfa8db6eb967"}],"swift/obj/replicator.py":[{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"0d1aa72a9ac19a9b933108a18c004092d4072c01","unresolved":true,"context_lines":[{"line_number":726,"context_line":"                    hashed, recalc_hash \u003d tpool.execute("},{"line_number":727,"context_line":"                        df_mgr._get_hashes,"},{"line_number":728,"context_line":"                        job[\u0027device\u0027], job[\u0027partition\u0027], job[\u0027policy\u0027],"},{"line_number":729,"context_line":"                        recalculate\u003dsuffixes, trace\u003dtrace)"},{"line_number":730,"context_line":"                    self.logger.update_stats(\u0027suffix.hashes\u0027, hashed)"},{"line_number":731,"context_line":"                    local_hash \u003d recalc_hash"},{"line_number":732,"context_line":"                    suffixes \u003d [suffix for suffix in local_hash if"}],"source_content_type":"text/x-python","patch_set":6,"id":"e4abfd1b_b62e74c0","line":729,"updated":"2025-07-28 11:51:02.000000000","message":"this is going to accumulate on the same trace, which may be useful to get the context of the retries","commit_id":"809fc8df1030873788deb2dbc2762be578c65c7b"},{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"0d1aa72a9ac19a9b933108a18c004092d4072c01","unresolved":true,"context_lines":[{"line_number":751,"context_line":"                    self.logger.exception(\"Error syncing with node: %s\","},{"line_number":752,"context_line":"                                          node_str)"},{"line_number":753,"context_line":"            stats.suffix_count +\u003d len(local_hash)"},{"line_number":754,"context_line":"            # this has the side-effect of relasing the trace obj from the"},{"line_number":755,"context_line":"            # thread-local"},{"line_number":756,"context_line":"            trace.log(self.logger)"},{"line_number":757,"context_line":"        except StopIteration:"},{"line_number":758,"context_line":"            self.logger.error(\u0027Ran out of handoffs while replicating \u0027"}],"source_content_type":"text/x-python","patch_set":6,"id":"8e812baa_b60478f5","line":755,"range":{"start_line":754,"start_character":42,"end_line":755,"end_character":26},"updated":"2025-07-28 11:51:02.000000000","message":"should this be in the finally to make sure the thread-local is cleared *always* ?","commit_id":"809fc8df1030873788deb2dbc2762be578c65c7b"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"9bae7e30a69fb610d20b25c19f269d72f828cb24","unresolved":true,"context_lines":[{"line_number":751,"context_line":"                    self.logger.exception(\"Error syncing with node: %s\","},{"line_number":752,"context_line":"                                          node_str)"},{"line_number":753,"context_line":"            stats.suffix_count +\u003d len(local_hash)"},{"line_number":754,"context_line":"            # this has the side-effect of relasing the trace obj from the"},{"line_number":755,"context_line":"            # thread-local"},{"line_number":756,"context_line":"            trace.log(self.logger)"},{"line_number":757,"context_line":"        except StopIteration:"},{"line_number":758,"context_line":"            self.logger.error(\u0027Ran out of handoffs while replicating \u0027"}],"source_content_type":"text/x-python","patch_set":6,"id":"cf8d4fcc_ec86a023","line":755,"range":{"start_line":754,"start_character":42,"end_line":755,"end_character":26},"in_reply_to":"8e812baa_b60478f5","updated":"2025-07-28 15:48:36.000000000","message":"yeah it looks like it\u0027d be easy enough to move it - I\u0027ll move it \"just in case\" since I have to respin the patch for pep8 anyway.\n\nI hope that it shouldn\u0027t be *necessary* b/c the creation call on L664 will *never* re-use an existing trace object.  I think if this object is so dangerous we get the point of calling it\u0027s \"close\" method in finally blocks 1) it\u0027s probably written incorrect (that turned out to be true, see the diff in get_trace) or 2) has a bad interface, i.e. the method should be named `close` and the primary usage should be a contextmanager.","commit_id":"809fc8df1030873788deb2dbc2762be578c65c7b"},{"author":{"_account_id":7847,"name":"Alistair Coles","email":"alistairncoles@gmail.com","username":"acoles"},"change_message_id":"0d1aa72a9ac19a9b933108a18c004092d4072c01","unresolved":true,"context_lines":[{"line_number":753,"context_line":"            stats.suffix_count +\u003d len(local_hash)"},{"line_number":754,"context_line":"            # this has the side-effect of relasing the trace obj from the"},{"line_number":755,"context_line":"            # thread-local"},{"line_number":756,"context_line":"            trace.log(self.logger)"},{"line_number":757,"context_line":"        except StopIteration:"},{"line_number":758,"context_line":"            self.logger.error(\u0027Ran out of handoffs while replicating \u0027"},{"line_number":759,"context_line":"                              \u0027partition %s of policy %d\u0027,"}],"source_content_type":"text/x-python","patch_set":6,"id":"fb8dda8b_8afd56d5","line":756,"updated":"2025-07-28 11:51:02.000000000","message":"so IIUC the memory leak bug was because the replicator never actually called trace.log\n\nThis test patch fails on patchset 5 because the spans just keep on accumulating...\n```\ndiff --git a/test/unit/obj/test_replicator.py b/test/unit/obj/test_replicator.py\nindex d31c01df2..9e5539beb 100644\n--- a/test/unit/obj/test_replicator.py\n+++ b/test/unit/obj/test_replicator.py\n@@ -29,8 +29,9 @@ from collections import defaultdict\n from errno import ENOENT, ENOTEMPTY, ENOTDIR\n \n from eventlet.green import subprocess\n-from eventlet import Timeout, sleep\n+from eventlet import Timeout, sleep, tpool\n \n+from swift.obj.diskfile import LogTraceContext\n from test.debug_logger import debug_logger\n from test.unit import (patch_policies, make_timestamp_iter, mocked_http_conn,\n                        mock_check_drive, skip_if_no_xattrs)\n@@ -382,6 +383,7 @@ class TestObjectReplicator(unittest.TestCase):\n         conf \u003d dict(swift_dir\u003dself.testdir, devices\u003dself.devices,\n                     bind_ip\u003d_ips()[0], recon_cache_path\u003dself.recon_cache,\n                     mount_check\u003d\u0027false\u0027, timeout\u003d\u0027300\u0027, stats_interval\u003d\u00271\u0027)\n+        tpool.set_num_threads(1)\n         replicator \u003d object_replicator.ObjectReplicator(conf,\n                                                         logger\u003dself.logger)\n         was_connector \u003d object_replicator.http_connect\n@@ -423,12 +425,26 @@ class TestObjectReplicator(unittest.TestCase):\n             while True:\n                 yield 60\n \n+        trace_spans \u003d []\n+\n+        def capture_trace():\n+            msgs \u003d LogTraceContext.get_trace(\u0027test\u0027, self.logger).messages\n+            for msg in msgs:\n+                span, took, ms \u003d msg.partition(\u0027took\u0027)\n+                trace_spans.append(span.strip())\n+\n+        tpool.execute(capture_trace)\n+        self.assertLessEqual(len(trace_spans), 44)\n+\n         for cycle in range(1, 10):\n+            trace_spans \u003d []\n             with _mock_process(process_arg_checker):\n                 with mock.patch(\u0027time.time\u0027, side_effect\u003d_infinite_gen()):\n                     replicator.run_once()\n                     self.assertEqual((start + 1 + cycle) % 10,\n                                      replicator.replication_cycle)\n+            tpool.execute(capture_trace)\n+            self.assertLessEqual(len(trace_spans), 44)\n \n         recon_fname \u003d os.path.join(self.recon_cache, RECON_OBJECT_FILE)\n         with open(recon_fname) as cachefile:\n\n```","commit_id":"809fc8df1030873788deb2dbc2762be578c65c7b"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"9bae7e30a69fb610d20b25c19f269d72f828cb24","unresolved":true,"context_lines":[{"line_number":753,"context_line":"            stats.suffix_count +\u003d len(local_hash)"},{"line_number":754,"context_line":"            # this has the side-effect of relasing the trace obj from the"},{"line_number":755,"context_line":"            # thread-local"},{"line_number":756,"context_line":"            trace.log(self.logger)"},{"line_number":757,"context_line":"        except StopIteration:"},{"line_number":758,"context_line":"            self.logger.error(\u0027Ran out of handoffs while replicating \u0027"},{"line_number":759,"context_line":"                              \u0027partition %s of policy %d\u0027,"}],"source_content_type":"text/x-python","patch_set":6,"id":"0e67eb3b_632894e4","line":756,"in_reply_to":"fb8dda8b_8afd56d5","updated":"2025-07-28 15:48:36.000000000","message":"I didn\u0027t understand that the tpool\u0027s \"creation\" of a \"ephemeral\" log-trace object would actually just pull the one from the previous `tpool.execute` (!!) - that previous `get_trace` implementation was super wrong it turned out.\n\nI think a test method sort of like this could be used with the new get_trace method to prove that we don\u0027t leak memory in the tpool green-thread even if the caller doesn\u0027t call `trace.log`","commit_id":"809fc8df1030873788deb2dbc2762be578c65c7b"}]}
