{"/COMMIT_MSG":[{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"8626e0f7cad88a444a70e7e9e5a7a4817795889e","unresolved":true,"context_lines":[{"line_number":10,"context_line":"encode and decode methods. The telemetry is labeled with the policy name"},{"line_number":11,"context_line":"and backend EC implementation algorithm used by liberasurecode. This"},{"line_number":12,"context_line":"helps set benchmark baselines and compare performance across different"},{"line_number":13,"context_line":"backends and hardware platforms."},{"line_number":14,"context_line":""},{"line_number":15,"context_line":"Change-Id: Ibe1204f9ddf44348ff2b9d972bfd7acab0a79a64"},{"line_number":16,"context_line":"Signed-off-by: Wael Halbawi \u003cwhalbawi@nvidia.com\u003e"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"98e7bf05_b4372336","line":13,"updated":"2026-05-05 21:00:44.000000000","message":"How sure are we that the existing bandwidth metrics aren\u0027t sufficient? 
I take it the concern is that things like `swift_proxy_server_request_timing` get dominated by waiting on backend responses?\n\nKind of makes me want some timings dict in the request env to show how things break down between\n- waiting on backends\n- doing EC\n- doing encryption\n- doing some sleep for rate-limiting (if we still do that; I know the in-tree ratelimit middleware [will do that](https://github.com/openstack/swift/blob/2.37.1/swift/common/middleware/ratelimit.py#L283), but IIRC what we\u0027re using in prod doesn\u0027t)\n\nAnd of course the overall timing, so we know what isn\u0027t getting covered yet.\n\nMakes me think of Matt\u0027s [request tracing work](https://review.opendev.org/c/openstack/swift/+/899397)...","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"931d9868777d8f8884f5df5430763685b8ed5850","unresolved":true,"context_lines":[{"line_number":10,"context_line":"encode and decode methods. The telemetry is labeled with the policy name"},{"line_number":11,"context_line":"and backend EC implementation algorithm used by liberasurecode. 
This"},{"line_number":12,"context_line":"helps set benchmark baselines and compare performance across different"},{"line_number":13,"context_line":"backends and hardware platforms."},{"line_number":14,"context_line":""},{"line_number":15,"context_line":"Change-Id: Ibe1204f9ddf44348ff2b9d972bfd7acab0a79a64"},{"line_number":16,"context_line":"Signed-off-by: Wael Halbawi \u003cwhalbawi@nvidia.com\u003e"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"09873bf3_fd5c34ba","line":13,"in_reply_to":"3f23765f_79db1b10","updated":"2026-05-07 16:39:37.000000000","message":"\u003e makes me want some timings dict in the request env\n\n^ I think this is a reasonable line of thinking!\n\n\u003e Special cases aren\u0027t special enough to break the rules.\n\nThis isn\u0027t the ONLY thing that\u0027s getting \"buried\" in our single \"total throughput\" measurement - stuff like \"encryption\" could be equally useful to break out.\n\n\u003e Although practicality beats purity.\n\nMaybe if we do this and find it super useful we discover we want to do the same thing to encryption and that\u0027s good enough.\n\nHowever, IME there\u0027s only 3 numbers in computer science: 0, 1, N.\n\nIf we think \"we don\u0027t want to use statsd for intra-req timing sampling\" I believe that (0).\n\nIf we think \"we want intra-req timing for EC-encode and encryption and probably others\" I believe that (N) - but worry that ad-hoc instrumentation and statsd isn\u0027t quite \"generic enough\" for what we\u0027re trying to capture/record/analyze.\n\nI don\u0027t think we should think \"we only want intra-req timing for EC and then we\u0027re done\" (1)\n\n... 
but maybe it\u0027s better to do *something* than nothing.","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":38767,"name":"Wael Halbawi","display_name":"Wael Halbawi","email":"whalbawi@nvidia.com","username":"whalbawi"},"change_message_id":"daade1c2f41eba349b980ae10fd8097a1358753e","unresolved":true,"context_lines":[{"line_number":10,"context_line":"encode and decode methods. The telemetry is labeled with the policy name"},{"line_number":11,"context_line":"and backend EC implementation algorithm used by liberasurecode. This"},{"line_number":12,"context_line":"helps set benchmark baselines and compare performance across different"},{"line_number":13,"context_line":"backends and hardware platforms."},{"line_number":14,"context_line":""},{"line_number":15,"context_line":"Change-Id: Ibe1204f9ddf44348ff2b9d972bfd7acab0a79a64"},{"line_number":16,"context_line":"Signed-off-by: Wael Halbawi \u003cwhalbawi@nvidia.com\u003e"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"3f23765f_79db1b10","line":13,"in_reply_to":"98e7bf05_b4372336","updated":"2026-05-06 20:22:51.000000000","message":"\u003e get dominated by waiting on backends responses?\n\nThat\u0027s my guess as well.","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"}],"/PATCHSET_LEVEL":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"931d9868777d8f8884f5df5430763685b8ed5850","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"b5e6739a_cdef31e1","updated":"2026-05-07 16:39:37.000000000","message":"I think Tim has some questions about strategy as well as some comments on tactics.\n\nHaving multiple people thinking about this certainly makes me more excited that we should try to do something to better understand our ec-encode throughput.","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":38767,"name":"Wael 
Halbawi","display_name":"Wael Halbawi","email":"whalbawi@nvidia.com","username":"whalbawi"},"change_message_id":"daade1c2f41eba349b980ae10fd8097a1358753e","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"b10718fe_902637aa","updated":"2026-05-06 20:22:51.000000000","message":"Thanks for the feedback @tburke@nvidia.com!","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"}],"swift/proxy/controllers/obj.py":[{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"8626e0f7cad88a444a70e7e9e5a7a4817795889e","unresolved":true,"context_lines":[{"line_number":1271,"context_line":"        self.logger \u003d logger"},{"line_number":1272,"context_line":"        self.statsd \u003d statsd"},{"line_number":1273,"context_line":"        self.metric_labels \u003d {"},{"line_number":1274,"context_line":"            \"scheme\": f\"{policy.name}:{policy.ec_type}\""},{"line_number":1275,"context_line":"        }"},{"line_number":1276,"context_line":""},{"line_number":1277,"context_line":"        self.mime_boundary \u003d None"}],"source_content_type":"text/x-python","patch_set":1,"id":"2a98562f_34beb5e3","line":1274,"updated":"2026-05-05 21:00:44.000000000","message":"Are we sure we don\u0027t want these to just be two separate labels? `policy_name` and `ec_type`, say? 
(Or even go by policy index -- I should go look at what proxy-logging does, for example...)","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":38767,"name":"Wael Halbawi","display_name":"Wael Halbawi","email":"whalbawi@nvidia.com","username":"whalbawi"},"change_message_id":"daade1c2f41eba349b980ae10fd8097a1358753e","unresolved":true,"context_lines":[{"line_number":1271,"context_line":"        self.logger \u003d logger"},{"line_number":1272,"context_line":"        self.statsd \u003d statsd"},{"line_number":1273,"context_line":"        self.metric_labels \u003d {"},{"line_number":1274,"context_line":"            \"scheme\": f\"{policy.name}:{policy.ec_type}\""},{"line_number":1275,"context_line":"        }"},{"line_number":1276,"context_line":""},{"line_number":1277,"context_line":"        self.mime_boundary \u003d None"}],"source_content_type":"text/x-python","patch_set":1,"id":"399c4d61_b91f9abb","line":1274,"in_reply_to":"2a98562f_34beb5e3","updated":"2026-05-06 20:22:51.000000000","message":"I don\u0027t think using two labels buys us much. The (en|de)code cost is a function of the EC parameters and aggregating metrics across different `policy_name`s for a fixed `ec_type` doesn\u0027t seem to be useful. There\u0027s also the cost of maintaining a unique time series for each tuple of unique label values.\n\n\u003e Or even go by policy index\n\nThis does carry all the information we need. 
The benefit of embedding `ec_type` is being able to see two different series on the same graph when testing various backends.","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"331a0e2fe1611bee139219c3b942aed7e6bb1804","unresolved":true,"context_lines":[{"line_number":1271,"context_line":"        self.logger \u003d logger"},{"line_number":1272,"context_line":"        self.statsd \u003d statsd"},{"line_number":1273,"context_line":"        self.metric_labels \u003d {"},{"line_number":1274,"context_line":"            \"scheme\": f\"{policy.name}:{policy.ec_type}\""},{"line_number":1275,"context_line":"        }"},{"line_number":1276,"context_line":""},{"line_number":1277,"context_line":"        self.mime_boundary \u003d None"}],"source_content_type":"text/x-python","patch_set":1,"id":"ec0c7df2_513474d7","line":1274,"in_reply_to":"399c4d61_b91f9abb","updated":"2026-05-06 21:35:59.000000000","message":"\u003e There\u0027s also the cost of maintaining a unique time series for each tuple of unique label values.\n\nBut the cardinality of `(f\"{policy.name}:{policy.ec_type}\", )` is the same as the cardinality of `(policy.name, policy.ec_type)` -- and having them already separated should simplify the labelling of certain types of graphs.\n\n\u003e\u003e Or even go by policy index\n\u003e\n\u003e This does carry all the information we need. The benefit of embedding `ec_type` is being able to see two different series on the same graph when testing various backends.\n\nI actually only meant to recommend using policy index in place of policy name -- names can theoretically change over time (though I think in practice they rarely do), while the policy index (and EC parameters) will not. 
Agreed that having `ec_type` information directly in the stats is valuable -- we might even want to include `ec_num_data_fragments`/`ec_num_parity_fragments` in labels, too.","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"931d9868777d8f8884f5df5430763685b8ed5850","unresolved":true,"context_lines":[{"line_number":1271,"context_line":"        self.logger \u003d logger"},{"line_number":1272,"context_line":"        self.statsd \u003d statsd"},{"line_number":1273,"context_line":"        self.metric_labels \u003d {"},{"line_number":1274,"context_line":"            \"scheme\": f\"{policy.name}:{policy.ec_type}\""},{"line_number":1275,"context_line":"        }"},{"line_number":1276,"context_line":""},{"line_number":1277,"context_line":"        self.mime_boundary \u003d None"}],"source_content_type":"text/x-python","patch_set":1,"id":"619dc400_8de1a935","line":1274,"in_reply_to":"ec0c7df2_513474d7","updated":"2026-05-07 16:39:37.000000000","message":"\u003e having ec_type information directly in the stats is valuable\n\nonly if you run with N different ec_type and want to aggregate across them\n\n\u003e include ec_num_data_fragments/ec_num_parity_fragments in labels\n\nsomewhat skeptical, unless you want to aggregate across multiple policies with similar schema: like \"avg of p90 of both isa-l \u0026 libec 8:4\" (as opposed to *compare* avg of p90 from policy1 vs policy2)\n\n... 
but ultimately there\u0027s no cardinality concern - policy_idx is always 1:1 with all these other descriptive labels - the issue would only come if you tried to USE the labels and forgot that policy_idx:1 isn\u0027t the ONLY policy with ec_ndata:8 and you end up looking at a graph aggregating more than you expect/want.","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"8626e0f7cad88a444a70e7e9e5a7a4817795889e","unresolved":true,"context_lines":[{"line_number":1670,"context_line":"                try:"},{"line_number":1671,"context_line":"                    decode_start \u003d time.time()"},{"line_number":1672,"context_line":"                    segment \u003d self.policy.pyeclib_driver.decode(fragments)"},{"line_number":1673,"context_line":"                    self.statsd.transfer_rate(\"swift_proxy_ec_decode_inv_tput\","},{"line_number":1674,"context_line":"                                              time.time() - decode_start,"},{"line_number":1675,"context_line":"                                              len(segment),"},{"line_number":1676,"context_line":"                                              labels\u003dself.metric_labels)"}],"source_content_type":"text/x-python","patch_set":1,"id":"e8d3f414_0e7fc641","line":1673,"updated":"2026-05-05 21:00:44.000000000","message":"It\u0027s definitely a little worrying that we do this for every chunk -- the only other place we use `transfer_rate` today is in the object-server PUT path, and that\u0027s only *once per request*.\n\nproxy-logging *does* have some stats that (can) get emitted multiple times per request following https://github.com/openstack/swift/commit/dcd5a265 -- but\n\n- it\u0027s careful to only do that every now and then, not for every chunk, and\n- it can be turned off entirely (indeed, that\u0027s the 
default)","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":38767,"name":"Wael Halbawi","display_name":"Wael Halbawi","email":"whalbawi@nvidia.com","username":"whalbawi"},"change_message_id":"daade1c2f41eba349b980ae10fd8097a1358753e","unresolved":true,"context_lines":[{"line_number":1670,"context_line":"                try:"},{"line_number":1671,"context_line":"                    decode_start \u003d time.time()"},{"line_number":1672,"context_line":"                    segment \u003d self.policy.pyeclib_driver.decode(fragments)"},{"line_number":1673,"context_line":"                    self.statsd.transfer_rate(\"swift_proxy_ec_decode_inv_tput\","},{"line_number":1674,"context_line":"                                              time.time() - decode_start,"},{"line_number":1675,"context_line":"                                              len(segment),"},{"line_number":1676,"context_line":"                                              labels\u003dself.metric_labels)"}],"source_content_type":"text/x-python","patch_set":1,"id":"ee42ca35_06d27026","line":1673,"in_reply_to":"e8d3f414_0e7fc641","updated":"2026-05-06 20:22:51.000000000","message":"I think we can aggregate over the chunks, either an average or some tail-end quantile, and emit a single point per request.","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"331a0e2fe1611bee139219c3b942aed7e6bb1804","unresolved":true,"context_lines":[{"line_number":1670,"context_line":"                try:"},{"line_number":1671,"context_line":"                    decode_start \u003d time.time()"},{"line_number":1672,"context_line":"                    segment \u003d self.policy.pyeclib_driver.decode(fragments)"},{"line_number":1673,"context_line":"                    self.statsd.transfer_rate(\"swift_proxy_ec_decode_inv_tput\","},{"line_number":1674,"context_line":"         
                                     time.time() - decode_start,"},{"line_number":1675,"context_line":"                                              len(segment),"},{"line_number":1676,"context_line":"                                              labels\u003dself.metric_labels)"}],"source_content_type":"text/x-python","patch_set":1,"id":"1919ab1c_3b2005ec","line":1673,"in_reply_to":"ee42ca35_06d27026","updated":"2026-05-06 21:35:59.000000000","message":"\u003e an average\n\nLike, `sum(time spent on decode) / sum(segment lengths)`? Seems reasonable.","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"8626e0f7cad88a444a70e7e9e5a7a4817795889e","unresolved":true,"context_lines":[{"line_number":2147,"context_line":""},{"line_number":2148,"context_line":"    metric_name \u003d \"swift_proxy_ec_encode_inv_tput\""},{"line_number":2149,"context_line":"    metric_labels \u003d {"},{"line_number":2150,"context_line":"        \"scheme\": f\"{policy.name}:{policy.ec_type}\""},{"line_number":2151,"context_line":"    }"},{"line_number":2152,"context_line":""},{"line_number":2153,"context_line":"    buf \u003d collections.deque()"}],"source_content_type":"text/x-python","patch_set":1,"id":"a833b85f_1def7db5","line":2150,"updated":"2026-05-05 21:00:44.000000000","message":"Similarly here: should this actually be two labels?","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"331a0e2fe1611bee139219c3b942aed7e6bb1804","unresolved":false,"context_lines":[{"line_number":2147,"context_line":""},{"line_number":2148,"context_line":"    metric_name \u003d \"swift_proxy_ec_encode_inv_tput\""},{"line_number":2149,"context_line":"    metric_labels \u003d {"},{"line_number":2150,"context_line":"        \"scheme\": 
f\"{policy.name}:{policy.ec_type}\""},{"line_number":2151,"context_line":"    }"},{"line_number":2152,"context_line":""},{"line_number":2153,"context_line":"    buf \u003d collections.deque()"}],"source_content_type":"text/x-python","patch_set":1,"id":"d6208682_40f4a5a5","line":2150,"in_reply_to":"823a73b5_4fce013d","updated":"2026-05-06 21:35:59.000000000","message":"Acknowledged","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"},{"author":{"_account_id":38767,"name":"Wael Halbawi","display_name":"Wael Halbawi","email":"whalbawi@nvidia.com","username":"whalbawi"},"change_message_id":"daade1c2f41eba349b980ae10fd8097a1358753e","unresolved":true,"context_lines":[{"line_number":2147,"context_line":""},{"line_number":2148,"context_line":"    metric_name \u003d \"swift_proxy_ec_encode_inv_tput\""},{"line_number":2149,"context_line":"    metric_labels \u003d {"},{"line_number":2150,"context_line":"        \"scheme\": f\"{policy.name}:{policy.ec_type}\""},{"line_number":2151,"context_line":"    }"},{"line_number":2152,"context_line":""},{"line_number":2153,"context_line":"    buf \u003d collections.deque()"}],"source_content_type":"text/x-python","patch_set":1,"id":"823a73b5_4fce013d","line":2150,"in_reply_to":"a833b85f_1def7db5","updated":"2026-05-06 20:22:51.000000000","message":"I\u0027ll reuse the same response I added for the decode portion here. Let me know if you think they should be addressed separately.","commit_id":"7708020fbc2cb865de0956b2ae5be49216b786a6"}]}
