)]}'
{"/PATCHSET_LEVEL":[{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"16bca45c6e5ece522172f8708525f985b9285808","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":5,"id":"c98f824c_abae3ceb","updated":"2026-02-02 14:57:37.000000000","message":"did a quick review in the morning, lgtm, going to check a few more details in the afternoon","commit_id":"f088672b38bc802552590649db697e61b9e89eba"}],"specs/approved/async-metrics.rst":[{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"dc92ecf7c3d7ecd5fdd6184489b7e199f6f3cef0","unresolved":true,"context_lines":[{"line_number":32,"context_line":"Ironic, is that the operators want to see sensor information **quickly** even"},{"line_number":33,"context_line":"in the presence of hundreds or thousands of nodes per conductor. As an example,"},{"line_number":34,"context_line":"in OpenShift\u0027s deployment of Ironic, the only conductor is expected to handle"},{"line_number":35,"context_line":"up to 3500 nodes, while operators want to collect sensor data every minute to"},{"line_number":36,"context_line":"be able to raise alerts as quickly as possible."},{"line_number":37,"context_line":""},{"line_number":38,"context_line":"The current implementation in Ironic is an obvious bottleneck here. Sensor data"}],"source_content_type":"text/x-rst","patch_set":5,"id":"8f208d4a_9c5d9c00","line":35,"updated":"2026-01-27 20:17:52.000000000","message":"I\u0027ll note that *nowhere* upstream have we ever recommended or discussed folks scaling up conductors that far. 
(until now?)","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"f65e215513b0af65d8fc9b3255949c49a31e7e55","unresolved":true,"context_lines":[{"line_number":32,"context_line":"Ironic, is that the operators want to see sensor information **quickly** even"},{"line_number":33,"context_line":"in the presence of hundreds or thousands of nodes per conductor. As an example,"},{"line_number":34,"context_line":"in OpenShift\u0027s deployment of Ironic, the only conductor is expected to handle"},{"line_number":35,"context_line":"up to 3500 nodes, while operators want to collect sensor data every minute to"},{"line_number":36,"context_line":"be able to raise alerts as quickly as possible."},{"line_number":37,"context_line":""},{"line_number":38,"context_line":"The current implementation in Ironic is an obvious bottleneck here. Sensor data"}],"source_content_type":"text/x-rst","patch_set":5,"id":"a36c769e_648e2d3c","line":35,"in_reply_to":"0b5a8ee4_bf45b4e6","updated":"2026-03-09 14:35:19.000000000","message":"FWIW, I read this as an \"this is openshift\u0027s expectation\". That being said, I think it could be better stressed as such. That being said, we have always avoided putting firm numbers for the reasons Dmitry noted in so much as scaling guidance. \n\nThat being said as well, we did also have to make the redfish connection cache size configurable because folks were hitting the limit.... 
Now with a default of 1000.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"e03b4952682013dfaf414629b70ff6f026d09cc4","unresolved":true,"context_lines":[{"line_number":32,"context_line":"Ironic, is that the operators want to see sensor information **quickly** even"},{"line_number":33,"context_line":"in the presence of hundreds or thousands of nodes per conductor. As an example,"},{"line_number":34,"context_line":"in OpenShift\u0027s deployment of Ironic, the only conductor is expected to handle"},{"line_number":35,"context_line":"up to 3500 nodes, while operators want to collect sensor data every minute to"},{"line_number":36,"context_line":"be able to raise alerts as quickly as possible."},{"line_number":37,"context_line":""},{"line_number":38,"context_line":"The current implementation in Ironic is an obvious bottleneck here. Sensor data"}],"source_content_type":"text/x-rst","patch_set":5,"id":"0b5a8ee4_bf45b4e6","line":35,"in_reply_to":"8f208d4a_9c5d9c00","updated":"2026-01-28 12:52:57.000000000","message":"I think we\u0027ve always avoided putting specific scale numbers in the upstream docs, simply because we cannot really test them. I\u0027m also not treating this proposal as a reason to firmly say we support 3500 nodes. But if people try, and it just works, it\u0027s going to leave a solid impression.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":false,"context_lines":[{"line_number":32,"context_line":"Ironic, is that the operators want to see sensor information **quickly** even"},{"line_number":33,"context_line":"in the presence of hundreds or thousands of nodes per conductor. 
As an example,"},{"line_number":34,"context_line":"in OpenShift\u0027s deployment of Ironic, the only conductor is expected to handle"},{"line_number":35,"context_line":"up to 3500 nodes, while operators want to collect sensor data every minute to"},{"line_number":36,"context_line":"be able to raise alerts as quickly as possible."},{"line_number":37,"context_line":""},{"line_number":38,"context_line":"The current implementation in Ironic is an obvious bottleneck here. Sensor data"}],"source_content_type":"text/x-rst","patch_set":5,"id":"c7368e7d_bcf1d4a6","line":35,"in_reply_to":"a36c769e_648e2d3c","updated":"2026-05-31 23:58:52.000000000","message":"Updated to address Julia\u0027s point regarding this being the OpenShift expectation; hopefully it\u0027s clearer now.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"dc92ecf7c3d7ecd5fdd6184489b7e199f6f3cef0","unresolved":true,"context_lines":[{"line_number":39,"context_line":"is collected for all nodes sequentially in 4 worker threads. Even for pretty"},{"line_number":40,"context_line":"humble 200 nodes per conductor, each thread will handle 50 nodes one after the"},{"line_number":41,"context_line":"other. Only if each node is processed for no more than a second is it possible"},{"line_number":42,"context_line":"to hit the 1 minute deadline."},{"line_number":43,"context_line":""},{"line_number":44,"context_line":"Unfortunately, the collection via Redfish is not that fast in real life. 
For"},{"line_number":45,"context_line":"each node, several GET requests are needed, each taking 1-2 seconds on average"}],"source_content_type":"text/x-rst","patch_set":5,"id":"c0df52af_7850fa4b","line":42,"updated":"2026-01-27 20:17:52.000000000","message":"FWIW I think the case exists to implement this even with the more common 800-1000 nodes per conductor.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"e03b4952682013dfaf414629b70ff6f026d09cc4","unresolved":true,"context_lines":[{"line_number":39,"context_line":"is collected for all nodes sequentially in 4 worker threads. Even for pretty"},{"line_number":40,"context_line":"humble 200 nodes per conductor, each thread will handle 50 nodes one after the"},{"line_number":41,"context_line":"other. Only if each node is processed for no more than a second is it possible"},{"line_number":42,"context_line":"to hit the 1 minute deadline."},{"line_number":43,"context_line":""},{"line_number":44,"context_line":"Unfortunately, the collection via Redfish is not that fast in real life. For"},{"line_number":45,"context_line":"each node, several GET requests are needed, each taking 1-2 seconds on average"}],"source_content_type":"text/x-rst","patch_set":5,"id":"d3062b2c_7928ce1b","line":42,"in_reply_to":"c0df52af_7850fa4b","updated":"2026-01-28 12:52:57.000000000","message":"Even 1000 are impossible with the current architecture IMO. 
You cannot process 250 nodes per thread under 1 minute, even ignoring pathological hardware.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"c31572e5cf433582d193aad5f7db3e22464cf37a","unresolved":true,"context_lines":[{"line_number":39,"context_line":"is collected for all nodes sequentially in 4 worker threads. Even for pretty"},{"line_number":40,"context_line":"humble 200 nodes per conductor, each thread will handle 50 nodes one after the"},{"line_number":41,"context_line":"other. Only if each node is processed for no more than a second is it possible"},{"line_number":42,"context_line":"to hit the 1 minute deadline."},{"line_number":43,"context_line":""},{"line_number":44,"context_line":"Unfortunately, the collection via Redfish is not that fast in real life. For"},{"line_number":45,"context_line":"each node, several GET requests are needed, each taking 1-2 seconds on average"}],"source_content_type":"text/x-rst","patch_set":5,"id":"d5eab407_5ee9dd40","line":42,"in_reply_to":"d3062b2c_7928ce1b","updated":"2026-02-03 20:04:18.000000000","message":"Yep, we agree. I was just noting explicitly that you don\u0027t need to be hyperscaling conductors to get the benefit.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":false,"context_lines":[{"line_number":39,"context_line":"is collected for all nodes sequentially in 4 worker threads. Even for pretty"},{"line_number":40,"context_line":"humble 200 nodes per conductor, each thread will handle 50 nodes one after the"},{"line_number":41,"context_line":"other. 
Only if each node is processed for no more than a second is it possible"},{"line_number":42,"context_line":"to hit the 1 minute deadline."},{"line_number":43,"context_line":""},{"line_number":44,"context_line":"Unfortunately, the collection via Redfish is not that fast in real life. For"},{"line_number":45,"context_line":"each node, several GET requests are needed, each taking 1-2 seconds on average"}],"source_content_type":"text/x-rst","patch_set":5,"id":"65a2ef78_bc78ecff","line":42,"in_reply_to":"d5eab407_5ee9dd40","updated":"2026-05-31 23:58:52.000000000","message":"Marking this as resolved.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"f65e215513b0af65d8fc9b3255949c49a31e7e55","unresolved":true,"context_lines":[{"line_number":47,"context_line":"for thousands of nodes requires either cranking up the workers threads to"},{"line_number":48,"context_line":"unreasonable values or deploying significantly more conductors."},{"line_number":49,"context_line":""},{"line_number":50,"context_line":"As an aside, a similar problem exists in the power synchronization loop."},{"line_number":51,"context_line":""},{"line_number":52,"context_line":"Proposed change"},{"line_number":53,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"}],"source_content_type":"text/x-rst","patch_set":5,"id":"599d14e6_596fa808","line":50,"updated":"2026-03-09 14:35:19.000000000","message":"In part, and it is likely okay because it is built around the ipmi bmc rules of interaction where the standard sets an expectation of how often the BMC will be communicated with via ipmi. 
The difference here is with sensor data we want to move to a \"check as often and as fast as possible\" to have the highest resolution data reasonably available.\n\nThe question of \"Do we really need to check power with the same frequency?\" is a question we need to be careful around.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"4e62afea45cbd305df50df9fe4f4696695e92065","unresolved":true,"context_lines":[{"line_number":47,"context_line":"for thousands of nodes requires either cranking up the workers threads to"},{"line_number":48,"context_line":"unreasonable values or deploying significantly more conductors."},{"line_number":49,"context_line":""},{"line_number":50,"context_line":"As an aside, a similar problem exists in the power synchronization loop."},{"line_number":51,"context_line":""},{"line_number":52,"context_line":"Proposed change"},{"line_number":53,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"}],"source_content_type":"text/x-rst","patch_set":5,"id":"631425bc_d9c0c937","line":50,"in_reply_to":"599d14e6_596fa808","updated":"2026-04-28 16:19:51.000000000","message":"Yeah, that\u0027s why it\u0027s an aside here, not a suggestion. 
I don\u0027t have anyone come and demand quick turnaround on power sync.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":true,"context_lines":[{"line_number":47,"context_line":"for thousands of nodes requires either cranking up the workers threads to"},{"line_number":48,"context_line":"unreasonable values or deploying significantly more conductors."},{"line_number":49,"context_line":""},{"line_number":50,"context_line":"As an aside, a similar problem exists in the power synchronization loop."},{"line_number":51,"context_line":""},{"line_number":52,"context_line":"Proposed change"},{"line_number":53,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"}],"source_content_type":"text/x-rst","patch_set":5,"id":"a902c500_810fe9e6","line":50,"in_reply_to":"631425bc_d9c0c937","updated":"2026-05-31 23:58:52.000000000","message":"I think we can mark this as Done, but I will wait for Julia to confirm.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"dc92ecf7c3d7ecd5fdd6184489b7e199f6f3cef0","unresolved":true,"context_lines":[{"line_number":84,"context_line":"                except (NotImplementedError, AttributeError):"},{"line_number":85,"context_line":"                    # NOTE(dtantsur): the existing code relies on creating new"},{"line_number":86,"context_line":"                    # tasks inside worker threads, thus only storing the UUID."},{"line_number":87,"context_line":"                    sync_queue.put(task.node.uuid)"},{"line_number":88,"context_line":"                
else:"},{"line_number":89,"context_line":"                    async_tasks.append(fut)"},{"line_number":90,"context_line":"        else:"}],"source_content_type":"text/x-rst","patch_set":5,"id":"ea96ecef_7e51ac89","line":87,"updated":"2026-01-27 20:17:52.000000000","message":"I wonder if there\u0027s value in feature flagging vs try;except for this; something like capabilities in Nova\u0027s virt driver? https://opendev.org/openstack/nova/src/commit/134d3ac476da6abddcea8cb652a4f07460082c29/nova/virt/ironic/driver.py#L156","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"c31572e5cf433582d193aad5f7db3e22464cf37a","unresolved":true,"context_lines":[{"line_number":84,"context_line":"                except (NotImplementedError, AttributeError):"},{"line_number":85,"context_line":"                    # NOTE(dtantsur): the existing code relies on creating new"},{"line_number":86,"context_line":"                    # tasks inside worker threads, thus only storing the UUID."},{"line_number":87,"context_line":"                    sync_queue.put(task.node.uuid)"},{"line_number":88,"context_line":"                else:"},{"line_number":89,"context_line":"                    async_tasks.append(fut)"},{"line_number":90,"context_line":"        else:"}],"source_content_type":"text/x-rst","patch_set":5,"id":"eb69ffdc_0848e5f9","line":87,"in_reply_to":"d03b511d_2bab786f","updated":"2026-02-03 20:04:18.000000000","message":"It\u0027d just be nice to have it declarative so it\u0027s easier to document our feature matrix. 
That\u0027s what I\u0027m thinking.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"e03b4952682013dfaf414629b70ff6f026d09cc4","unresolved":true,"context_lines":[{"line_number":84,"context_line":"                except (NotImplementedError, AttributeError):"},{"line_number":85,"context_line":"                    # NOTE(dtantsur): the existing code relies on creating new"},{"line_number":86,"context_line":"                    # tasks inside worker threads, thus only storing the UUID."},{"line_number":87,"context_line":"                    sync_queue.put(task.node.uuid)"},{"line_number":88,"context_line":"                else:"},{"line_number":89,"context_line":"                    async_tasks.append(fut)"},{"line_number":90,"context_line":"        else:"}],"source_content_type":"text/x-rst","patch_set":5,"id":"d03b511d_2bab786f","line":87,"in_reply_to":"ea96ecef_7e51ac89","updated":"2026-01-28 12:52:57.000000000","message":"I don\u0027t have a strong opinion here, can go either way.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":true,"context_lines":[{"line_number":84,"context_line":"                except (NotImplementedError, AttributeError):"},{"line_number":85,"context_line":"                    # NOTE(dtantsur): the existing code relies on creating new"},{"line_number":86,"context_line":"                    # tasks inside worker threads, thus only storing the UUID."},{"line_number":87,"context_line":"                    sync_queue.put(task.node.uuid)"},{"line_number":88,"context_line":"                else:"},{"line_number":89,"context_line":"                    
async_tasks.append(fut)"},{"line_number":90,"context_line":"        else:"}],"source_content_type":"text/x-rst","patch_set":5,"id":"22111694_7025d616","line":87,"in_reply_to":"eb69ffdc_0848e5f9","updated":"2026-05-31 23:58:52.000000000","message":"This is probably something we can bring up in the reviews, can we mark as resolved?","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"f65e215513b0af65d8fc9b3255949c49a31e7e55","unresolved":true,"context_lines":[{"line_number":125,"context_line":""},{"line_number":126,"context_line":"In my experiments, running 3500 tasks did not put any larger strain on the CPU"},{"line_number":127,"context_line":"and memory than batches of 500. As a result, I\u0027d prefer to start with a simple"},{"line_number":128,"context_line":"approach and modify it based on the real life feedback."},{"line_number":129,"context_line":""},{"line_number":130,"context_line":"Redfish implementation"},{"line_number":131,"context_line":"----------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"6eb7bc0a_ff7299c6","line":128,"range":{"start_line":128,"start_character":36,"end_line":128,"end_character":45},"updated":"2026-03-09 14:35:19.000000000","message":"nit: s/real life/operator/ 😊","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":false,"context_lines":[{"line_number":125,"context_line":""},{"line_number":126,"context_line":"In my experiments, running 3500 tasks did not put any larger strain on the CPU"},{"line_number":127,"context_line":"and memory than batches of 500. 
As a result, I\u0027d prefer to start with a simple"},{"line_number":128,"context_line":"approach and modify it based on the real life feedback."},{"line_number":129,"context_line":""},{"line_number":130,"context_line":"Redfish implementation"},{"line_number":131,"context_line":"----------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"ca51020c_ca89a1a7","line":128,"range":{"start_line":128,"start_character":36,"end_line":128,"end_character":45},"in_reply_to":"6eb7bc0a_ff7299c6","updated":"2026-05-31 23:58:52.000000000","message":"Done","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"f65e215513b0af65d8fc9b3255949c49a31e7e55","unresolved":true,"context_lines":[{"line_number":231,"context_line":"    if session_or_lock is None:"},{"line_number":232,"context_line":"        # No session and no thread is working on it, create a lock"},{"line_number":233,"context_line":"        session_or_lock \u003d asyncio.Lock()"},{"line_number":234,"context_line":"        self._sessions[session_key] \u003d session_or_lock"},{"line_number":235,"context_line":"        # Now a lock is cached and serves as a mark that the current thread"},{"line_number":236,"context_line":"        # will be working on the session. Since no yield points have happened"},{"line_number":237,"context_line":"        # so far, this thread is the first to take the lock. Even if it isn\u0027t,"}],"source_content_type":"text/x-rst","patch_set":5,"id":"ec516f86_66b30f8e","line":234,"updated":"2026-03-09 14:35:19.000000000","message":"So, Something about this makes me nervous. 
Because we\u0027re effectively putting a lock option on top of the session to make the determination mid-stream when it feels like we should treat it the async config option non-hup-able option and try to populate the cache structure to enable it to launch anyway. If the code, as noted above is *two* entirely different caches, then cool, but the code here doesn\u0027t suggest that, it suggests single integrated cache.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":true,"context_lines":[{"line_number":231,"context_line":"    if session_or_lock is None:"},{"line_number":232,"context_line":"        # No session and no thread is working on it, create a lock"},{"line_number":233,"context_line":"        session_or_lock \u003d asyncio.Lock()"},{"line_number":234,"context_line":"        self._sessions[session_key] \u003d session_or_lock"},{"line_number":235,"context_line":"        # Now a lock is cached and serves as a mark that the current thread"},{"line_number":236,"context_line":"        # will be working on the session. Since no yield points have happened"},{"line_number":237,"context_line":"        # so far, this thread is the first to take the lock. 
Even if it isn\u0027t,"}],"source_content_type":"text/x-rst","patch_set":5,"id":"aee999a9_041398bc","line":234,"in_reply_to":"b3b9948c_e7fa2ff5","updated":"2026-05-31 23:58:52.000000000","message":"I\u0027ve updated the text and the code a bit with Claude to see if it helps with Julia\u0027s suggestion.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"4e62afea45cbd305df50df9fe4f4696695e92065","unresolved":true,"context_lines":[{"line_number":231,"context_line":"    if session_or_lock is None:"},{"line_number":232,"context_line":"        # No session and no thread is working on it, create a lock"},{"line_number":233,"context_line":"        session_or_lock \u003d asyncio.Lock()"},{"line_number":234,"context_line":"        self._sessions[session_key] \u003d session_or_lock"},{"line_number":235,"context_line":"        # Now a lock is cached and serves as a mark that the current thread"},{"line_number":236,"context_line":"        # will be working on the session. Since no yield points have happened"},{"line_number":237,"context_line":"        # so far, this thread is the first to take the lock. Even if it isn\u0027t,"}],"source_content_type":"text/x-rst","patch_set":5,"id":"b3b9948c_e7fa2ff5","line":234,"in_reply_to":"ec516f86_66b30f8e","updated":"2026-04-28 16:19:51.000000000","message":"Does it? I assumed line 216 together with using asyncio.Lock would establish the context but I\u0027m open to ideas on how to clarify it.\n\nI can also punt on the exact implementation and leave it up to the code itself. 
I just wanted to get the idea out of my head.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":23851,"name":"Riccardo Pittau","email":"elfosardo@gmail.com","username":"elfosardo"},"change_message_id":"02e195ca79afd0b7f492e468c6ed0cf186eddf0f","unresolved":true,"context_lines":[{"line_number":242,"context_line":"        new_session \u003d self._sessions[session_key]"},{"line_number":243,"context_line":"        if isinstance(new_session, asyncio.Lock):"},{"line_number":244,"context_line":"            # This thread is the first to take the lock, create the session."},{"line_number":245,"context_line":"            new_session \u003d await AsyncConnection(...)"},{"line_number":246,"context_line":"            # After this point, other threads will see the session and will be"},{"line_number":247,"context_line":"            # able to use it."},{"line_number":248,"context_line":"            self._sessions[session_key] \u003d new_session"}],"source_content_type":"text/x-rst","patch_set":5,"id":"ba518024_e97da734","line":245,"updated":"2026-02-03 13:23:43.000000000","message":"It\u0027s probably more of an implementation detail, but I\u0027m a bit concerned in case an exception is raised here if we risk to lock out any other tentative of establishing a session with that node. 
We should think about removing the lock in case of failed connection.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":false,"context_lines":[{"line_number":242,"context_line":"        new_session \u003d self._sessions[session_key]"},{"line_number":243,"context_line":"        if isinstance(new_session, asyncio.Lock):"},{"line_number":244,"context_line":"            # This thread is the first to take the lock, create the session."},{"line_number":245,"context_line":"            new_session \u003d await AsyncConnection(...)"},{"line_number":246,"context_line":"            # After this point, other threads will see the session and will be"},{"line_number":247,"context_line":"            # able to use it."},{"line_number":248,"context_line":"            self._sessions[session_key] \u003d new_session"}],"source_content_type":"text/x-rst","patch_set":5,"id":"d1051638_6e8a40c7","line":245,"in_reply_to":"65657a8b_4edb3b59","updated":"2026-05-31 23:58:52.000000000","message":"Since Riccardo marked +2 I`m assuming he is ok with the answer, marking as resolved","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"4e62afea45cbd305df50df9fe4f4696695e92065","unresolved":true,"context_lines":[{"line_number":242,"context_line":"        new_session \u003d self._sessions[session_key]"},{"line_number":243,"context_line":"        if isinstance(new_session, asyncio.Lock):"},{"line_number":244,"context_line":"            # This thread is the first to take the lock, create the session."},{"line_number":245,"context_line":"            new_session \u003d await AsyncConnection(...)"},{"line_number":246,"context_line":"     
       # After this point, other threads will see the session and will be"},{"line_number":247,"context_line":"            # able to use it."},{"line_number":248,"context_line":"            self._sessions[session_key] \u003d new_session"}],"source_content_type":"text/x-rst","patch_set":5,"id":"65657a8b_4edb3b59","line":245,"in_reply_to":"ba518024_e97da734","updated":"2026-04-28 16:19:51.000000000","message":"If this call fails, session_or_lock will be unlocked by `async with`, it won\u0027t be overridden in the cache, and the lock will remain in place for the next thread to take. Does it address your concern?","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"f65e215513b0af65d8fc9b3255949c49a31e7e55","unresolved":true,"context_lines":[{"line_number":274,"context_line":"  implementation is simply not designed to launch a large number of threads"},{"line_number":275,"context_line":"  quickly. In my experiments, even after removing any limits on the number of"},{"line_number":276,"context_line":"  threads, it still oscillated between 200 and 500, sometimes even dropping to"},{"line_number":277,"context_line":"  two-digit numbers, all while the sensor collection seemed to never finish."},{"line_number":278,"context_line":""},{"line_number":279,"context_line":"* Do not support more than roughly 500 nodes per conductor, at least in this"},{"line_number":280,"context_line":"  scenario. This will be a major setback for adoption of Ironic in the Metal3"}],"source_content_type":"text/x-rst","patch_set":5,"id":"3348c8f5_404e2aa2","line":277,"range":{"start_line":277,"start_character":21,"end_line":277,"end_character":76},"updated":"2026-03-09 14:35:19.000000000","message":"open file limits becoming blocking? 
😊","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"4e62afea45cbd305df50df9fe4f4696695e92065","unresolved":true,"context_lines":[{"line_number":274,"context_line":"  implementation is simply not designed to launch a large number of threads"},{"line_number":275,"context_line":"  quickly. In my experiments, even after removing any limits on the number of"},{"line_number":276,"context_line":"  threads, it still oscillated between 200 and 500, sometimes even dropping to"},{"line_number":277,"context_line":"  two-digit numbers, all while the sensor collection seemed to never finish."},{"line_number":278,"context_line":""},{"line_number":279,"context_line":"* Do not support more than roughly 500 nodes per conductor, at least in this"},{"line_number":280,"context_line":"  scenario. This will be a major setback for adoption of Ironic in the Metal3"}],"source_content_type":"text/x-rst","patch_set":5,"id":"50e752ac_b0717b0f","line":277,"range":{"start_line":277,"start_character":21,"end_line":277,"end_character":76},"in_reply_to":"3348c8f5_404e2aa2","updated":"2026-04-28 16:19:51.000000000","message":"Yep, it definitely contributed to the problem","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":false,"context_lines":[{"line_number":274,"context_line":"  implementation is simply not designed to launch a large number of threads"},{"line_number":275,"context_line":"  quickly. 
In my experiments, even after removing any limits on the number of"},{"line_number":276,"context_line":"  threads, it still oscillated between 200 and 500, sometimes even dropping to"},{"line_number":277,"context_line":"  two-digit numbers, all while the sensor collection seemed to never finish."},{"line_number":278,"context_line":""},{"line_number":279,"context_line":"* Do not support more than roughly 500 nodes per conductor, at least in this"},{"line_number":280,"context_line":"  scenario. This will be a major setback for adoption of Ironic in the Metal3"}],"source_content_type":"text/x-rst","patch_set":5,"id":"c01f5018_1f80d114","line":277,"range":{"start_line":277,"start_character":21,"end_line":277,"end_character":76},"in_reply_to":"50e752ac_b0717b0f","updated":"2026-05-31 23:58:52.000000000","message":"Marking this as resolved","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"dc92ecf7c3d7ecd5fdd6184489b7e199f6f3cef0","unresolved":true,"context_lines":[{"line_number":281,"context_line":"  world, where multi-conductor deployments are not common and where thousands"},{"line_number":282,"context_line":"  of nodes (like 3500 in the OpenShift case) are already routinely handled."},{"line_number":283,"context_line":"  Outside of large classic OpenStack clusters, we cannot expect operators to"},{"line_number":284,"context_line":"  deploy as many replicas of Ironic as we\u0027d prefer."},{"line_number":285,"context_line":""},{"line_number":286,"context_line":"Possible variations"},{"line_number":287,"context_line":"~~~~~~~~~~~~~~~~~~~"}],"source_content_type":"text/x-rst","patch_set":5,"id":"ff780636_56fb9713","line":284,"updated":"2026-01-27 20:17:52.000000000","message":"This seems like a limitation that\u0027ll be worse to deal with over time 
:|","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":true,"context_lines":[{"line_number":281,"context_line":"  world, where multi-conductor deployments are not common and where thousands"},{"line_number":282,"context_line":"  of nodes (like 3500 in the OpenShift case) are already routinely handled."},{"line_number":283,"context_line":"  Outside of large classic OpenStack clusters, we cannot expect operators to"},{"line_number":284,"context_line":"  deploy as many replicas of Ironic as we\u0027d prefer."},{"line_number":285,"context_line":""},{"line_number":286,"context_line":"Possible variations"},{"line_number":287,"context_line":"~~~~~~~~~~~~~~~~~~~"}],"source_content_type":"text/x-rst","patch_set":5,"id":"6c5fd4bf_11241ea8","line":284,"in_reply_to":"5baa097b_96272040","updated":"2026-05-31 23:58:52.000000000","message":"@jay@jvf.cc, if you want more details here please let us know, will not mark as resolved for now.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"e03b4952682013dfaf414629b70ff6f026d09cc4","unresolved":true,"context_lines":[{"line_number":281,"context_line":"  world, where multi-conductor deployments are not common and where thousands"},{"line_number":282,"context_line":"  of nodes (like 3500 in the OpenShift case) are already routinely handled."},{"line_number":283,"context_line":"  Outside of large classic OpenStack clusters, we cannot expect operators to"},{"line_number":284,"context_line":"  deploy as many replicas of Ironic as we\u0027d prefer."},{"line_number":285,"context_line":""},{"line_number":286,"context_line":"Possible 
variations"},{"line_number":287,"context_line":"~~~~~~~~~~~~~~~~~~~"}],"source_content_type":"text/x-rst","patch_set":5,"id":"5baa097b_96272040","line":284,"in_reply_to":"ff780636_56fb9713","updated":"2026-01-28 12:52:57.000000000","message":"I know, but deploying and managing MariaDB, especially in an HA configuration, is a very high price to pay. In classic OpenStack, it\u0027s assumed to be just there, but it\u0027s a luxury that other use cases (like Metal3 for me) may not have.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"f65e215513b0af65d8fc9b3255949c49a31e7e55","unresolved":true,"context_lines":[{"line_number":297,"context_line":"  As seen in the prototype_, handling asynchronous workers can quickly lead to"},{"line_number":298,"context_line":"  convoluted code."},{"line_number":299,"context_line":""},{"line_number":300,"context_line":"  * We *may* need to limit the number of requests that hit the same"},{"line_number":301,"context_line":"    ``redfish_address`` to avoid overloading BMCs that handle multiple nodes:"},{"line_number":302,"context_line":"    see `session cache`_ for reasoning."},{"line_number":303,"context_line":""},{"line_number":304,"context_line":"* Modify the existing session cache to handle both synchronous and asynchronous"},{"line_number":305,"context_line":"  callers. This will be a desired future addition if we decide to expand the"}],"source_content_type":"text/x-rst","patch_set":5,"id":"28824d7e_c2872454","line":302,"range":{"start_line":300,"start_character":2,"end_line":302,"end_character":39},"updated":"2026-03-09 14:35:19.000000000","message":"We\u0027ve had a couple cases over the last couple of years where folks have pointed Ironic at DCIM systems front ending BMCs as a proxy, so I would concur we may need to do this. 
The case where we likely *must* do this is blade chassis based systems with multiple unique systems inside the case to prevent overloading the i2c bus. Alternatively if there was a way to say \"hey, don\u0027t do this for duplicate hosts\" and for the cache to be the flag indicator if the node could be used with async mode at all.\n\nFor other reader context, the issue where a single BMC is handling multiple blades, is there is likely a single I2C bus the BMC is communicating over and it is having to switch between its direct chassis components (fans), and then if anything is specific to a single system then switch the bus into that system to collect data effectively blocking access to the other \"systems\" in the chassis because the i2c bus has only one request travel over it at a time due to PLDM. That may get better in the future as PLDM gets implemented over the next generation USB buses for communication between major components in the chassis, but blade computing chassis are also quite specific in their modeling as well.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"4e62afea45cbd305df50df9fe4f4696695e92065","unresolved":true,"context_lines":[{"line_number":297,"context_line":"  As seen in the prototype_, handling asynchronous workers can quickly lead to"},{"line_number":298,"context_line":"  convoluted code."},{"line_number":299,"context_line":""},{"line_number":300,"context_line":"  * We *may* need to limit the number of requests that hit the same"},{"line_number":301,"context_line":"    ``redfish_address`` to avoid overloading BMCs that handle multiple nodes:"},{"line_number":302,"context_line":"    see `session cache`_ for reasoning."},{"line_number":303,"context_line":""},{"line_number":304,"context_line":"* Modify the existing session cache to handle both synchronous and 
asynchronous"},{"line_number":305,"context_line":"  callers. This will be a desired future addition if we decide to expand the"}],"source_content_type":"text/x-rst","patch_set":5,"id":"d170af3e_4a659d15","line":302,"range":{"start_line":300,"start_character":2,"end_line":302,"end_character":39},"in_reply_to":"28824d7e_c2872454","updated":"2026-04-28 16:19:51.000000000","message":"\u003e The case where we likely must do this is blade chassis based systems with multiple unique systems inside the case to prevent overloading the i2c bus.\n\nThis does apply to half of Ironic. None of our periodics actually account for this case.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":true,"context_lines":[{"line_number":297,"context_line":"  As seen in the prototype_, handling asynchronous workers can quickly lead to"},{"line_number":298,"context_line":"  convoluted code."},{"line_number":299,"context_line":""},{"line_number":300,"context_line":"  * We *may* need to limit the number of requests that hit the same"},{"line_number":301,"context_line":"    ``redfish_address`` to avoid overloading BMCs that handle multiple nodes:"},{"line_number":302,"context_line":"    see `session cache`_ for reasoning."},{"line_number":303,"context_line":""},{"line_number":304,"context_line":"* Modify the existing session cache to handle both synchronous and asynchronous"},{"line_number":305,"context_line":"  callers. 
This will be a desired future addition if we decide to expand the"}],"source_content_type":"text/x-rst","patch_set":5,"id":"d1391c99_321ded63","line":302,"range":{"start_line":300,"start_character":2,"end_line":302,"end_character":39},"in_reply_to":"d170af3e_4a659d15","updated":"2026-05-31 23:58:52.000000000","message":"@juliaashleykreger@gmail.com if I\u0027m reading this correctly, we can address this during implementation or do you want some example in the spec?","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"f65e215513b0af65d8fc9b3255949c49a31e7e55","unresolved":true,"context_lines":[{"line_number":303,"context_line":""},{"line_number":304,"context_line":"* Modify the existing session cache to handle both synchronous and asynchronous"},{"line_number":305,"context_line":"  callers. This will be a desired future addition if we decide to expand the"},{"line_number":306,"context_line":"  asynchronous code. It\u0027s too much of an effort for a single opt-in feature."},{"line_number":307,"context_line":""},{"line_number":308,"context_line":"Data model impact"},{"line_number":309,"context_line":"-----------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"f1507dcc_adb0506b","line":306,"updated":"2026-03-09 14:35:19.000000000","message":"Dunno, for some reason this feels \"easier\" to me or to have separate caches and have the entries managed through a single locked interface so if a password rotation occurs we might not immediately slam into using it but we would functionally block until the session management code could obtain the lock to update the entry. 
I guess, at the end of the day, it is more a \"where do we want to push the complexity to\" problem.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":true,"context_lines":[{"line_number":303,"context_line":""},{"line_number":304,"context_line":"* Modify the existing session cache to handle both synchronous and asynchronous"},{"line_number":305,"context_line":"  callers. This will be a desired future addition if we decide to expand the"},{"line_number":306,"context_line":"  asynchronous code. It\u0027s too much of an effort for a single opt-in feature."},{"line_number":307,"context_line":""},{"line_number":308,"context_line":"Data model impact"},{"line_number":309,"context_line":"-----------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"51aab8f3_cbf3341e","line":306,"in_reply_to":"c25942f5_5ad47ec3","updated":"2026-05-31 23:58:52.000000000","message":"@juliaashleykreger@gmail.com let me know if we can mark this as resolved, I think it would be more an implementation detail...","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"4e62afea45cbd305df50df9fe4f4696695e92065","unresolved":true,"context_lines":[{"line_number":303,"context_line":""},{"line_number":304,"context_line":"* Modify the existing session cache to handle both synchronous and asynchronous"},{"line_number":305,"context_line":"  callers. This will be a desired future addition if we decide to expand the"},{"line_number":306,"context_line":"  asynchronous code. 
It\u0027s too much of an effort for a single opt-in feature."},{"line_number":307,"context_line":""},{"line_number":308,"context_line":"Data model impact"},{"line_number":309,"context_line":"-----------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"c25942f5_5ad47ec3","line":306,"in_reply_to":"f1507dcc_adb0506b","updated":"2026-04-28 16:19:51.000000000","message":"I think the issue is a definition of \"locked\" that fits both async and sync code.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"dc92ecf7c3d7ecd5fdd6184489b7e199f6f3cef0","unresolved":true,"context_lines":[{"line_number":362,"context_line":"   The base implementation is not marked as ``async``. This is on purpose,"},{"line_number":363,"context_line":"   otherwise the exception won\u0027t be raised until the executor runs the task."},{"line_number":364,"context_line":"   Real implementation may be marked as ``async`` or return an awaitable"},{"line_number":365,"context_line":"   in any other way."},{"line_number":366,"context_line":""},{"line_number":367,"context_line":"Nova driver impact"},{"line_number":368,"context_line":"------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"1ba0ecdc_d88d1725","line":365,"updated":"2026-01-27 20:17:52.000000000","message":"note that a feature-flag based model as suggested above gets rid of this particular decoder ring :D","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"e03b4952682013dfaf414629b70ff6f026d09cc4","unresolved":true,"context_lines":[{"line_number":362,"context_line":"   The base implementation is not marked as ``async``. 
This is on purpose,"},{"line_number":363,"context_line":"   otherwise the exception won\u0027t be raised until the executor runs the task."},{"line_number":364,"context_line":"   Real implementation may be marked as ``async`` or return an awaitable"},{"line_number":365,"context_line":"   in any other way."},{"line_number":366,"context_line":""},{"line_number":367,"context_line":"Nova driver impact"},{"line_number":368,"context_line":"------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"46e5d3d5_f6b218ea","line":365,"in_reply_to":"1ba0ecdc_d88d1725","updated":"2026-01-28 12:52:57.000000000","message":"Possibly, although then it\u0027s not clear to me what the base class is supposed to return or raise.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":false,"context_lines":[{"line_number":362,"context_line":"   The base implementation is not marked as ``async``. 
This is on purpose,"},{"line_number":363,"context_line":"   otherwise the exception won\u0027t be raised until the executor runs the task."},{"line_number":364,"context_line":"   Real implementation may be marked as ``async`` or return an awaitable"},{"line_number":365,"context_line":"   in any other way."},{"line_number":366,"context_line":""},{"line_number":367,"context_line":"Nova driver impact"},{"line_number":368,"context_line":"------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"b748608e_e7f69682","line":365,"in_reply_to":"46e5d3d5_f6b218ea","updated":"2026-05-31 23:58:52.000000000","message":"Marking this as resolved since we can discuss this in the patches after the spec is approved.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"dc92ecf7c3d7ecd5fdd6184489b7e199f6f3cef0","unresolved":true,"context_lines":[{"line_number":406,"context_line":"of RAM depending on the batch size (interestingly, collecting from all nodes at"},{"line_number":407,"context_line":"once was not where the RAM consumptions peaked, it was rather around batches of"},{"line_number":408,"context_line":"1000). The operators will need to be prepared (and we\u0027ll need to document it"},{"line_number":409,"context_line":"thoroughly) that high parallelism does not come for free."},{"line_number":410,"context_line":""},{"line_number":411,"context_line":"Other deployer impact"},{"line_number":412,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"e010d115_e428410d","line":409,"updated":"2026-01-27 20:17:52.000000000","message":"It\u0027d be interesting to see if we can find a way, with limiting concurrency, to be able to default it to on. 
It seems a shame to add this level of performance boost and have to have operators opt-in to it.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":15519,"name":"Iury Gregory Melo Ferreira","display_name":"Iury Gregory","email":"iurygregory@gmail.com","username":"iurygregory"},"change_message_id":"6f9f75dd7dde565524acfae6566cf1d98defcdc8","unresolved":true,"context_lines":[{"line_number":406,"context_line":"of RAM depending on the batch size (interestingly, collecting from all nodes at"},{"line_number":407,"context_line":"once was not where the RAM consumptions peaked, it was rather around batches of"},{"line_number":408,"context_line":"1000). The operators will need to be prepared (and we\u0027ll need to document it"},{"line_number":409,"context_line":"thoroughly) that high parallelism does not come for free."},{"line_number":410,"context_line":""},{"line_number":411,"context_line":"Other deployer impact"},{"line_number":412,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"bc734613_c19ca375","line":409,"in_reply_to":"4c094972_9df8aa2a","updated":"2026-05-31 23:58:52.000000000","message":"After reading this I think we can mark this as resolved, let me know if it\u0027s ok @juliaashleykreger@gmail.com @jay@jvf.cc","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"e03b4952682013dfaf414629b70ff6f026d09cc4","unresolved":true,"context_lines":[{"line_number":406,"context_line":"of RAM depending on the batch size (interestingly, collecting from all nodes at"},{"line_number":407,"context_line":"once was not where the RAM consumptions peaked, it was rather around batches of"},{"line_number":408,"context_line":"1000). 
The operators will need to be prepared (and we\u0027ll need to document it"},{"line_number":409,"context_line":"thoroughly) that high parallelism does not come for free."},{"line_number":410,"context_line":""},{"line_number":411,"context_line":"Other deployer impact"},{"line_number":412,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"f06286d1_9e42e0ad","line":409,"in_reply_to":"e010d115_e428410d","updated":"2026-01-28 12:52:57.000000000","message":"Unfortunately, there is only so much I can research without merging the code and giving to various people to try. I see the opt-in aspect as something like a feature gate in Kubernetes: a way to test a desired but complex feature in a way that allows easy rollbacks.","commit_id":"f088672b38bc802552590649db697e61b9e89eba"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"f65e215513b0af65d8fc9b3255949c49a31e7e55","unresolved":true,"context_lines":[{"line_number":406,"context_line":"of RAM depending on the batch size (interestingly, collecting from all nodes at"},{"line_number":407,"context_line":"once was not where the RAM consumptions peaked, it was rather around batches of"},{"line_number":408,"context_line":"1000). The operators will need to be prepared (and we\u0027ll need to document it"},{"line_number":409,"context_line":"thoroughly) that high parallelism does not come for free."},{"line_number":410,"context_line":""},{"line_number":411,"context_line":"Other deployer impact"},{"line_number":412,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"4c094972_9df8aa2a","line":409,"in_reply_to":"f06286d1_9e42e0ad","updated":"2026-03-09 14:35:19.000000000","message":"I\u0027ll note, even if we document it, operators are likely to try and tune it. 
We\u0027ve seen that with some of the concurrency/worker/task knobs in the past where operators immediately jump to \"more!\" even when the help text basically says \"hey, if you tune this, x,y,z\". We as a project need to be pragmatic and when we see something in the resulting operating behavior which is problematic, like... things breaking down or taking way too long in a code path like this, we need to verbosely log warnings. At least operators will see those and go \"oh, maybe I need to rethink that\".","commit_id":"f088672b38bc802552590649db697e61b9e89eba"}]}
