)]}'
{"/PATCHSET_LEVEL":[{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"7e739d4be2b8b043352804d6d0a3145193919fa9","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"53e657cc_0702c903","updated":"2023-03-17 18:39:35.000000000","message":"I hate to say this, but I\u0027m suspecting this might be a little easier if we split off the PXE case of other conductors doing the setup into a separate spec or an enhancement after the substrate is present. It *is* relatively simple to do with the substrate in place though.","commit_id":"e34f931d4b170d63c50508d5f7fc31fcfaa5ca7f"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"ce9cbeda326ad8e9993065e4c2c126c6cf23223e","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":5,"id":"9dcaf916_1523586d","updated":"2023-07-14 08:44:48.000000000","message":"Good start, thanks for tackling this.","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"58e8ee0dec8a6f6f7bc41a81034c091d147b1e3e","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":5,"id":"746456ab_821c8e94","updated":"2023-09-28 17:19:35.000000000","message":"I\u0027m likely going to abandon this spec. 
I think we were thinking too large, and maybe we don\u0027t *really* need to head down this path *now*.","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"37a03c7bcc35c006bdd7d0228efddafd4bb22bbc","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":5,"id":"df3b115b_4e02e5e6","updated":"2023-07-11 09:48:16.000000000","message":"Removed #ironic-week-prio as this is outdated and not ready for review/merge. If that\u0027s not correct, please get it passing CI and readd the tag. I\u0027m happy to review at that point if you want to ping me directly on IRC, as well.","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"6ae0880ffdfeccfee87fabb85a505ce5bd97afb1","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":5,"id":"0090919e_1c476151","updated":"2023-07-11 09:48:40.000000000","message":"Scratch that previous comment, didn\u0027t realize this was a spec. 
Carry on 😄","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"b4ec2ed1ad31a3c2b6bbfaf95becdc2f02087700","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":5,"id":"583b9a60_ec4d08ea","in_reply_to":"746456ab_821c8e94","updated":"2023-10-18 21:22:22.000000000","message":"Removing ironic-week-prio; if you revive this idea feel free to re-add it.","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"}],"specs/approved/cross-conductor-pxe.rst":[{"author":{"_account_id":24828,"name":"Kaifeng Wang","email":"kaifeng.w@gmail.com","username":"wangkf"},"change_message_id":"383e2006649e1e8be2787bee8849e7e7bbd8b0e2","unresolved":true,"context_lines":[{"line_number":43,"context_line":"Proposed change"},{"line_number":44,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":45,"context_line":""},{"line_number":46,"context_line":"Enable conductors to speak to other conductors"},{"line_number":47,"context_line":"----------------------------------------------"},{"line_number":48,"context_line":""},{"line_number":49,"context_line":"There is no current, known, techincal limitation to prevent this, however to"}],"source_content_type":"text/x-rst","patch_set":1,"id":"5faa2d36_8e23130a","line":46,"updated":"2023-02-21 15:01:22.000000000","message":"This feature can probably establish a foundation to address some interesting issues, like a conductor having no knowledge that its nodes have been taken over :)","commit_id":"7c03ac86d97576f766d8a6c853eaa9be3cf81ea6"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a 
Jetpack!"},"change_message_id":"f9765f4529c0a055c7265565b59b481c990953ee","unresolved":true,"context_lines":[{"line_number":43,"context_line":"Proposed change"},{"line_number":44,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":45,"context_line":""},{"line_number":46,"context_line":"Enable conductors to speak to other conductors"},{"line_number":47,"context_line":"----------------------------------------------"},{"line_number":48,"context_line":""},{"line_number":49,"context_line":"There is no current, known, techincal limitation to prevent this, however to"}],"source_content_type":"text/x-rst","patch_set":1,"id":"fd499bcc_cbd9d2a6","line":46,"in_reply_to":"5faa2d36_8e23130a","updated":"2023-02-21 19:20:23.000000000","message":"It absolutely can! That being said, we do need to be careful around modeling as such. Steve Baker has been working on trying to make the conductor\u0027s code paths a bit more friendly in regards to graceful shutdown of existing conductors as well.\n\nI think the related challenge might be how we model it, as in if we have a conductor going down, does it tell other conductors it is going down. It feels like the next logical question is \"Who knows who will be the next conductor for that node\".\n\nAnd the next question might be, do we try to poll back to see if the conductor is online, do we pause for that, or not?!\n\nI do think there are some definite possibilities, just we need to be careful and think through them carefully. 
:)","commit_id":"7c03ac86d97576f766d8a6c853eaa9be3cf81ea6"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"23c6ab774139317943ee43d984db6b93ea4b33b8","unresolved":false,"context_lines":[{"line_number":43,"context_line":"Proposed change"},{"line_number":44,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":45,"context_line":""},{"line_number":46,"context_line":"Enable conductors to speak to other conductors"},{"line_number":47,"context_line":"----------------------------------------------"},{"line_number":48,"context_line":""},{"line_number":49,"context_line":"There is no current, known, techincal limitation to prevent this, however to"}],"source_content_type":"text/x-rst","patch_set":1,"id":"29caf03f_8c0d2cb6","line":46,"in_reply_to":"fd499bcc_cbd9d2a6","updated":"2023-04-04 18:45:26.000000000","message":"Hopefully addressed in spec. Marking resolved at this point.","commit_id":"7c03ac86d97576f766d8a6c853eaa9be3cf81ea6"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"08b73e6d3a2e1be54e38ed18700a5f715bb512c7","unresolved":true,"context_lines":[{"line_number":65,"context_line":"  be logged by the calling conductor."},{"line_number":66,"context_line":""},{"line_number":67,"context_line":".. 
TODO::"},{"line_number":68,"context_line":"   Do we call these concurrently, or serially?!?!?!"},{"line_number":69,"context_line":""},{"line_number":70,"context_line":"Add a new setup_pxe_environment RPC method"},{"line_number":71,"context_line":"------------------------------------------"}],"source_content_type":"text/x-rst","patch_set":1,"id":"44ad2a25_7631d5b7","line":68,"updated":"2023-02-20 23:07:32.000000000","message":"has to be concurrent, or configurable... serial makes it useless at a higher scale... \n\ndo we still do separate rpc pools for periodic tasks? I wonder if a model like that would be interesting here","commit_id":"7c03ac86d97576f766d8a6c853eaa9be3cf81ea6"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"e62d97508f13d7fd28de2d5e1709b7596b243df5","unresolved":true,"context_lines":[{"line_number":65,"context_line":"  be logged by the calling conductor."},{"line_number":66,"context_line":""},{"line_number":67,"context_line":".. TODO::"},{"line_number":68,"context_line":"   Do we call these concurrently, or serially?!?!?!"},{"line_number":69,"context_line":""},{"line_number":70,"context_line":"Add a new setup_pxe_environment RPC method"},{"line_number":71,"context_line":"------------------------------------------"}],"source_content_type":"text/x-rst","patch_set":1,"id":"b9aca587_d1fb7920","line":68,"in_reply_to":"44ad2a25_7631d5b7","updated":"2023-02-28 18:56:32.000000000","message":"We do have a periodic task pool limit. But if memory serves the limit is enforced upon initial receipt/processing and has some reservation logic wrapped around that initial allocation or rejection. 
So not separate pools, one pool, with some limits I guess is the way to put it.","commit_id":"7c03ac86d97576f766d8a6c853eaa9be3cf81ea6"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"23c6ab774139317943ee43d984db6b93ea4b33b8","unresolved":false,"context_lines":[{"line_number":65,"context_line":"  be logged by the calling conductor."},{"line_number":66,"context_line":""},{"line_number":67,"context_line":".. TODO::"},{"line_number":68,"context_line":"   Do we call these concurrently, or serially?!?!?!"},{"line_number":69,"context_line":""},{"line_number":70,"context_line":"Add a new setup_pxe_environment RPC method"},{"line_number":71,"context_line":"------------------------------------------"}],"source_content_type":"text/x-rst","patch_set":1,"id":"59e7fbd6_1d910c46","line":68,"in_reply_to":"b9aca587_d1fb7920","updated":"2023-04-04 18:45:26.000000000","message":"Hopefully addressed in spec. Marking resolved at this point.","commit_id":"7c03ac86d97576f766d8a6c853eaa9be3cf81ea6"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"e28c04232939babd7e1ac20400466d5c2439e288","unresolved":false,"context_lines":[{"line_number":74,"context_line":"  do not want to trigger additional database queries to perform relatively"},{"line_number":75,"context_line":"  focused operations."},{"line_number":76,"context_line":"* A failure in these RPC calls would not be a fatal operation. A failure would"},{"line_number":77,"context_line":"  be logged by the calling conductor."},{"line_number":78,"context_line":""},{"line_number":79,"context_line":".. 
NOTE::"},{"line_number":80,"context_line":"   We would likely want to call every conductor and then check the results"}],"source_content_type":"text/x-rst","patch_set":2,"id":"6b26fa23_2b48ac30","line":77,"updated":"2023-03-17 11:09:41.000000000","message":"Both bullet points are easier if we specify that all these calls must immediately start a new thread for the actual action.\n\nWe do probably want to retry on NoFreeConductorThreads and other transient failures. I have a TODO item for a long time to add UID\u0027s to JSON RPC calls so that they are safer to retry.","commit_id":"e34f931d4b170d63c50508d5f7fc31fcfaa5ca7f"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"7e739d4be2b8b043352804d6d0a3145193919fa9","unresolved":false,"context_lines":[{"line_number":74,"context_line":"  do not want to trigger additional database queries to perform relatively"},{"line_number":75,"context_line":"  focused operations."},{"line_number":76,"context_line":"* A failure in these RPC calls would not be a fatal operation. A failure would"},{"line_number":77,"context_line":"  be logged by the calling conductor."},{"line_number":78,"context_line":""},{"line_number":79,"context_line":".. NOTE::"},{"line_number":80,"context_line":"   We would likely want to call every conductor and then check the results"}],"source_content_type":"text/x-rst","patch_set":2,"id":"161a8d58_cea358b6","line":77,"in_reply_to":"6b26fa23_2b48ac30","updated":"2023-03-17 18:39:35.000000000","message":"Agreed, and to be honest, I was thinking that as well on the caller side.","commit_id":"e34f931d4b170d63c50508d5f7fc31fcfaa5ca7f"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"e28c04232939babd7e1ac20400466d5c2439e288","unresolved":false,"context_lines":[{"line_number":79,"context_line":".. 
NOTE::"},{"line_number":80,"context_line":"   We would likely want to call every conductor and then check the results"},{"line_number":81,"context_line":"   from a threadpool, but that is to be determined. Supporting both JSONRPC"},{"line_number":82,"context_line":"   and Oslo.Messaging might complicate this."},{"line_number":83,"context_line":""},{"line_number":84,"context_line":"Add an RPC \"Do if I am responsible\" capability"},{"line_number":85,"context_line":"----------------------------------------------"}],"source_content_type":"text/x-rst","patch_set":2,"id":"b001a2b5_241ab166","line":82,"updated":"2023-03-17 11:09:41.000000000","message":"With JSON RPC - yes. But oslo.messaging should support broadcasting, right?","commit_id":"e34f931d4b170d63c50508d5f7fc31fcfaa5ca7f"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"7e739d4be2b8b043352804d6d0a3145193919fa9","unresolved":false,"context_lines":[{"line_number":79,"context_line":".. NOTE::"},{"line_number":80,"context_line":"   We would likely want to call every conductor and then check the results"},{"line_number":81,"context_line":"   from a threadpool, but that is to be determined. Supporting both JSONRPC"},{"line_number":82,"context_line":"   and Oslo.Messaging might complicate this."},{"line_number":83,"context_line":""},{"line_number":84,"context_line":"Add an RPC \"Do if I am responsible\" capability"},{"line_number":85,"context_line":"----------------------------------------------"}],"source_content_type":"text/x-rst","patch_set":2,"id":"f2244405_0db6f41f","line":82,"in_reply_to":"b001a2b5_241ab166","updated":"2023-03-17 18:39:35.000000000","message":"We have to create and monitor a special queue aiui. 
Our entire model is addressed to queues per conductor","commit_id":"e34f931d4b170d63c50508d5f7fc31fcfaa5ca7f"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"e28c04232939babd7e1ac20400466d5c2439e288","unresolved":false,"context_lines":[{"line_number":102,"context_line":"conductor which is shutting down."},{"line_number":103,"context_line":""},{"line_number":104,"context_line":"Add a general \"setup pxe \u0027lite\u0027\" RPC method"},{"line_number":105,"context_line":"--------------------------------------------"},{"line_number":106,"context_line":""},{"line_number":107,"context_line":"In order to support deployments where multiple conductors are pooled"},{"line_number":108,"context_line":"together and some base interactions are being front-ended by load"}],"source_content_type":"text/x-rst","patch_set":2,"id":"f9fb5cab_c36ee961","line":105,"updated":"2023-03-17 11:09:41.000000000","message":"I wonder if instead of PXE, the conductor could just watch the database for nodes in their conductor group in DEPLOY*, CLEAN*, RESCUE* and INSPECT* states. This would be a periodic task with just 1 database query.\n\nOn the other hand, imagine 20 nodes being deployed in a group with 5 conductors. This is going to be 80 RPC calls to set up \"lite\" environments and then 80 to tear down. Not terrible, but it\u0027s already a traffic multiplication.\n\nThe reason I see it differently from the \"conductor_shutting_down\" case is that the \"setup PXE\" action does not need to be immediate, unless when testing something with VMs. 
It\u0027s enough if it happens faster than a regular power on.","commit_id":"e34f931d4b170d63c50508d5f7fc31fcfaa5ca7f"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"7e739d4be2b8b043352804d6d0a3145193919fa9","unresolved":false,"context_lines":[{"line_number":102,"context_line":"conductor which is shutting down."},{"line_number":103,"context_line":""},{"line_number":104,"context_line":"Add a general \"setup pxe \u0027lite\u0027\" RPC method"},{"line_number":105,"context_line":"--------------------------------------------"},{"line_number":106,"context_line":""},{"line_number":107,"context_line":"In order to support deployments where multiple conductors are pooled"},{"line_number":108,"context_line":"together and some base interactions are being front-ended by load"}],"source_content_type":"text/x-rst","patch_set":2,"id":"a097a367_3d2f4586","line":105,"in_reply_to":"f9fb5cab_c36ee961","updated":"2023-03-17 18:39:35.000000000","message":"So, I was kind of heading down this path mentally and then I thought \"okay, there is a point where this doesn\u0027t scale\"\n\nAnd that is also countered by the fact we *likely* can\u0027t just focus on the DB polling, since the state can change between periodic runs. I almost think the invocations might need to actually avoid DB queries as much as possible. Given... we\u0027ve had some complaints there over the years.\n\nWhat if we had two models: \"Notify to do this\", which could be a serial \"batch up and do it\" sort of model, like a deferred action queue, and then a \"do right now because I\u0027m going down\" model.\n\nIf we had a unique ID on everything... then we could just pick items and de-dup, but we would need to buffer it in memory. 
Not the worst thing ever...","commit_id":"e34f931d4b170d63c50508d5f7fc31fcfaa5ca7f"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":8,"context_line":"Cross-Conductor PXE (and RPC)"},{"line_number":9,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":10,"context_line":""},{"line_number":11,"context_line":"https://storyboard.openstack.org/#!/story/XXXXXXX"},{"line_number":12,"context_line":""},{"line_number":13,"context_line":"A reality of operating modes is that multiple conductors are required"},{"line_number":14,"context_line":"in many use cases. The reasons are inherent to operating a scaled environment"}],"source_content_type":"text/x-rst","patch_set":5,"id":"eb04f02e_68b5c262","line":11,"range":{"start_line":11,"start_character":42,"end_line":11,"end_character":49},"updated":"2023-04-20 10:46:46.000000000","message":"C\u0027mon :)","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":11655,"name":"Julia Kreger","email":"juliaashleykreger@gmail.com","username":"jkreger","status":"Flying to the moon with a Jetpack!"},"change_message_id":"58e8ee0dec8a6f6f7bc41a81034c091d147b1e3e","unresolved":true,"context_lines":[{"line_number":8,"context_line":"Cross-Conductor PXE (and RPC)"},{"line_number":9,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":10,"context_line":""},{"line_number":11,"context_line":"https://storyboard.openstack.org/#!/story/XXXXXXX"},{"line_number":12,"context_line":""},{"line_number":13,"context_line":"A reality of operating modes is 
that multiple conductors are required"},{"line_number":14,"context_line":"in many use cases. The reasons are inherent to operating a scaled environment"}],"source_content_type":"text/x-rst","patch_set":5,"id":"7efa48d1_6924f149","line":11,"range":{"start_line":11,"start_character":42,"end_line":11,"end_character":49},"in_reply_to":"eb04f02e_68b5c262","updated":"2023-09-28 17:19:35.000000000","message":"Eh, need to nuke the test!","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"ce9cbeda326ad8e9993065e4c2c126c6cf23223e","unresolved":true,"context_lines":[{"line_number":8,"context_line":"Cross-Conductor PXE (and RPC)"},{"line_number":9,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":10,"context_line":""},{"line_number":11,"context_line":"https://storyboard.openstack.org/#!/story/XXXXXXX"},{"line_number":12,"context_line":""},{"line_number":13,"context_line":"A reality of operating modes is that multiple conductors are required"},{"line_number":14,"context_line":"in many use cases. 
The reasons are inherent to operating a scaled environment"}],"source_content_type":"text/x-rst","patch_set":5,"id":"a869a588_eba24dc7","line":11,"range":{"start_line":11,"start_character":42,"end_line":11,"end_character":49},"in_reply_to":"eb04f02e_68b5c262","updated":"2023-07-14 08:44:48.000000000","message":"Needs an actual LP bug","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":14,"context_line":"in many use cases. The reasons are inherent to operating a scaled environment"},{"line_number":15,"context_line":"with operating uptime requirements. Many operators route around the inherent"},{"line_number":16,"context_line":"limitations with the use of a shared network filesystem, but that can introduce"},{"line_number":17,"context_line":"it\u0027s own issues."},{"line_number":18,"context_line":""},{"line_number":19,"context_line":"The underlying challenge to these scaled environments is that multi-conductor"},{"line_number":20,"context_line":"environments have no means to \"hand off\" or \"prepare\" in advance for workloads"}],"source_content_type":"text/x-rst","patch_set":5,"id":"b2f67e37_1104e200","line":17,"range":{"start_line":17,"start_character":0,"end_line":17,"end_character":4},"updated":"2023-04-20 10:46:46.000000000","message":"nit: its","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":28,"context_line":""},{"line_number":29,"context_line":"At the highest level, we have the following intertwined challenges:"},{"line_number":30,"context_line":""},{"line_number":31,"context_line":"* We have no ability 
to enable multiple standalone conductors in an"},{"line_number":32,"context_line":"  environment without the use of some sort of shared or clustered filesystem."},{"line_number":33,"context_line":"  From our point of view, at this time, this is out of scope."},{"line_number":34,"context_line":""},{"line_number":35,"context_line":"* We also have DHCP management primitives inside of Ironic, and we can\u0027t"}],"source_content_type":"text/x-rst","patch_set":5,"id":"7aad6c9c_6f23fb58","line":32,"range":{"start_line":31,"start_character":24,"end_line":32,"end_character":77},"updated":"2023-04-20 10:46:46.000000000","message":"Isn\u0027t it the same problem as #2 or do you have something else in mind?","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":52,"context_line":"----------------------------------------------------"},{"line_number":53,"context_line":""},{"line_number":54,"context_line":"At present, we don\u0027t have this capability today, and there is no known"},{"line_number":55,"context_line":"limitation to prevent this, it is more a case of \"not yet\"."},{"line_number":56,"context_line":""},{"line_number":57,"context_line":"The idea would be to enable conductors within the same conductor group"},{"line_number":58,"context_line":"to notify each other of actions or changes."}],"source_content_type":"text/x-rst","patch_set":5,"id":"e4436088_e8a23409","line":55,"updated":"2023-04-20 10:46:46.000000000","message":"Strictly speaking, we already have this. 
It is the continue_node_deploy/continue_node_clean call that is sometimes used to trigger a transition to the next step: https://opendev.org/openstack/ironic/src/commit/8ef9db15704c0c2cb1342c7c1554bfa8d8a7a2e3/ironic/conductor/utils.py#L924","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":62,"context_line":"PXE setup is needed."},{"line_number":63,"context_line":""},{"line_number":64,"context_line":"To do this, we would need to add an RPC helper to enumerate through conductors"},{"line_number":65,"context_line":"and send the same RPC message to *each* conductor in the conductor group."},{"line_number":66,"context_line":""},{"line_number":67,"context_line":"Upon completion from the nodes, then the calling method would return."},{"line_number":68,"context_line":""}],"source_content_type":"text/x-rst","patch_set":5,"id":"0ad69254_8e74f322","line":65,"updated":"2023-04-20 10:46:46.000000000","message":"I\u0027d prefer we only do this when no other options exist, i.e. for JSON RPC. RabbitMQ is supposed to be good at broadcasting.\n\nFrom a brief reading of the oslo.messaging code, the client will need to create a Target with fanout\u003dTrue and no server specified. Limitation: the conductor group filtering will need to be done on the server side. But this approach should scale really well to many conductors.\n\nJSON RPC should work the way you describe. 
And ideally we should make this optional to be able to fall back to periodic tasks.","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":70,"context_line":""},{"line_number":71,"context_line":"* No communication to conductors outside of the existing/running"},{"line_number":72,"context_line":"  conductor group."},{"line_number":73,"context_line":"* RPC methods to be called would be detailed for focused, stratiegic, and"},{"line_number":74,"context_line":"  generally \"lockless\" calls where a task or lock would not be needed as we"},{"line_number":75,"context_line":"  do not want to trigger additional database queries to perform relatively"},{"line_number":76,"context_line":"  focused operations."}],"source_content_type":"text/x-rst","patch_set":5,"id":"51186e76_36c60aa3","line":73,"range":{"start_line":73,"start_character":58,"end_line":73,"end_character":68},"updated":"2023-04-20 10:46:46.000000000","message":"nit: strategic","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":76,"context_line":"  focused operations."},{"line_number":77,"context_line":"* Calls across multiple conductors would utilize new threads being launched"},{"line_number":78,"context_line":"  in the calling conductor."},{"line_number":79,"context_line":"* Depending on the failure, we may retry the operation. For Example,"},{"line_number":80,"context_line":"  NoFreeConductorThreads and other obvious transient conenctivity failures,"},{"line_number":81,"context_line":"  it does make sense to retry upon. 
Failures from within the remote conductor,"},{"line_number":82,"context_line":"  maybe not."}],"source_content_type":"text/x-rst","patch_set":5,"id":"f9328f1c_96f1f66b","line":79,"updated":"2023-04-20 10:46:46.000000000","message":"Dunno. With JSON RPC - sure. With Rabbit, this requires falling back to N requests per N conductor instead of 1 broadcast/fanout.","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":77,"context_line":"* Calls across multiple conductors would utilize new threads being launched"},{"line_number":78,"context_line":"  in the calling conductor."},{"line_number":79,"context_line":"* Depending on the failure, we may retry the operation. For Example,"},{"line_number":80,"context_line":"  NoFreeConductorThreads and other obvious transient conenctivity failures,"},{"line_number":81,"context_line":"  it does make sense to retry upon. 
Failures from within the remote conductor,"},{"line_number":82,"context_line":"  maybe not."},{"line_number":83,"context_line":""}],"source_content_type":"text/x-rst","patch_set":5,"id":"02b7017e_0828286f","line":80,"updated":"2023-04-20 10:46:46.000000000","message":"nit: connectivity","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":95,"context_line":"term, the blast radius of the impending environment change."},{"line_number":96,"context_line":""},{"line_number":97,"context_line":"It is fine for us to get the calls, but another conductor can only act if it"},{"line_number":98,"context_line":"is appropriate for it to do so."},{"line_number":99,"context_line":""},{"line_number":100,"context_line":"In this case, the message conveyed between the conductors would need to"},{"line_number":101,"context_line":"include the identity of the conductor the request is being sent from which"}],"source_content_type":"text/x-rst","patch_set":5,"id":"0d7b2f2f_994ea19c","line":98,"updated":"2023-04-20 10:46:46.000000000","message":"I wonder if it\u0027s better to determine the responsible conductor on the caller side using the same \"hash ring minus the affected host\" approach. 
Then it\u0027s a normal peer-to-peer RPC call instead of a broadcast.","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10239,"name":"Dmitry Tantsur","email":"dtantsur@protonmail.com","username":"dtantsur"},"change_message_id":"eddad6faed9fd53d46278c951448c86c7e10ddf1","unresolved":true,"context_lines":[{"line_number":136,"context_line":"and track actions in a cross-conductor fashion, but we encounter the database"},{"line_number":137,"context_line":"as the central point in doing such, and the overall theme to enable"},{"line_number":138,"context_line":"cross-conductor RPC *is* to enable us to not trigger additional database"},{"line_number":139,"context_line":"activity."},{"line_number":140,"context_line":""},{"line_number":141,"context_line":"Data model impact"},{"line_number":142,"context_line":"-----------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"a548f2d1_ca815422","line":139,"updated":"2023-04-20 10:46:46.000000000","message":"Well, we already have periodic tasks that check for dead conductors. There is no way around it: a conductor may go offline abruptly. 
An alternative could be to pre-emptively configure PXE on the conductor that would own the node.","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"},{"author":{"_account_id":10342,"name":"Jay Faulkner","display_name":"JayF","email":"jay@jvf.cc","username":"JayF","status":"youtube.com/@oss-gr / podcast.gr-oss.io"},"change_message_id":"ce9cbeda326ad8e9993065e4c2c126c6cf23223e","unresolved":true,"context_lines":[{"line_number":236,"context_line":"---------------------"},{"line_number":237,"context_line":""},{"line_number":238,"context_line":"Cross-conductor communication capabiliites will be required and"},{"line_number":239,"context_line":"operators may need to update firewall ACL rules accordingly."},{"line_number":240,"context_line":""},{"line_number":241,"context_line":"Developer impact"},{"line_number":242,"context_line":"----------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"9abeee13_8e9979d7","line":239,"updated":"2023-07-14 08:44:48.000000000","message":"Will taking advantage of this require any changes in deployment automation, generally? e.g. around strategies used when restarting/rebuilding conductors? Might want to mention that here if so","commit_id":"20ee510152f176395df49432a46efd1a2af156ec"}]}
