)]}' {"/PATCHSET_LEVEL":[{"author":{"_account_id":9816,"name":"Takashi Kajinami","email":"kajinamit@oss.nttdata.com","username":"kajinamit"},"change_message_id":"035e00fa61d85946d1867d8427699c623f587743","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":1,"id":"7facfaa5_cc41ea1c","updated":"2024-04-24 15:50:15.000000000","message":"Please correct me if I\u0027m wrong, but this new check mechanism is implemented in the compute node, and all compute nodes need to be upgraded to 2024.2 or newer to use this check.\n\nHowever, this does not catch the case where some compute nodes in the cluster are left on an old version, which was the actual motivation for adding validation. I thought the check should be implemented on the conductor side somehow so that it can detect compute nodes that were never upgraded and remain at an ancient version. Is that something to be added later, or did I misunderstand the unsupported usage that we want to reject with this validation?","commit_id":"e7c7525356babb11e01d785aa26dbd24ef6a29f8"},{"author":{"_account_id":4393,"name":"Dan Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"6e127744c9901c4f59da84b3b51cb06a0c5ccb8e","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":1,"id":"947f796c_4da5f22c","in_reply_to":"7facfaa5_cc41ea1c","updated":"2024-04-24 16:00:18.000000000","message":"In this scheme, the conductor is making the decision, but on request from the compute node. Yes, this requires the compute node to be updated in order to work, and we won\u0027t reap the benefits of this until we get to the point where 2024.2 compute nodes become \"ancient.\"\n\nHaving the conductor do this on its own (i.e. snipe bad computes from a distance) is more complex and brings a host of other issues we\u0027d need to solve, IMHO. Periodics on conductor nodes are hard to synchronize (i.e. to prevent thousands of workers from running them all the time, at the least). 
IMHO, the workaround config that can/will disable this (in the final version, as mentioned) needs to be per-compute and not per-cluster. This is definitely a forward-looking sort of thing, not a solution to being able to remove existing checks earlier.","commit_id":"e7c7525356babb11e01d785aa26dbd24ef6a29f8"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"9062ae42d7824cd74587dfe38d7e970401694cee","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":1,"id":"27442e45_111ea066","in_reply_to":"947f796c_4da5f22c","updated":"2024-04-24 16:29:07.000000000","message":"one thing we could consider is a change to nova-manage or nova status to allow you to list the computes that are too old and optionally disable them\n\nif we want to provide a middle ground, it won\u0027t be automatic, but a human can run it, disable the really old compute services, and then review why they were not upgraded.","commit_id":"e7c7525356babb11e01d785aa26dbd24ef6a29f8"}],"nova/compute/manager.py":[{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"51fc28a4f2afc95be17a2c0d896be00a114f2123","unresolved":true,"context_lines":[{"line_number":10656,"context_line":" self.service_ref.disabled_reason \u003d ("},{"line_number":10657,"context_line":" \u0027Service older than cluster-supported threshold\u0027)"},{"line_number":10658,"context_line":" self.service_ref.save()"},{"line_number":10659,"context_line":" return"},{"line_number":10660,"context_line":""},{"line_number":10661,"context_line":" try:"},{"line_number":10662,"context_line":" nodenames \u003d set(self.driver.get_available_nodes())"}],"source_content_type":"text/x-python","patch_set":1,"id":"cecc0574_8d229554","line":10659,"updated":"2024-04-15 17:52:20.000000000","message":"presumably if we are doing this we are also going to remove the current check we have to stop the 
conductor from starting up, correct? that would be in an additional follow-up patch?\n\nat the ptg you also mentioned the scars you had from the build failure auto delete so i assume this patch will add a new config option to opt out of this behavior?\n\nhave you thought about how that should work?\ni.e. should it be a hard error and refuse to start when disabling is turned off\nor should it just start and log a warning saying it\u0027s operating outside of the supported range?","commit_id":"e7c7525356babb11e01d785aa26dbd24ef6a29f8"},{"author":{"_account_id":4393,"name":"Dan Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"6a2daae8af2f1acd7df30eb496aeeb6adfe8593d","unresolved":true,"context_lines":[{"line_number":10656,"context_line":" self.service_ref.disabled_reason \u003d ("},{"line_number":10657,"context_line":" \u0027Service older than cluster-supported threshold\u0027)"},{"line_number":10658,"context_line":" self.service_ref.save()"},{"line_number":10659,"context_line":" return"},{"line_number":10660,"context_line":""},{"line_number":10661,"context_line":" try:"},{"line_number":10662,"context_line":" nodenames \u003d set(self.driver.get_available_nodes())"}],"source_content_type":"text/x-python","patch_set":1,"id":"bf1a03bb_d5ec265c","line":10659,"in_reply_to":"cecc0574_8d229554","updated":"2024-04-15 17:56:23.000000000","message":"\u003e presumably if we are doing this we are also going to remove the current check we have to stop the conductor from starting up, correct? that would be in an additional follow-up patch?\n\nYeah, separate if we\u0027re going to do it for sure.\n\n\u003e at the ptg you also mentioned the scars you had from the build failure auto delete so i assume this patch will add a new config option to opt out of this behavior?\n\nYep, as mentioned in my comment on the first patch.\n\n\u003e have you thought about how that should work?\n\u003e i.e. 
should it be a hard error and refuse to start when disabling is turned off\n\u003e or should it just start and log a warning saying it\u0027s operating outside of the supported range?\n\nAs I mentioned, I think the startup failure is too harsh, based on the experience(s) with people hitting this, so I\u0027d prefer to leave it as auto-disable (like this) and have the workaround flag just disable the auto-disable behavior but leave the error log I\u0027m adding here.","commit_id":"e7c7525356babb11e01d785aa26dbd24ef6a29f8"}]}