{"/COMMIT_MSG":[{"author":{"_account_id":15197,"name":"Pierre Riteau","email":"pierre@stackhpc.com","username":"priteau","status":"StackHPC"},"change_message_id":"23e5274db7d9c621aedd446c5482eb89e7445221","unresolved":true,"context_lines":[{"line_number":4,"context_line":"Commit:     Viktor Krivak \u003cviktor.krivak@ultimum.io\u003e"},{"line_number":5,"context_line":"CommitDate: 2024-09-03 07:33:16 +0000"},{"line_number":6,"context_line":""},{"line_number":7,"context_line":"Ensure claims to placement are freed after evacuate complete"},{"line_number":8,"context_line":"Improve resource_tracker delete_allocation_for_evacuated_instance to handle non-cached compute hosts"},{"line_number":9,"context_line":"Fix test to accept this change"},{"line_number":10,"context_line":""}],"source_content_type":"text/x-gerrit-commit-message","patch_set":3,"id":"6362ef3b_06275bee","line":7,"updated":"2024-09-03 08:06:11.000000000","message":"Please add an empty line after the commit summary. You should also wrap lines to 72 characters.","commit_id":"86d439198b65055356586813ed353423a0c426f9"}],"/PATCHSET_LEVEL":[{"author":{"_account_id":27339,"name":"Michal Arbet","email":"michal.arbet@ultimum.io","username":"michalarbet"},"change_message_id":"ea60a724f8242ca7b9861835fe1f13336b946ccd","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"ff744ccc_0a514b7c","updated":"2024-08-22 12:18:07.000000000","message":"Hi Folks, please could you check this ? Ready for review","commit_id":"3045d82de2be4b1cf04da6fd144ace5706e37549"},{"author":{"_account_id":22072,"name":"Viktor Křivák","email":"viktor.krivak@ultimum.io","username":"viktor-krivak"},"change_message_id":"8f7ef0749f5da0b2cd48986aecdcf5a93d2b614e","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"6870f92f_49d6a808","updated":"2024-08-22 12:32:28.000000000","message":"I think this one will need more discussion. At this moment evacuation of VM doesn\u0027t remove claims in placement. This behaviour was constraint in tests, but I cannot find any mention about that in documentation.\nWhen I experienced this behaviour I was 100% sure that it was a bug until I found constraints in tests.\nHere are a few scenarios:\n\nUser have multiple generations of hypervisor mostly new and powerful but few old and not so performant. One of the old hypervisor completelly fail (motherboard issue, CPU burnout, ...). User then call evacuate on each VM on old hypervisor and just remove old hypervisor all together from system. In this scenario each VM will kept its claim to forever.\n\nAnother scenario. One hypervisor dies again because of some HW failure. The user will order replacement parts for that HW and call evacuate for each VM. Since delivery of replacement parts can take weeks. Before the replacement part arrives another hypervisor dies. But if some VMs were evacuated to this one from the first failed, they will have claims against multiple hypervisors and therefore they cannot be evacuated again.\n\nLast scenario. When some automated evacuation solution like Masakari is used, VMs are evacuated almost immediately when hypervisor fail. If another hypervisor fail, it will not be possible to evacuate VMs again until old hypervisor is fixed. This could cause serious downtime in otherwise in environment where we try to avoid such downtime at all cost.\n\nAnd I cannot think about any situation when this behaviour is wanted. Because of that, I don\u0027t think this should count as behaviour change. 
Only bug that was for some reason codified in tests.","commit_id":"3045d82de2be4b1cf04da6fd144ace5706e37549"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"9de36e760c2f0d30f12b84d2b74a0bbd11e01c1a","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"754974bd_8a78df4f","updated":"2024-08-22 14:24:53.000000000","message":"im not sure im comfortable with the direction taken to do this\nspecifically having a resource tracker in a different compute service managing a compute nodes allocations associated with a different compute service.\n\nthat a pretty big desging violation so we would need wider input form the core team to determin if this is accpatble or not.\n\n\n-1 on the direction of the patch until this can be discuss more widely.","commit_id":"3045d82de2be4b1cf04da6fd144ace5706e37549"},{"author":{"_account_id":22072,"name":"Viktor Křivák","email":"viktor.krivak@ultimum.io","username":"viktor-krivak"},"change_message_id":"9c48929896e057771c9727a4acb7a2d0ac508c9c","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"c47480af_d4600da2","updated":"2024-08-22 09:47:59.000000000","message":"recheck","commit_id":"3045d82de2be4b1cf04da6fd144ace5706e37549"},{"author":{"_account_id":22072,"name":"Viktor Křivák","email":"viktor.krivak@ultimum.io","username":"viktor-krivak"},"change_message_id":"ca1d6d653b764fad26bcc91187c11094543623eb","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"85be1abb_7ee7b19c","in_reply_to":"754974bd_8a78df4f","updated":"2024-08-23 12:09:33.000000000","message":"I understood your point, but I don\u0027t think we could do it elsewhere. Original node cannot do that, because it is failed and conductor doesn\u0027t have an information if evacuation was successful. 
Except when we update VM info, but I think it would be terrible idea to put it there.","commit_id":"3045d82de2be4b1cf04da6fd144ace5706e37549"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"4f1466b168096a735aa83037130b2d288fc55464","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"c93d07ab_5fc7f68f","updated":"2024-09-03 08:19:24.000000000","message":"I left questions in the bug report I think we need to understand a bit better what is leading to the evacuation error in your env before we can try to fix it.","commit_id":"86d439198b65055356586813ed353423a0c426f9"}],"nova/compute/manager.py":[{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"9de36e760c2f0d30f12b84d2b74a0bbd11e01c1a","unresolved":true,"context_lines":[{"line_number":3920,"context_line":"                self._notify_instance_rebuild_error(context, instance, e, bdms)"},{"line_number":3921,"context_line":"                raise"},{"line_number":3922,"context_line":"            else:"},{"line_number":3923,"context_line":"                # NOTE(viktor.krivak): Evacuation doesn\u0027t have confirmation,"},{"line_number":3924,"context_line":"                # we need to delete old claim here, otherwise it will"},{"line_number":3925,"context_line":"                # stuck in placement until original hypervisor came back."},{"line_number":3926,"context_line":"                # There is no guarantee that original hypervisor came"}],"source_content_type":"text/x-python","patch_set":2,"id":"52679e9c_d7fe1476","line":3923,"updated":"2024-08-22 14:24:53.000000000","message":"is there a reason your doing it before\n  self.rt.finish_evacuation(instance, scheduled_node, migration)\n  \ni would have only dropped the souce allcoation if we had finished the migration as it could still fail here.\n\nwe have not actully updated instance.host at this point so its not correct to drop the allcoation until we have saved that change to the db which happes in finish_evacuation","commit_id":"3045d82de2be4b1cf04da6fd144ace5706e37549"},{"author":{"_account_id":22072,"name":"Viktor Křivák","email":"viktor.krivak@ultimum.io","username":"viktor-krivak"},"change_message_id":"ca1d6d653b764fad26bcc91187c11094543623eb","unresolved":false,"context_lines":[{"line_number":3920,"context_line":"                self._notify_instance_rebuild_error(context, instance, e, bdms)"},{"line_number":3921,"context_line":"                raise"},{"line_number":3922,"context_line":"            else:"},{"line_number":3923,"context_line":"                # NOTE(viktor.krivak): Evacuation doesn\u0027t have confirmation,"},{"line_number":3924,"context_line":"                # we need to delete old claim here, otherwise it will"},{"line_number":3925,"context_line":"                # stuck in placement until original hypervisor came back."},{"line_number":3926,"context_line":"                # There is no guarantee that original hypervisor came"}],"source_content_type":"text/x-python","patch_set":2,"id":"14ebd2ce_811fcbb3","line":3923,"in_reply_to":"52679e9c_d7fe1476","updated":"2024-08-23 12:09:33.000000000","message":"Good point, I\u0027ll move it.","commit_id":"3045d82de2be4b1cf04da6fd144ace5706e37549"}],"nova/compute/resource_tracker.py":[{"author":{"_account_id":11604,"name":"sean 
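A minimal sketch of the ordering sean is suggesting, for illustration only. The
finish_evacuation call is quoted from his comment and the cleanup helper name
comes from the commit message; the surrounding variable names (notably
source_node) are assumptions, not the actual patch:

    # Sketch: record the evacuation first, then free the source claim.
    self.rt.finish_evacuation(instance, scheduled_node, migration)
    # instance.host has now been saved to the database, so it is safe to
    # drop the claim the failed source node still holds in placement.
    # `source_node` stands for the node the instance was evacuated from.
    self.rt.delete_allocation_for_evacuated_instance(
        context, instance, source_node, node_type='source')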
mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"9de36e760c2f0d30f12b84d2b74a0bbd11e01c1a","unresolved":true,"context_lines":[{"line_number":1818,"context_line":"        cn_uuid \u003d None"},{"line_number":1819,"context_line":"        try:"},{"line_number":1820,"context_line":"            cn_uuid \u003d self.compute_nodes[node].uuid"},{"line_number":1821,"context_line":"        except KeyError:"},{"line_number":1822,"context_line":"            # Node is not locally cached, try to get uuid from placement,"},{"line_number":1823,"context_line":"            # if none is found we could immediately end because there"},{"line_number":1824,"context_line":"            # is nothing to erase"},{"line_number":1825,"context_line":"            allocs \u003d self.reportclient.get_allocs_for_consumer("},{"line_number":1826,"context_line":"                context,"},{"line_number":1827,"context_line":"                instance.uuid)"},{"line_number":1828,"context_line":"            for provider_uuid in allocs[\u0027allocations\u0027].keys():"},{"line_number":1829,"context_line":"                rsc_pn \u003d self.reportclient.get_resource_provider_name("},{"line_number":1830,"context_line":"                    context,"},{"line_number":1831,"context_line":"                    provider_uuid)"},{"line_number":1832,"context_line":"                if rsc_pn \u003d\u003d node:"},{"line_number":1833,"context_line":"                    cn_uuid \u003d provider_uuid"},{"line_number":1834,"context_line":"                    break"},{"line_number":1835,"context_line":"        if cn_uuid is None:"},{"line_number":1836,"context_line":"            # There are no allocation to this node"},{"line_number":1837,"context_line":"            return"}],"source_content_type":"text/x-python","patch_set":2,"id":"61539c6a_ec524f51","line":1834,"range":{"start_line":1821,"start_character":7,"end_line":1834,"end_character":25},"updated":"2024-08-22 14:24:53.000000000","message":"so this is kidn of problematic and breaking some design asumtions.\n\nonly distirbute virt dirvers shoudl ever managne more then one compute node.\ni.e. 
ironic\n\nfor libvirt for exampel there is a 1:1 mappign between the comptue agent that manages the compute node and the compute node iteself.\n\nso having one comptue service/agent manage allocates related to a diffent node assocated with a diffent compute service/agent is not really allowed.\n\neven in the ironic case we hage worked hard over the last few released to prevernt ironic form doing that going forward.\n\nso i think this is not something we shoudl do without careful discussion\n\n\nthis really is an anti patthern so im not very comfrotable with doing this in resouce tracker.","commit_id":"3045d82de2be4b1cf04da6fd144ace5706e37549"},{"author":{"_account_id":22072,"name":"Viktor Křivák","email":"viktor.krivak@ultimum.io","username":"viktor-krivak"},"change_message_id":"ca1d6d653b764fad26bcc91187c11094543623eb","unresolved":false,"context_lines":[{"line_number":1818,"context_line":"        cn_uuid \u003d None"},{"line_number":1819,"context_line":"        try:"},{"line_number":1820,"context_line":"            cn_uuid \u003d self.compute_nodes[node].uuid"},{"line_number":1821,"context_line":"        except KeyError:"},{"line_number":1822,"context_line":"            # Node is not locally cached, try to get uuid from placement,"},{"line_number":1823,"context_line":"            # if none is found we could immediately end because there"},{"line_number":1824,"context_line":"            # is nothing to erase"},{"line_number":1825,"context_line":"            allocs \u003d self.reportclient.get_allocs_for_consumer("},{"line_number":1826,"context_line":"                context,"},{"line_number":1827,"context_line":"                instance.uuid)"},{"line_number":1828,"context_line":"            for provider_uuid in allocs[\u0027allocations\u0027].keys():"},{"line_number":1829,"context_line":"                rsc_pn \u003d self.reportclient.get_resource_provider_name("},{"line_number":1830,"context_line":"                    context,"},{"line_number":1831,"context_line":"                    provider_uuid)"},{"line_number":1832,"context_line":"                if rsc_pn \u003d\u003d node:"},{"line_number":1833,"context_line":"                    cn_uuid \u003d provider_uuid"},{"line_number":1834,"context_line":"                    break"},{"line_number":1835,"context_line":"        if cn_uuid is None:"},{"line_number":1836,"context_line":"            # There are no allocation to this node"},{"line_number":1837,"context_line":"            return"}],"source_content_type":"text/x-python","patch_set":2,"id":"c9cdc102_aa21d1ed","line":1834,"range":{"start_line":1821,"start_character":7,"end_line":1834,"end_character":25},"in_reply_to":"61539c6a_ec524f51","updated":"2024-08-23 12:09:33.000000000","message":"But we can\u0027t do it in another compute since it is failed. I understand that this could violate a pattern, but evacuation is kind of strange case where it is not possible to placement from same node. And every other node will have same problem. 
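For context, once the fallback above has resolved cn_uuid, the cleanup only
needs to strip that one provider from the instance's allocations. A minimal
sketch of that final step, assuming the per-provider removal call the report
client exposes elsewhere in Nova (treat the exact method name as an
assumption, not a confirmed part of this patch):

    # Sketch: remove the evacuated instance's allocation against the source
    # provider found above, leaving allocations on other providers (such as
    # the destination node) untouched.
    self.reportclient.remove_provider_tree_from_instance_allocation(
        context, instance.uuid, cn_uuid)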
File: nova/tests/functional/regressions/test_bug_1794996.py, line 75 (patch set 2)

Code under review:

    72 |         self._delete_and_check_allocations(server)
    73 |
    74 |         # restart the source compute
    75 |         self.compute1 = self.restart_compute_service(self.compute1)

sean mooney, 2024-08-22 14:13 [unresolved]:
    So in addition to restarting the source compute, we need to assert that
    it starts up properly and that the remote cleanup of the allocation does
    not break anything.

    As such, I think we need to wait for the compute service to be up in the
    compute service API.

    We might also want to assert that it successfully ran the update
    available resources task and correctly updated the resource provider, or
    something similar.

    I'm not sure if that is overkill, but we are not really asserting that
    the original source actually starts at the moment, so I feel this is
    incomplete.

    We do get some test coverage indirectly from the evacuate gate hook;
    however, that job does not actually stop the compute service and restart
    it, so it does not exercise this part.

Viktor Křivák, 2024-08-23 12:09 (reply):
    I'll try to rework it a little bit. I tried to make as few changes as
    possible, but you are correct that this test should be better.
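A rough sketch of the extra assertions being asked for after the restart. The
helper names (_run_periodics, _get_provider_usages) and the source_rp_uuid
variable are assumptions borrowed from other provider-usage functional tests
and may not match what this test class actually provides:

    # Sketch: restart the source compute and check that it comes back
    # cleanly and that the already-freed source allocation stays freed.
    self.compute1 = self.restart_compute_service(self.compute1)
    # Run the update_available_resource periodic so the restarted source
    # refreshes its resource provider view.
    self._run_periodics()
    # The evacuated server must not regain an allocation on the source
    # provider after the restart.
    source_usages = self._get_provider_usages(source_rp_uuid)
    self.assertEqual({'VCPU': 0, 'MEMORY_MB': 0, 'DISK_GB': 0},
                     source_usages)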
