)]}'
{"/COMMIT_MSG":[{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"8603f199112c33bd0a47020ee1fe0440c78e99fc","unresolved":true,"context_lines":[{"line_number":26,"context_line":"computes associated to `done` evacuation records. Forcing operators to"},{"line_number":27,"context_line":"restart the service allowing them to move to a `completed` state before"},{"line_number":28,"context_line":"the service can be forced up again."},{"line_number":29,"context_line":""},{"line_number":30,"context_line":"Closes-Bug: #1922053"},{"line_number":31,"context_line":"Change-Id: I95882ea28564a31a6b4f8b665de462774d84edfc"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"d5f64d40_302d42ec","line":29,"updated":"2021-04-01 08:28:06.000000000","message":"I don\u0027t think this is a good idea. \n\n1) The force_down operation is specifically created to ignore safety. The safe way would be to wait until the compute is considered down by the controller and then evacuate. We documented that force_down only applicable if fencing is ensured. If this is ignored then any kind of corruption can happen (ip duplication, volume corruption etc) regardless if the compute service is restarted. So building in safety mechanism to an operation that designed to be unsafe is, I think, going backwards. So I suggest to extend the force_down documentation to state that compute service restart is required before the compute is unforced. \n\n2) The reported issue is applicable to non forced_down case too. Imagine the following scenario:\n* control network fails between a compute host and the controller cluster\n* the compute times out and considered down from the controller perspective\n* admin issues evacuation of the VMs on this host. The evacuation goes through\n* the network failure resolved and the compute service joins back to the cluster without any restart. \n\nAs in this case there is no explicit up action coming from the user, there is no way to reject things. As this can lead to the same corruption than the original bug reported. I think this is a more serious case than the force_down case as here the admin did nothing known to be dangerous. Here I think a mitigation would be to do the evacuation cleanup during a periodic task.","commit_id":"a41fa32fb5c53bb8cda7735a2d80cf0c0997bef8"},{"author":{"_account_id":10135,"name":"Lee Yarwood","display_name":"Lee Yarwood","email":"lyarwood@redhat.com","username":"lyarwood"},"change_message_id":"1bb71495ed5a5cc82b356a02be7366b4836f030d","unresolved":false,"context_lines":[{"line_number":26,"context_line":"computes associated to `done` evacuation records. Forcing operators to"},{"line_number":27,"context_line":"restart the service allowing them to move to a `completed` state before"},{"line_number":28,"context_line":"the service can be forced up again."},{"line_number":29,"context_line":""},{"line_number":30,"context_line":"Closes-Bug: #1922053"},{"line_number":31,"context_line":"Change-Id: I95882ea28564a31a6b4f8b665de462774d84edfc"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"f05dbcba_b62d7bda","line":29,"in_reply_to":"3b95d095_b795cdac","updated":"2021-04-01 09:30:28.000000000","message":"So I think we\u0027ve come to an agreement in IRC that this is going to be an acceptable thing to do alongside a periodic and other potential changes to ensure things are cleaned up correctly before we force up:\n\nhttp://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2021-04-01.log.html#t2021-04-01T08:28:24\n\nI\u0027ll respin this now to return 400 to avoid a microversion for now so this can be backported before introducing a new microversion in Xena and correctly returning 409.","commit_id":"a41fa32fb5c53bb8cda7735a2d80cf0c0997bef8"},{"author":{"_account_id":10135,"name":"Lee Yarwood","display_name":"Lee Yarwood","email":"lyarwood@redhat.com","username":"lyarwood"},"change_message_id":"1dd11aac30bbc6088a183d28de266e16f75b6281","unresolved":true,"context_lines":[{"line_number":26,"context_line":"computes associated to `done` evacuation records. Forcing operators to"},{"line_number":27,"context_line":"restart the service allowing them to move to a `completed` state before"},{"line_number":28,"context_line":"the service can be forced up again."},{"line_number":29,"context_line":""},{"line_number":30,"context_line":"Closes-Bug: #1922053"},{"line_number":31,"context_line":"Change-Id: I95882ea28564a31a6b4f8b665de462774d84edfc"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"3b95d095_b795cdac","line":29,"in_reply_to":"d5f64d40_302d42ec","updated":"2021-04-01 08:44:44.000000000","message":"I\u0027m not sure I agree tbh. Documentation isn\u0027t enough to avoid issues like this IMHO and given how trivial the lookup is here it seems like a useful thing to do. What\u0027s the downside to codifying this?\n\nThis is specifically for the force down, evacuation and force up use case but yeah totally agree that your suggested flow is more likely and it\u0027s something I also wanted to look at addressing with a periodic. We actually discussed this at length yesterday during our downstream bug triage call.","commit_id":"a41fa32fb5c53bb8cda7735a2d80cf0c0997bef8"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"94345c99c0f19a6ed2a7980ebf663ed6af575a8a","unresolved":false,"context_lines":[{"line_number":26,"context_line":"computes associated to `done` evacuation records. Forcing operators to"},{"line_number":27,"context_line":"restart the service allowing them to move to a `completed` state before"},{"line_number":28,"context_line":"the service can be forced up again."},{"line_number":29,"context_line":""},{"line_number":30,"context_line":"Closes-Bug: #1922053"},{"line_number":31,"context_line":"Change-Id: I95882ea28564a31a6b4f8b665de462774d84edfc"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"288141c6_b338c131","line":29,"in_reply_to":"f05dbcba_b62d7bda","updated":"2021-04-01 09:37:10.000000000","message":"Yepp, I\u0027m dropping my -1. I can accept that doing a consistency check at force up could be framed like: Before we take the responsibility back from the admin about this host, we check that what we get is consistent enough. \n\nI hope this check will not open up a further discussion about making force_down safer. As that is not the goal of that operation.","commit_id":"a41fa32fb5c53bb8cda7735a2d80cf0c0997bef8"},{"author":{"_account_id":4690,"name":"melanie witt","display_name":"melwitt","email":"melwittt@gmail.com","username":"melwitt"},"change_message_id":"34f53b2d9b1a085ec4f098a0961b3b453629d2cc","unresolved":true,"context_lines":[{"line_number":11,"context_line":"instance has been rebuilt on the new compute host. At present the"},{"line_number":12,"context_line":"migration record remains in this state until the original compute host"},{"line_number":13,"context_line":"is restarted and the service cleans up any leftovers of the instance"},{"line_number":14,"context_line":"before it is moved to a state of `completed`."},{"line_number":15,"context_line":""},{"line_number":16,"context_line":"Bug #1922053 details a use case where an operator might unintentionally"},{"line_number":17,"context_line":"forget to ensure the compute service is restarted before forcing the"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":2,"id":"ef379ab7_33b5c11a","line":14,"updated":"2021-04-10 02:09:44.000000000","message":"Unrelated: lol that there\u0027s \u0027done\u0027 \u003d\u003e \u0027completed\u0027. The first time I read this, I was thinking, \"how is \u0027done\u0027 not an OK state for the cleanup? It\u0027s \u0027done\u0027. Oh, \u0027completed\u0027 is the actual state for all done.\"","commit_id":"bf89a23d91f01a29ba9b19bd0accf8de8c05f2f7"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"f8e1dabf5817fa0702f1375ff25199a0212f2b14","unresolved":true,"context_lines":[{"line_number":34,"context_line":"Finally, some additional functional tests have been updated to ensure"},{"line_number":35,"context_line":"they restart the source compute service of an evacuation before"},{"line_number":36,"context_line":"attempting to force up the service, ensuring any migration records are"},{"line_number":37,"context_line":"marked as completed."},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Closes-Bug: #1922053"},{"line_number":40,"context_line":"Change-Id: I95882ea28564a31a6b4f8b665de462774d84edfc"}],"source_content_type":"text/x-gerrit-commit-message","patch_set":2,"id":"d5f42202_533070dc","line":37,"updated":"2021-04-14 15:05:47.000000000","message":"extra thank you for that!","commit_id":"bf89a23d91f01a29ba9b19bd0accf8de8c05f2f7"}],"nova/api/openstack/compute/services.py":[{"author":{"_account_id":8556,"name":"Ghanshyam Maan","display_name":"Ghanshyam Maan","email":"gmaan.os14@gmail.com","username":"ghanshyam"},"change_message_id":"a525bc0bab37909c217d9569d74750113df01838","unresolved":true,"context_lines":[{"line_number":244,"context_line":"                    \"Ensure the compute service has been restarted, \""},{"line_number":245,"context_line":"                    \"allowing these records to move to `completed` before \""},{"line_number":246,"context_line":"                    \"retrying this request.\") % {\u0027host\u0027: hostname}"},{"line_number":247,"context_line":"            # TODO(lyarwood): Move to 409 HTTPConflict under a new microversion"},{"line_number":248,"context_line":"            raise webob.exc.HTTPBadRequest(explanation\u003dmsg)"},{"line_number":249,"context_line":""},{"line_number":250,"context_line":"    @wsgi.response(204)"}],"source_content_type":"text/x-python","patch_set":2,"id":"75743a1c_60832b9e","line":247,"range":{"start_line":247,"start_character":12,"end_line":247,"end_character":79},"updated":"2021-04-13 22:28:46.000000000","message":"we can fix this but mentioning \u0027retrying \u0027 in error msg also clarify it as \u0027fix-and-retry\u0027 error.","commit_id":"bf89a23d91f01a29ba9b19bd0accf8de8c05f2f7"},{"author":{"_account_id":4690,"name":"melanie witt","display_name":"melwitt","email":"melwittt@gmail.com","username":"melwitt"},"change_message_id":"34f53b2d9b1a085ec4f098a0961b3b453629d2cc","unresolved":true,"context_lines":[{"line_number":399,"context_line":"        return self._perform_action(req, id, body, actions)"},{"line_number":400,"context_line":""},{"line_number":401,"context_line":"    @wsgi.Controller.api_version(UUID_FOR_ID_MIN_VERSION)  # noqa F811"},{"line_number":402,"context_line":"    @wsgi.expected_errors((400, 404))"},{"line_number":403,"context_line":"    @validation.schema(services.service_update_v2_53, UUID_FOR_ID_MIN_VERSION)"},{"line_number":404,"context_line":"    def update(self, req, id, body):   # noqa"},{"line_number":405,"context_line":"        \"\"\"Perform service update"}],"source_content_type":"text/x-python","patch_set":2,"id":"badb7838_2f175a4f","line":402,"range":{"start_line":402,"start_character":27,"end_line":402,"end_character":30},"updated":"2021-04-10 02:09:44.000000000","message":"IIUC this is why we will start with 400.","commit_id":"bf89a23d91f01a29ba9b19bd0accf8de8c05f2f7"},{"author":{"_account_id":8556,"name":"Ghanshyam Maan","display_name":"Ghanshyam Maan","email":"gmaan.os14@gmail.com","username":"ghanshyam"},"change_message_id":"a525bc0bab37909c217d9569d74750113df01838","unresolved":true,"context_lines":[{"line_number":399,"context_line":"        return self._perform_action(req, id, body, actions)"},{"line_number":400,"context_line":""},{"line_number":401,"context_line":"    @wsgi.Controller.api_version(UUID_FOR_ID_MIN_VERSION)  # noqa F811"},{"line_number":402,"context_line":"    @wsgi.expected_errors((400, 404))"},{"line_number":403,"context_line":"    @validation.schema(services.service_update_v2_53, UUID_FOR_ID_MIN_VERSION)"},{"line_number":404,"context_line":"    def update(self, req, id, body):   # noqa"},{"line_number":405,"context_line":"        \"\"\"Perform service update"}],"source_content_type":"text/x-python","patch_set":2,"id":"342b149e_500550f7","line":402,"range":{"start_line":402,"start_character":27,"end_line":402,"end_character":30},"in_reply_to":"badb7838_2f175a4f","updated":"2021-04-13 22:28:46.000000000","message":"yeah, with 400 as existing return code, we would not need microversion bump.","commit_id":"bf89a23d91f01a29ba9b19bd0accf8de8c05f2f7"}]}
