)]}'
{"/PATCHSET_LEVEL":[{"author":{"_account_id":22443,"name":"Mike Lowe","email":"jomlowe@iu.edu","username":"jomlowe"},"change_message_id":"0ec33b535c6425005c79c0113ade8e0e5c72abb7","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"a7080549_8c04ff77","updated":"2024-07-27 01:44:13.000000000","message":"Looking at this line the default is devtype\u003dNone\nhttps://opendev.org/openstack/nova/src/commit/7a7427691e0bd4818bb7a2c5f5371e0244addbbb/nova/virt/libvirt/guest.py#L416\nWhen devtype\u003d\u003dNone then all devices are appended even MDEV devices\nhttps://opendev.org/openstack/nova/src/commit/7a7427691e0bd4818bb7a2c5f5371e0244addbbb/nova/virt/libvirt/guest.py#L448\nWhen the guest.get_device_by_alias is called it\u0027s not called with devtype so the default of None comes through so all device types must be parsed\nhttps://opendev.org/openstack/nova/src/commit/7a7427691e0bd4818bb7a2c5f5371e0244addbbb/nova/virt/libvirt/driver.py#L2825","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":782,"name":"John Garbutt","email":"john@johngarbutt.com","username":"johngarbutt"},"change_message_id":"432a32d0778a6267a180f8ccdac697ea16cb24cc","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"b80a4527_04f684c2","updated":"2024-10-17 16:16:56.000000000","message":"Stig just pointed me at this one, sounds nasty for mdev users.\n\nA functional test would be good here, to stop this getting regressed again. It looks like we need a version of this test:\nhttps://github.com/openstack/nova/blob/ed2bf3699ddc360fb646b895560fa9dbcfd6beba/nova/tests/functional/libvirt/test_vgpu.py#L236\nMerged with a bit of this one for the cinder attach:\nhttps://github.com/openstack/nova/blob/ed2bf3699ddc360fb646b895560fa9dbcfd6beba/nova/tests/functional/test_conf_max_attach_disk_devices.py#L81\n(assuming we generate enough libvirt xml to fail in a similar way)\n\nI don\u0027t have the time to dig any further right this second, I am afraid.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"a7406b58c88bc3a236e9fc4b935e6c8e0d8dccc0","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"121e53c1_e64d78a3","updated":"2024-10-21 17:56:40.000000000","message":"as i noted on upstream irc after this was abandonded i do agree we should fix the bug. if the reporter does not want to follow our normal process that is fine. the comunity can pick up there change and resolve the usecase.\n\nhttps://github.com/openstack/nova/blob/ed2bf3699ddc360fb646b895560fa9dbcfd6beba/nova/tests/functional/regressions/README.rst#L5\n\nIt has not been picked up yet because we have been busy with other issues (we we dealing with a 2 CVEs when this was abandoned) but this is still a valid bug to fix.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"4f856f22d438cc5c211e0076891251c50cb3bdd2","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"85404768_9d264453","updated":"2024-10-21 19:57:12.000000000","message":"https://review.opendev.org/q/topic:%22bug/2074219%22 is a fix for the actual bug in case this is of use to people.\n\nthat is what i was asking for in my previous review.\n\na reproducer regression test and a follow up patch to show it works end to end testing the actual case that failed.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"321c7e490f940ee5a4e97fb5f3cffcc13bf2c79d","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"c73d154a_51c8150f","updated":"2024-07-27 00:14:34.000000000","message":"we do not support detaching or attachign medevs form a vm.\n\nso its not entrily clear how you are triggering this bug?","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"bb1182b2339913ae6538ef5ed4bcb952aae4f307","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":3,"id":"966288b6_a1cebae7","in_reply_to":"1708efc6_875cf6e8","updated":"2024-07-28 09:27:12.000000000","message":"actual the bug is not because we are not specifying the device type\n\nIt is because we are not checking if the device has an alias attribute here\n\nhttps://github.com/openstack/nova/blob/7a7427691e0bd4818bb7a2c5f5371e0244addbbb/nova/virt/libvirt/guest.py#L419\n\nso the original bug should have been fixed by also fixing get_device_by_alias\n\n\nthe difference between this new bug and https://bugs.launchpad.net/nova/+bug/1942345 is PF devices are meant to be detachable and were planned to be enhanced to support alias-based detaching where as mdevs are not. so we explicitly did not add the alias field to them when we were adding supprot for alias based detach.\n\nanyway, again the reason I\u0027m asking so many questions is that you did not follow our normal bug progress fo first submitting a unit/functional test to reproduce the bug and then a second patch on top that fixes the problem and updating the reproducer to prevent regressions.\n\nthe unit test you have in this patch does not provide enough test coverage as you never demonstrated the original error and as a result, is also insufficient to demonstrate you fixed the bug.\n\nyou don\u0027t see that in the patch you are referencing because that was part of a series of patches to workaround a backwards incompatible change in qemu that prevented libvirt/nova from retrying a detach if it timed out so we could see the regression in our ci testing.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":22443,"name":"Mike Lowe","email":"jomlowe@iu.edu","username":"jomlowe"},"change_message_id":"f1e9b13d2b0231fdf36420f670cc47001816fe70","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":3,"id":"b434aec9_56c8ef5d","in_reply_to":"88589553_d0b11e32","updated":"2024-07-27 18:46:37.000000000","message":"The intent was not carried through to the implementation. In the traceback MDEV devices are very clearly parsed and throw exceptions causing the detachment workflow to fail creating inconsistent block device mapping states. If you don\u0027t want to detach a device type explicitly filter for that device, relying on missing attributes to throw exceptions is brittle and not good practice as evidenced by two different volume detachment bugs triggered by trying to parse unrelated device\u0027s alias attributes. Again explicit is better than implicit, throw explicit exceptions when bad things are done instead of relying on generic missing attribute exceptions.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"1754c2e866eff6bac2fc811ea6ce2cbf7d82d7d4","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":3,"id":"88589553_d0b11e32","in_reply_to":"9be272c3_daa39f29","updated":"2024-07-27 14:01:24.000000000","message":"the way we implemented the aials support because of upgrade reasons was to first try detaching via the aias and then fallback to the older method.\n\n\nwe dont currently support attaching or detaching mdev at all\n\nand we dont supprot using mdevs for network devices or volumes.\n\nyou are correct that an mdev can have an asias but we intentioally chose not to set it or parse it on any device we do not supprot attaching or detaching.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":22443,"name":"Mike Lowe","email":"jomlowe@iu.edu","username":"jomlowe"},"change_message_id":"67440bfd4970cf8b6d3db7830d59add055d68cd4","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"9be272c3_daa39f29","in_reply_to":"a7080549_8c04ff77","updated":"2024-07-27 01:50:27.000000000","message":"MDEV devices do have an alias tag in the libvirt xml so it\u0027s safer to set this in the event it\u0027s parsed regardless of how the device is or is not operated on as clearly some parts of the code expect devices coming back from libvirt to have an alias attribute.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"bb1182b2339913ae6538ef5ed4bcb952aae4f307","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":3,"id":"cbc99957_2e763717","in_reply_to":"b434aec9_56c8ef5d","updated":"2024-07-28 09:27:12.000000000","message":"we are not relying on the missing attribute to filter out thet devices\n\nwe used to have more explicit filtering and it was simplified when teh alias support was added but there was never any intent to rely on the attribute error.\n\nany attempt to rely on something like that would not pass code review.\nThis is simply a case of missing test coverage and not having mdev or PF available ohn the systems that were used to implement and review the alias series originally.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"a7406b58c88bc3a236e9fc4b935e6c8e0d8dccc0","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"615f1646_84ed22d7","in_reply_to":"b80a4527_04f684c2","updated":"2024-10-21 17:56:40.000000000","message":"yep that is more or less what was beign asked for.\n\nspecifically, i asked for a reproducer test case showing it was broken(ideally as a functional testing https://github.com/openstack/nova/tree/master/nova/tests/functional/regressions) as an initial patch and then the fix.\n\nSince we have had a similar bug in the past, it should be noted that our current test coverage is insufficient, so it would be prudent to add more comprehensive testing via a functional regression test. \n\nsimply providing a unit test for the config.py change does not test the end to end usecasue dscribed in the bug.\ni.e creating vm with both a vgpu and a cinder volume and showing you can detach the volume.\n\nthat is what I originally asked for, demonstrate the workflow is broken and that your tests fix it using our functional test suite which required no hardware to execute.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":22443,"name":"Mike Lowe","email":"jomlowe@iu.edu","username":"jomlowe"},"change_message_id":"0f44f2550b50061fab536f13b2b451a0ad7fd575","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":3,"id":"1708efc6_875cf6e8","in_reply_to":"b9d804ce_0714dd9e","updated":"2024-07-27 14:37:59.000000000","message":"This is only with MDEV devices which is why it\u0027s not found in the ci. \n\nIt is nearly identical to bug https://bugs.launchpad.net/nova/+bug/1942345 because at this line https://opendev.org/openstack/nova/src/commit/7a7427691e0bd4818bb7a2c5f5371e0244addbbb/nova/virt/libvirt/driver.py#L2825 the device type isn\u0027t specified so all devices (including MDEV which causes my bug and PCI which causes bug 1942345) are parsed instead of the intended volume device type. This patch is a very slight variant of https://review.opendev.org/c/openstack/nova/+/806943 which didn\u0027t seem to cause any concern.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":22443,"name":"Mike Lowe","email":"jomlowe@iu.edu","username":"jomlowe"},"change_message_id":"0ec33b535c6425005c79c0113ade8e0e5c72abb7","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"ffd5929e_c82df152","in_reply_to":"c73d154a_51c8150f","updated":"2024-07-27 01:44:13.000000000","message":"Detachment of any and all devices including volumes fails due to this bug. We replicated it with dozens of instances on our production cloud following upgrade to Caracal. This patch has been applied to our cloud and is running in production. We\u0027re now able to detach volumes.  Please review the bug for a stack trace showing volume and NOT mdev removal failing.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"1754c2e866eff6bac2fc811ea6ce2cbf7d82d7d4","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":3,"id":"b9d804ce_0714dd9e","in_reply_to":"ffd5929e_c82df152","updated":"2024-07-27 14:01:24.000000000","message":"are you sure this is not just the fallback.\n\nwe have port and volume attach/detach testign runing on all patches and it is still fucntional as far as our ci can tell.\n\nthe way that the alais feature was added  was to first try detachign via the aials, catch thet failure and then proceed with the older approch so that we could support both old instances without aiasis set and new ones.\n\nim not deiging what you are seeing but im trying to excpress that we have testing of this with real voluems using both ceph and lvm volumes exported over iscsi.\n\nthose volume tests are still passing so im trying to under stand exactly in what situation you are seing the failure.\n\ncan you update the bug with details of the cidner backend your using and what version fo OpenStack is used.  I think you said carical but it woudl be good to confirm.\n\nbefore proceeding to a fix I would like to demonstrate the current issues with a functional test reproducer if possible. for example does this only happen on systems that have mdevs created? that might explain why we dont see thi today in our ci.\n\nwithout a repdoucer we cant ensure that this does not regress.","commit_id":"5a92956debaf001bcca8b2dd26d6748bfd7ecc24"}]}
