)]}'
{"specs/ussuri/approved/nova-cyborg-interaction.rst":[{"author":{"_account_id":14070,"name":"Eric Fried","email":"openstack@fried.cc","username":"efried"},"change_message_id":"a72800dbdae082316c52b4fbb7cc6cf88fded97a","unresolved":false,"context_lines":[{"line_number":376,"context_line":"RCs to begin with, but may get standardized later. Such standardization"},{"line_number":377,"context_line":"requires changes to os-resource-classes."},{"line_number":378,"context_line":""},{"line_number":379,"context_line":"For end-to-end testing with tempest, Cyborg shall provide a fake driver"},{"line_number":380,"context_line":"which returns attach handles of type ``TEST_PCI``. The Nova virt driver"},{"line_number":381,"context_line":"should ignore such attach handles, and create VMs as if such ARQs did"},{"line_number":382,"context_line":"not exist."},{"line_number":383,"context_line":""},{"line_number":384,"context_line":"Upgrade impact"},{"line_number":385,"context_line":"--------------"}],"source_content_type":"text/x-rst","patch_set":3,"id":"3fa7e38b_c2ccae62","line":382,"range":{"start_line":379,"start_character":0,"end_line":382,"end_character":10},"updated":"2019-10-30 19:04:56.000000000","message":"Not sure I understand this. The TEST_PCI attach handles should only be produced when the fake driver is \"enabled\", which it should never be in a production environment. And in a test environment, the whole point of enabling the fake driver would be so that nova (or other cyborg consumers) can treat the fake attach handles like they would any other.\n\nAm I missing something?","commit_id":"6b7b82389837e705459ba88d20f17ed0759e57ce"},{"author":{"_account_id":14070,"name":"Eric Fried","email":"openstack@fried.cc","username":"efried"},"change_message_id":"1d1dfda16a74d0eca1c514689c45f67706bfa831","unresolved":false,"context_lines":[{"line_number":376,"context_line":"RCs to begin with, but may get standardized later. Such standardization"},{"line_number":377,"context_line":"requires changes to os-resource-classes."},{"line_number":378,"context_line":""},{"line_number":379,"context_line":"For end-to-end testing with tempest, Cyborg shall provide a fake driver"},{"line_number":380,"context_line":"which returns attach handles of type ``TEST_PCI``. The Nova virt driver"},{"line_number":381,"context_line":"should ignore such attach handles, and create VMs as if such ARQs did"},{"line_number":382,"context_line":"not exist."},{"line_number":383,"context_line":""},{"line_number":384,"context_line":"Upgrade impact"},{"line_number":385,"context_line":"--------------"}],"source_content_type":"text/x-rst","patch_set":3,"id":"3fa7e38b_926039d1","line":382,"range":{"start_line":379,"start_character":0,"end_line":382,"end_character":10},"in_reply_to":"3fa7e38b_36d4e01f","updated":"2019-10-31 13:22:20.000000000","message":"\u003e and composed them in the domain XML, the VM will not come up --\n \u003e libvirt/qemu will fail it.\n\nSo the idea is that, by ignoring these attach handles, the VM *will* come up, just without (these) devices?\n\nI\u0027m not sure how that\u0027s better.\n\nAt least if the BDFs were in the XML, you could potentially verify they ended up in there. If they\u0027re not, what good does it do that the VM came up?\n\n...Because presumably we\u0027re only doing this in a tempest test that sets up a device profile from nova in the first place, which we should only be doing when we\u0027re testing nova\u0027s use of cyborg. So it\u0027s not like these failed VMs will be a problem for other tests.\n\nWhat I\u0027m getting at is, we really ought to avoid having test-only logic in production code. IIUC what this is proposing is to inject something like:\n\n if handle[\u0027type\u0027] !\u003d \u0027TEST_PCI\u0027:\n     add_pci_dom_xml()\n\n...into the virt driver\u0027s spawn flow. And that should be avoided unless it\u0027s truly unavoidable.","commit_id":"6b7b82389837e705459ba88d20f17ed0759e57ce"},{"author":{"_account_id":14070,"name":"Eric Fried","email":"openstack@fried.cc","username":"efried"},"change_message_id":"e6a3e21b2db5d24ab2b31bd49d756f627cbf59c1","unresolved":false,"context_lines":[{"line_number":376,"context_line":"RCs to begin with, but may get standardized later. Such standardization"},{"line_number":377,"context_line":"requires changes to os-resource-classes."},{"line_number":378,"context_line":""},{"line_number":379,"context_line":"For end-to-end testing with tempest, Cyborg shall provide a fake driver"},{"line_number":380,"context_line":"which returns attach handles of type ``TEST_PCI``. The Nova virt driver"},{"line_number":381,"context_line":"should ignore such attach handles, and create VMs as if such ARQs did"},{"line_number":382,"context_line":"not exist."},{"line_number":383,"context_line":""},{"line_number":384,"context_line":"Upgrade impact"},{"line_number":385,"context_line":"--------------"}],"source_content_type":"text/x-rst","patch_set":3,"id":"3fa7e38b_ced92256","line":382,"range":{"start_line":379,"start_character":0,"end_line":382,"end_character":10},"in_reply_to":"3fa7e38b_63e7ea13","updated":"2019-11-01 13:14:53.000000000","message":"\u003e if ... arqs[0][\u0027attach_handle_type\u0027] \u003d\u003d \u0027PCI\u0027:\n\nOkay, this makes me feel better. I\u0027m convinced, thanks for patiently explaining.","commit_id":"6b7b82389837e705459ba88d20f17ed0759e57ce"},{"author":{"_account_id":21672,"name":"Sundar Nadathur","email":"sundar.nadathur@intel.com","username":"nsundar"},"change_message_id":"c2a27122eeead9d1bfaa55d00504b9aa4d2bcf62","unresolved":false,"context_lines":[{"line_number":376,"context_line":"RCs to begin with, but may get standardized later. Such standardization"},{"line_number":377,"context_line":"requires changes to os-resource-classes."},{"line_number":378,"context_line":""},{"line_number":379,"context_line":"For end-to-end testing with tempest, Cyborg shall provide a fake driver"},{"line_number":380,"context_line":"which returns attach handles of type ``TEST_PCI``. The Nova virt driver"},{"line_number":381,"context_line":"should ignore such attach handles, and create VMs as if such ARQs did"},{"line_number":382,"context_line":"not exist."},{"line_number":383,"context_line":""},{"line_number":384,"context_line":"Upgrade impact"},{"line_number":385,"context_line":"--------------"}],"source_content_type":"text/x-rst","patch_set":3,"id":"3fa7e38b_63e7ea13","line":382,"range":{"start_line":379,"start_character":0,"end_line":382,"end_character":10},"in_reply_to":"3fa7e38b_926039d1","updated":"2019-11-01 05:45:07.000000000","message":"\u003e At least if the BDFs were in the XML, you could potentially verify they ended up in there. \n\nHow? There is no VM or xml after libvirt/qemu crashes and burns. If you are proposing to scrape the n-cpu logs, that\u0027s a sketchy idea. \n\nAnd how do you know why the VM bringup failed? It could be because libvirt/qemu failed it, or something else went wrong in some code path, which is a real test failure. Short of log-scraping, I don\u0027t see how we\u0027ll determine that.\n\n\u003e If they\u0027re not, what good does it do that the VM came up?\n\nIf the VM did spawn successfully, at least we know the code flow is good till the point where PCI devices need to be injected into the VM\u0027s XML. That includes the logic in the Nova API server, conductor, scheduler, compute manager and the parts of the virt driver till creation of guest config. It exercises all Cyborg APIs that are involved in VM creation: get device profile request groups, create/bind ARQs, wait for Cyborg\u0027s notification and query Cyborg to get the ARQs.\n\n\u003e ...Because presumably we\u0027re only doing this in a tempest test that\n\u003e sets up a device profile from nova in the first place, which we\n\u003e should only be doing when we\u0027re testing nova\u0027s use of cyborg. So\n\u003e it\u0027s not like these failed VMs will be a problem for other tests.\n\nThat is a bit beside the point because VM spawn failure is no good for testing the right code flow either.\n\n\u003e What I\u0027m getting at is, we really ought to avoid having test-only logic in production code. \n\nWe are not doing that.\n\n\u003e IIUC what this is proposing is to inject something like:\n\n\u003e if handle[\u0027type\u0027] !\u003d \u0027TEST_PCI\u0027:\n\u003e add_pci_dom_xml()\n\nReferencing the code [1] again:\n\n  if ... arqs[0][\u0027attach_handle_type\u0027] \u003d\u003d \u0027PCI\u0027:\n\nThis is validating the right attach handle type for production code, not checking whether this is a test case.\n\nWe plan to use the same tempest plugin in third-party CI, where Cyborg will return real BDFs. With no change in \u0027production code\u0027, that will:\nA. exercise the same code paths, but \nB. can do additional validation like inspecting the xml, and\nC. can do additional test cases like programming.\n\n[1] https://review.opendev.org/gitweb?p\u003dopenstack/nova.git;f\u003dnova/virt/libvirt/driver.py;hb\u003drefs/changes/45/631245/36#l5738","commit_id":"6b7b82389837e705459ba88d20f17ed0759e57ce"},{"author":{"_account_id":21672,"name":"Sundar Nadathur","email":"sundar.nadathur@intel.com","username":"nsundar"},"change_message_id":"132c53ce20ead5ff498beefd2279ae73d7c2bc68","unresolved":false,"context_lines":[{"line_number":376,"context_line":"RCs to begin with, but may get standardized later. Such standardization"},{"line_number":377,"context_line":"requires changes to os-resource-classes."},{"line_number":378,"context_line":""},{"line_number":379,"context_line":"For end-to-end testing with tempest, Cyborg shall provide a fake driver"},{"line_number":380,"context_line":"which returns attach handles of type ``TEST_PCI``. The Nova virt driver"},{"line_number":381,"context_line":"should ignore such attach handles, and create VMs as if such ARQs did"},{"line_number":382,"context_line":"not exist."},{"line_number":383,"context_line":""},{"line_number":384,"context_line":"Upgrade impact"},{"line_number":385,"context_line":"--------------"}],"source_content_type":"text/x-rst","patch_set":3,"id":"3fa7e38b_36d4e01f","line":382,"range":{"start_line":379,"start_character":0,"end_line":382,"end_character":10},"in_reply_to":"3fa7e38b_c2ccae62","updated":"2019-10-31 01:37:09.000000000","message":"\u003e The TEST_PCI attach handles should only\n \u003e be produced when the fake driver is \"enabled\", which it should\n \u003e never be in a production environment. \n\nYes.\n\n \u003e And in a test environment,\n \u003e the whole point of enabling the fake driver would be so that nova\n \u003e (or other cyborg consumers) can treat the fake attach handles like\n \u003e they would any other.\n\nThe fake driver returns BDFs which, by definition, don\u0027t correspond to real devices. If the virt driver \u0027treated them like any other,\u0027 and composed them in the domain XML, the VM will not come up -- libvirt/qemu will fail it. So, the tempest env can test the code flow from the start all the way through to that point where the virt driver checks the bound ARQs for attach handle type [1]. IOW, all code except that which produces VM\u0027s domain XML and creates the instance via libvirt, which is not Cyborg-specific.\n\n[1] https://review.opendev.org/gitweb?p\u003dopenstack/nova.git;f\u003dnova/virt/libvirt/driver.py;hb\u003drefs/changes/45/631245/36#l5738","commit_id":"6b7b82389837e705459ba88d20f17ed0759e57ce"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"2e66eff2446004499d0e80e35dc33ad2e8bc57fe","unresolved":false,"context_lines":[{"line_number":91,"context_line":"Ideally, there should be a single way for the operator to identify which PCI"},{"line_number":92,"context_line":"devices should be claimed by Nova and which by Cyborg. Until that is figured"},{"line_number":93,"context_line":"out, the operator shall use Cyborg\u0027s configuration file to specify which"},{"line_number":94,"context_line":"Cyborg drivers are enabled. Since each driver claims specific PCI IDs, the"},{"line_number":95,"context_line":"operator can and must ensure that none of these PCI IDs are included in Nova\u0027s"},{"line_number":96,"context_line":"PCI whitelist."},{"line_number":97,"context_line":""}],"source_content_type":"text/x-rst","patch_set":4,"id":"3fa7e38b_843b3099","line":94,"range":{"start_line":94,"start_character":61,"end_line":94,"end_character":68},"updated":"2019-11-20 15:12:56.000000000","message":"nit: pci ids is a little nebulous \nit could refer to the pci adresses or it could refer to the vendor and product ids.","commit_id":"6d5d1e463f024099f1380ea408a4b36767f1b2c1"},{"author":{"_account_id":7166,"name":"Sylvain Bauza","email":"sbauza@redhat.com","username":"sbauza"},"change_message_id":"677a8dd539a4937481657ba2a96a00bd2ab9b1a2","unresolved":false,"context_lines":[{"line_number":91,"context_line":"Ideally, there should be a single way for the operator to identify which PCI"},{"line_number":92,"context_line":"devices should be claimed by Nova and which by Cyborg. Until that is figured"},{"line_number":93,"context_line":"out, the operator shall use Cyborg\u0027s configuration file to specify which"},{"line_number":94,"context_line":"Cyborg drivers are enabled. Since each driver claims specific PCI IDs, the"},{"line_number":95,"context_line":"operator can and must ensure that none of these PCI IDs are included in Nova\u0027s"},{"line_number":96,"context_line":"PCI whitelist."},{"line_number":97,"context_line":""}],"source_content_type":"text/x-rst","patch_set":4,"id":"3fa7e38b_f722bf00","line":94,"range":{"start_line":94,"start_character":61,"end_line":94,"end_character":68},"in_reply_to":"3fa7e38b_316879b8","updated":"2019-11-22 08:46:18.000000000","message":"In Nova, both are accepted (PCI addresses vs. vendor/product IDs tuple)\nHonestly, I personnally don\u0027t care what Cyborg will ask for configuring the PCI devices, I just want to tell that maybe Cyborg should do the same than Nova if operators have to configure both services.","commit_id":"6d5d1e463f024099f1380ea408a4b36767f1b2c1"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"56fafa578e55cd0ed0a4e5b96c8b1b273b727fb5","unresolved":false,"context_lines":[{"line_number":91,"context_line":"Ideally, there should be a single way for the operator to identify which PCI"},{"line_number":92,"context_line":"devices should be claimed by Nova and which by Cyborg. Until that is figured"},{"line_number":93,"context_line":"out, the operator shall use Cyborg\u0027s configuration file to specify which"},{"line_number":94,"context_line":"Cyborg drivers are enabled. Since each driver claims specific PCI IDs, the"},{"line_number":95,"context_line":"operator can and must ensure that none of these PCI IDs are included in Nova\u0027s"},{"line_number":96,"context_line":"PCI whitelist."},{"line_number":97,"context_line":""}],"source_content_type":"text/x-rst","patch_set":4,"id":"1f493fa4_480a5726","line":94,"range":{"start_line":94,"start_character":61,"end_line":94,"end_character":68},"in_reply_to":"3fa7e38b_5ff3c19f","updated":"2020-04-22 05:28:14.000000000","message":"i guess i dont care enough to argure that pci id is wrogn so ok.\n\ni also agree that cyborg will need a way to filter deivce in the future. the enable or disable driver approach is too course grained in the long run.","commit_id":"6d5d1e463f024099f1380ea408a4b36767f1b2c1"},{"author":{"_account_id":21672,"name":"Sundar Nadathur","email":"sundar.nadathur@intel.com","username":"nsundar"},"change_message_id":"0ba853d6633789e9b3860da974138e451f163407","unresolved":false,"context_lines":[{"line_number":91,"context_line":"Ideally, there should be a single way for the operator to identify which PCI"},{"line_number":92,"context_line":"devices should be claimed by Nova and which by Cyborg. Until that is figured"},{"line_number":93,"context_line":"out, the operator shall use Cyborg\u0027s configuration file to specify which"},{"line_number":94,"context_line":"Cyborg drivers are enabled. Since each driver claims specific PCI IDs, the"},{"line_number":95,"context_line":"operator can and must ensure that none of these PCI IDs are included in Nova\u0027s"},{"line_number":96,"context_line":"PCI whitelist."},{"line_number":97,"context_line":""}],"source_content_type":"text/x-rst","patch_set":4,"id":"3fa7e38b_5ff3c19f","line":94,"range":{"start_line":94,"start_character":61,"end_line":94,"end_character":68},"in_reply_to":"3fa7e38b_83a69cbf","updated":"2020-01-06 23:50:16.000000000","message":"+1","commit_id":"6d5d1e463f024099f1380ea408a4b36767f1b2c1"},{"author":{"_account_id":21672,"name":"Sundar Nadathur","email":"sundar.nadathur@intel.com","username":"nsundar"},"change_message_id":"e515084f10088110bd1c0f08a631cde9dc2290e1","unresolved":false,"context_lines":[{"line_number":91,"context_line":"Ideally, there should be a single way for the operator to identify which PCI"},{"line_number":92,"context_line":"devices should be claimed by Nova and which by Cyborg. Until that is figured"},{"line_number":93,"context_line":"out, the operator shall use Cyborg\u0027s configuration file to specify which"},{"line_number":94,"context_line":"Cyborg drivers are enabled. Since each driver claims specific PCI IDs, the"},{"line_number":95,"context_line":"operator can and must ensure that none of these PCI IDs are included in Nova\u0027s"},{"line_number":96,"context_line":"PCI whitelist."},{"line_number":97,"context_line":""}],"source_content_type":"text/x-rst","patch_set":4,"id":"3fa7e38b_316879b8","line":94,"range":{"start_line":94,"start_character":61,"end_line":94,"end_character":68},"in_reply_to":"3fa7e38b_843b3099","updated":"2019-11-20 16:41:40.000000000","message":"True. The statement is true for both interpretations. However, I think the vendor/product ID is most relevant to the operator, because s/he will know them _before_ installing the accelerator card, thus avoiding conflicts proactively.","commit_id":"6d5d1e463f024099f1380ea408a4b36767f1b2c1"},{"author":{"_account_id":21672,"name":"Sundar Nadathur","email":"sundar.nadathur@intel.com","username":"nsundar"},"change_message_id":"a2a4cedb19873621c527d291075c04c80bc02a9d","unresolved":false,"context_lines":[{"line_number":91,"context_line":"Ideally, there should be a single way for the operator to identify which PCI"},{"line_number":92,"context_line":"devices should be claimed by Nova and which by Cyborg. Until that is figured"},{"line_number":93,"context_line":"out, the operator shall use Cyborg\u0027s configuration file to specify which"},{"line_number":94,"context_line":"Cyborg drivers are enabled. Since each driver claims specific PCI IDs, the"},{"line_number":95,"context_line":"operator can and must ensure that none of these PCI IDs are included in Nova\u0027s"},{"line_number":96,"context_line":"PCI whitelist."},{"line_number":97,"context_line":""}],"source_content_type":"text/x-rst","patch_set":4,"id":"3fa7e38b_fa9a94c2","line":94,"range":{"start_line":94,"start_character":61,"end_line":94,"end_character":68},"in_reply_to":"3fa7e38b_f722bf00","updated":"2019-11-22 16:13:59.000000000","message":"The operator enables specific drivers in cyborg.conf, like this:\n\n [agent]\n enabled_drivers \u003d fake_driver\n\nThe docs will say what PCI vendor/product IDs are claimed by each driver. So, the operator can decide whether there is a conflict before he installs the cards or enables the driver.\n\nWe are investigating APIs to enable or disable a specific device or a set of devices, possibly based on PCI vendor/device IDs, BDFs, host IDs/names or some combination. This is WIP for Ussuri. That would be along the lines of what you are saying.","commit_id":"6d5d1e463f024099f1380ea408a4b36767f1b2c1"},{"author":{"_account_id":14070,"name":"Eric Fried","email":"openstack@fried.cc","username":"efried"},"change_message_id":"4828e2313b6d6ca3bb81427d7314d9c495d1b81a","unresolved":false,"context_lines":[{"line_number":91,"context_line":"Ideally, there should be a single way for the operator to identify which PCI"},{"line_number":92,"context_line":"devices should be claimed by Nova and which by Cyborg. Until that is figured"},{"line_number":93,"context_line":"out, the operator shall use Cyborg\u0027s configuration file to specify which"},{"line_number":94,"context_line":"Cyborg drivers are enabled. Since each driver claims specific PCI IDs, the"},{"line_number":95,"context_line":"operator can and must ensure that none of these PCI IDs are included in Nova\u0027s"},{"line_number":96,"context_line":"PCI whitelist."},{"line_number":97,"context_line":""}],"source_content_type":"text/x-rst","patch_set":4,"id":"3fa7e38b_83a69cbf","line":94,"range":{"start_line":94,"start_character":61,"end_line":94,"end_character":68},"in_reply_to":"3fa7e38b_fa9a94c2","updated":"2019-11-22 17:38:39.000000000","message":"IIUC:\n\nIn cyborg, you cast a net around a bunch of devices by enabling a driver that knows how to deal with those devices. In Ussuri, that will be \"all or nothing\" -- either the driver is enabled and covers all those devices, or the driver is disabled and covers none of them.\n\nIn nova, PCI whitelisting.\n\nAnd the point is, the deployer needs to make sure those sets don\u0027t overlap.\n\nIn the future, there may be a mechanism whereby you could refine the cyborg set more granularly; but that\u0027s an exercise for later. (Actually the vision is to use unified provider config YAML that both nova and cyborg would read to find out who gets to own a particular device. Die [pci]passthrough_whitelist, die.)\n\n(BTW, I\u0027m good with the nebulousness of \"PCI ID\". It saves us having to go into specifics of vendor/product/bdf/etc which are not salient here.)","commit_id":"6d5d1e463f024099f1380ea408a4b36767f1b2c1"}]}
