)]}' {"specs/xena/nvme-agent.rst":[{"author":{"_account_id":5314,"name":"Brian Rosmaita","email":"rosmaita.fossdev@gmail.com","username":"brian-rosmaita"},"change_message_id":"cb5cbb3abeb7917f318afb1e2f43263f4219f4dd","unresolved":true,"context_lines":[{"line_number":151,"context_line":"Performance Impact"},{"line_number":152,"context_line":"------------------"},{"line_number":153,"context_line":""},{"line_number":154,"context_line":"None"},{"line_number":155,"context_line":""},{"line_number":156,"context_line":"Other deployer impact"},{"line_number":157,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":3,"id":"e315ba09_b70c0bff","line":154,"range":{"start_line":154,"start_character":0,"end_line":154,"end_character":4},"updated":"2021-06-23 02:47:19.000000000","message":"Well, you are going to have an extra process running on the compute node. Might be worth saying how often the periodic task will run, and to describe its characteristics (from lines 89-94, it sounds like it will spend most of its waking time waiting for a response from the provisioner, it won\u0027t do any compute-intensive work while its awake, and will only be sending/receiving small amounts of data across the network).","commit_id":"b50276653f34938d615c084a9dc57c8615d0e51f"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"245bb60fc8d9fd48c64fc8b97f849d3e312e433f","unresolved":true,"context_lines":[{"line_number":151,"context_line":"Performance Impact"},{"line_number":152,"context_line":"------------------"},{"line_number":153,"context_line":""},{"line_number":154,"context_line":"None"},{"line_number":155,"context_line":""},{"line_number":156,"context_line":"Other deployer 
impact"},{"line_number":157,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":3,"id":"6fcf2679_fb01170e","line":154,"range":{"start_line":154,"start_character":0,"end_line":154,"end_character":4},"in_reply_to":"e315ba09_b70c0bff","updated":"2021-06-23 08:48:36.000000000","message":"I think I quickly overlooked this section. Assuming the default mode of operation (the operator does not configure the agent to run), the performance impact really is zero. However, as per your suggestion, I am adding the expected compute and network performance impacts for when the agent is configured to run. Thank you!","commit_id":"b50276653f34938d615c084a9dc57c8615d0e51f"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"d46c723a9ddaabb32cc93425f3f8c84bf30a9bcf","unresolved":true,"context_lines":[{"line_number":31,"context_line":"For target-side volume replication (traditional approach), it is the storage"},{"line_number":32,"context_line":"backend that takes care of monitoring and self healing."},{"line_number":33,"context_line":"The NVMe + MDRAID approach moves the data replication responsibility from the"},{"line_number":34,"context_line":"storage backend to the consuming initiator (compute node)."},{"line_number":35,"context_line":""},{"line_number":36,"context_line":"Currently there\u0027s no mechanism to monitor and heal these replicated volumes."},{"line_number":37,"context_line":"We cannot do it only on the Cinder driver side because currently there is no"}],"source_content_type":"text/x-rst","patch_set":4,"id":"1b4b2610_458ba88e","line":34,"range":{"start_line":34,"start_character":44,"end_line":34,"end_character":56},"updated":"2021-06-23 14:11:12.000000000","message":"This can even be a controller node if glance is using this agent for the glance cinder usecase, but we can leave that for 
now.","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":true,"context_lines":[{"line_number":31,"context_line":"For target-side volume replication (traditional approach), it is the storage"},{"line_number":32,"context_line":"backend that takes care of monitoring and self healing."},{"line_number":33,"context_line":"The NVMe + MDRAID approach moves the data replication responsibility from the"},{"line_number":34,"context_line":"storage backend to the consuming initiator (compute node)."},{"line_number":35,"context_line":""},{"line_number":36,"context_line":"Currently there\u0027s no mechanism to monitor and heal these replicated volumes."},{"line_number":37,"context_line":"We cannot do it only on the Cinder driver side because currently there is no"}],"source_content_type":"text/x-rst","patch_set":4,"id":"83a7ae8f_395227f4","line":34,"range":{"start_line":34,"start_character":44,"end_line":34,"end_character":56},"in_reply_to":"1b4b2610_458ba88e","updated":"2021-06-24 04:53:24.000000000","message":"I see, so in such a usecase, glance will keep a cinder volume attached (to the host) long-term, right? Thank you for pointing this out!\n\nI still want to keep the reference to \"compute node\" there since I believe it makes it easier to understand for people overall. 
However, to be more complete, I will mention the glance cinder usecase, with something like: \"(compute node or other long term attachments such as glance cinder usecase)\"","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"86c849145752f158be3d8996bbf15f6a5a06463c","unresolved":true,"context_lines":[{"line_number":31,"context_line":"For target-side volume replication (traditional approach), it is the storage"},{"line_number":32,"context_line":"backend that takes care of monitoring and self healing."},{"line_number":33,"context_line":"The NVMe + MDRAID approach moves the data replication responsibility from the"},{"line_number":34,"context_line":"storage backend to the consuming initiator (compute node)."},{"line_number":35,"context_line":""},{"line_number":36,"context_line":"Currently there\u0027s no mechanism to monitor and heal these replicated volumes."},{"line_number":37,"context_line":"We cannot do it only on the Cinder driver side because currently there is no"}],"source_content_type":"text/x-rst","patch_set":4,"id":"854c1b7e_08d19603","line":34,"range":{"start_line":34,"start_character":44,"end_line":34,"end_character":56},"in_reply_to":"83a7ae8f_395227f4","updated":"2021-06-24 12:23:37.000000000","message":"Glance attaches cinder volume for the duration of writing or reading data (image) from/to the volume which can last long if the image size is big.\nI like how you described it. 
+1","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"082317cd7c9cacfa8f189fa1f0c089ce1d676d66","unresolved":false,"context_lines":[{"line_number":31,"context_line":"For target-side volume replication (traditional approach), it is the storage"},{"line_number":32,"context_line":"backend that takes care of monitoring and self healing."},{"line_number":33,"context_line":"The NVMe + MDRAID approach moves the data replication responsibility from the"},{"line_number":34,"context_line":"storage backend to the consuming initiator (compute node)."},{"line_number":35,"context_line":""},{"line_number":36,"context_line":"Currently there\u0027s no mechanism to monitor and heal these replicated volumes."},{"line_number":37,"context_line":"We cannot do it only on the Cinder driver side because currently there is no"}],"source_content_type":"text/x-rst","patch_set":4,"id":"f4eb02d1_9d6a7f6c","line":34,"range":{"start_line":34,"start_character":44,"end_line":34,"end_character":56},"in_reply_to":"854c1b7e_08d19603","updated":"2021-06-24 15:06:06.000000000","message":"If the only usecase by glance is a short lived attachment, it is likely best for the operator to opt out of running the nvme agent on that (control?) 
node\n\nI will edit it to reflect it as such.","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"d46c723a9ddaabb32cc93425f3f8c84bf30a9bcf","unresolved":true,"context_lines":[{"line_number":19,"context_line":"Problem description"},{"line_number":20,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":21,"context_line":""},{"line_number":22,"context_line":"When the NVMe connector connects a client-replicated volume, OpenStack will see"},{"line_number":23,"context_line":"it as one volume, and has no way of monitoring managing and healing the"},{"line_number":24,"context_line":"replicas in these MDRAID arrays. This agent will take care of that."},{"line_number":25,"context_line":""},{"line_number":26,"context_line":"It will monitor the NVMeoF connections and report changes to the storage"},{"line_number":27,"context_line":"orchestrator / provisioner. 
It will monitor MDRAID arrays and reconcile their"},{"line_number":28,"context_line":"physical state on the host with expected state from the volume provisioner,"},{"line_number":29,"context_line":"replacing broken legs."},{"line_number":30,"context_line":""},{"line_number":31,"context_line":"For target-side volume replication (traditional approach), it is the storage"},{"line_number":32,"context_line":"backend that takes care of monitoring and self healing."},{"line_number":33,"context_line":"The NVMe + MDRAID approach moves the data replication responsibility from the"},{"line_number":34,"context_line":"storage backend to the consuming initiator (compute node)."},{"line_number":35,"context_line":""},{"line_number":36,"context_line":"Currently there\u0027s no mechanism to monitor and heal these replicated volumes."},{"line_number":37,"context_line":"We cannot do it only on the Cinder driver side because currently there is no"},{"line_number":38,"context_line":"mechanism to detect initiator connection events and carry out replica"},{"line_number":39,"context_line":"replacement on the compute node."},{"line_number":40,"context_line":""},{"line_number":41,"context_line":"So the monitoring and healing needs to be on the initiator / compute side."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"Finally, orchestration decisions / optimizations will be carried by the volume"},{"line_number":44,"context_line":"orchestrator / provisioner using reported information from agent monitoring."},{"line_number":45,"context_line":"Though this is outside the scope of the agent (it is storage backend"},{"line_number":46,"context_line":"implemented functionality) - it is useful to mention here that it will handle"},{"line_number":47,"context_line":"cases such as avoid using faulty replicas during re-attachment scenarios,"},{"line_number":48,"context_line":"because in this design approach only the initiator node can detect 
the"},{"line_number":49,"context_line":"replicas\u0027 sync states of its MDRAID arrays."},{"line_number":50,"context_line":""},{"line_number":51,"context_line":""},{"line_number":52,"context_line":"Use Cases"}],"source_content_type":"text/x-rst","patch_set":4,"id":"f7655020_0393b9ec","line":49,"range":{"start_line":22,"start_character":0,"end_line":49,"end_character":43},"updated":"2021-06-23 14:11:12.000000000","message":"I feel this section could be restructured. Currently it alternates between a part of the problem and a part of the solution. It might be better to first describe the full problem statement and then how the agent addresses each of the cases described in it.\nE.g. (I\u0027m not referring to the format, just that problems should be described first and how the proposed solution tackles them later):\nProblems:\n1) when we have client replicated volumes, the replicas aren\u0027t visible to openstack, hence there is no one to manage them\n2) on the target storage side, it is not possible to monitor the state of replicated volumes\nSolutions:\n1) agent can monitor NVMeoF connections and report changes to the storage provisioner, and also compare the expected and actual state of replicas in case a replacement is needed\n2) agent will run on the initiator side\n3) additional benefits, like not using faulty replicas in case of a re-attachment scenario.\n\nThis is just my opinion on making it clearer; others might have different opinions.","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":true,"context_lines":[{"line_number":19,"context_line":"Problem 
description"},{"line_number":20,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":21,"context_line":""},{"line_number":22,"context_line":"When the NVMe connector connects a client-replicated volume, OpenStack will see"},{"line_number":23,"context_line":"it as one volume, and has no way of monitoring managing and healing the"},{"line_number":24,"context_line":"replicas in these MDRAID arrays. This agent will take care of that."},{"line_number":25,"context_line":""},{"line_number":26,"context_line":"It will monitor the NVMeoF connections and report changes to the storage"},{"line_number":27,"context_line":"orchestrator / provisioner. It will monitor MDRAID arrays and reconcile their"},{"line_number":28,"context_line":"physical state on the host with expected state from the volume provisioner,"},{"line_number":29,"context_line":"replacing broken legs."},{"line_number":30,"context_line":""},{"line_number":31,"context_line":"For target-side volume replication (traditional approach), it is the storage"},{"line_number":32,"context_line":"backend that takes care of monitoring and self healing."},{"line_number":33,"context_line":"The NVMe + MDRAID approach moves the data replication responsibility from the"},{"line_number":34,"context_line":"storage backend to the consuming initiator (compute node)."},{"line_number":35,"context_line":""},{"line_number":36,"context_line":"Currently there\u0027s no mechanism to monitor and heal these replicated volumes."},{"line_number":37,"context_line":"We cannot do it only on the Cinder driver side because currently there is no"},{"line_number":38,"context_line":"mechanism to detect initiator connection events and carry out replica"},{"line_number":39,"context_line":"replacement on the compute node."},{"line_number":40,"context_line":""},{"line_number":41,"context_line":"So the monitoring and healing needs to be on the initiator / compute 
side."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"Finally, orchestration decisions / optimizations will be carried by the volume"},{"line_number":44,"context_line":"orchestrator / provisioner using reported information from agent monitoring."},{"line_number":45,"context_line":"Though this is outside the scope of the agent (it is storage backend"},{"line_number":46,"context_line":"implemented functionality) - it is useful to mention here that it will handle"},{"line_number":47,"context_line":"cases such as avoid using faulty replicas during re-attachment scenarios,"},{"line_number":48,"context_line":"because in this design approach only the initiator node can detect the"},{"line_number":49,"context_line":"replicas\u0027 sync states of its MDRAID arrays."},{"line_number":50,"context_line":""},{"line_number":51,"context_line":""},{"line_number":52,"context_line":"Use Cases"}],"source_content_type":"text/x-rst","patch_set":4,"id":"bc44bf9e_04b66762","line":49,"range":{"start_line":22,"start_character":0,"end_line":49,"end_character":43},"in_reply_to":"f7655020_0393b9ec","updated":"2021-06-24 04:53:24.000000000","message":"Good point, thank you! I did some simple re-ordering of the paragraphs here. 
Hopefully it\u0027s better organized and more readable now.","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"d46c723a9ddaabb32cc93425f3f8c84bf30a9bcf","unresolved":true,"context_lines":[{"line_number":55,"context_line":"When working with replicated NVMeoF volumes that are attached to an instance"},{"line_number":56,"context_line":"for a long time, one of the replicas may go faulty."},{"line_number":57,"context_line":"This agent will detect it and attempt to replace it, i.e., self heal the"},{"line_number":58,"context_line":"MDRAID array, without the need to detach and re-attach the entire volume from"},{"line_number":59,"context_line":"the instance."},{"line_number":60,"context_line":""},{"line_number":61,"context_line":"Additionally, the agent will detect and report connection and replica sync"},{"line_number":62,"context_line":"state events to the storage orchestrator (or potentially other endpoints"}],"source_content_type":"text/x-rst","patch_set":4,"id":"df96ac32_23797f3a","line":59,"range":{"start_line":58,"start_character":14,"end_line":59,"end_character":13},"updated":"2021-06-23 14:11:12.000000000","message":"What happens if the attached volume (instead of the replica) goes faulty?","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"86c849145752f158be3d8996bbf15f6a5a06463c","unresolved":false,"context_lines":[{"line_number":55,"context_line":"When working with replicated NVMeoF volumes that are attached to an instance"},{"line_number":56,"context_line":"for a long time, one of the replicas may go faulty."},{"line_number":57,"context_line":"This agent will detect it and attempt to replace it, i.e., self heal the"},{"line_number":58,"context_line":"MDRAID array, without the need to detach 
and re-attach the entire volume from"},{"line_number":59,"context_line":"the instance."},{"line_number":60,"context_line":""},{"line_number":61,"context_line":"Additionally, the agent will detect and report connection and replica sync"},{"line_number":62,"context_line":"state events to the storage orchestrator (or potentially other endpoints"}],"source_content_type":"text/x-rst","patch_set":4,"id":"5a07a041_eef798d8","line":59,"range":{"start_line":58,"start_character":14,"end_line":59,"end_character":13},"in_reply_to":"b5daca6f_1df3beda","updated":"2021-06-24 12:23:37.000000000","message":"Ack","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":true,"context_lines":[{"line_number":55,"context_line":"When working with replicated NVMeoF volumes that are attached to an instance"},{"line_number":56,"context_line":"for a long time, one of the replicas may go faulty."},{"line_number":57,"context_line":"This agent will detect it and attempt to replace it, i.e., self heal the"},{"line_number":58,"context_line":"MDRAID array, without the need to detach and re-attach the entire volume from"},{"line_number":59,"context_line":"the instance."},{"line_number":60,"context_line":""},{"line_number":61,"context_line":"Additionally, the agent will detect and report connection and replica sync"},{"line_number":62,"context_line":"state events to the storage orchestrator (or potentially other endpoints"}],"source_content_type":"text/x-rst","patch_set":4,"id":"b5daca6f_1df3beda","line":59,"range":{"start_line":58,"start_character":14,"end_line":59,"end_character":13},"in_reply_to":"df96ac32_23797f3a","updated":"2021-06-24 04:53:24.000000000","message":"If you are referring to the attached volume being the entire mdraid array, then it is not in the scope of the agent to verify data integrity of the entire 
raid array (since it cannot make any assumptions as to the actual data).\n\nThe faultiness that it detects can only come from the nvme stack (connection problems to an individual nvme device), or from the mdraid stack (an out-of-sync, aka faulty, replica).\n\nCurrently, if the entire raid array goes out of sync (i.e., there are bits that are out of sync on every one of the replicas), there is nothing, as far as I know, that the agent can do about it.\n\n(However, there may be some things that can be done to recover from this as part of the entire storage solution, which would involve some orchestration decisions from the storage provisioner. I don\u0027t believe this is currently part of the scope of the agent, but it is a great discussion, and I will ask the storage folks designing the entire KumoScale system what their take on this is.)","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"d46c723a9ddaabb32cc93425f3f8c84bf30a9bcf","unresolved":true,"context_lines":[{"line_number":74,"context_line":""},{"line_number":75,"context_line":"The agent main function will first initialize the agent by reading access"},{"line_number":76,"context_line":"information to the volume orchestor / provisioner from a pre-defined config"},{"line_number":77,"context_line":"file (such as `/etc/os-brick/agent.conf` ?)"},{"line_number":78,"context_line":""},{"line_number":79,"context_line":"Vendor specific params will be used and prefixed by the vendor prefix, such as:"},{"line_number":80,"context_line":"`kioxia_provisioner_ip`"}],"source_content_type":"text/x-rst","patch_set":4,"id":"68489f8d_14951b4f","line":77,"range":{"start_line":77,"start_character":20,"end_line":77,"end_character":28},"updated":"2021-06-23 14:11:12.000000000","message":"We currently don\u0027t have an os-brick directory and I\u0027m not sure about creating one for a single conf 
file. (maybe create a different directory since this agent is not specific to os-brick and is not required with other connectors)\nWe currently store nfs shares file in /etc/cinder but this might be a totally different case.","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":true,"context_lines":[{"line_number":74,"context_line":""},{"line_number":75,"context_line":"The agent main function will first initialize the agent by reading access"},{"line_number":76,"context_line":"information to the volume orchestor / provisioner from a pre-defined config"},{"line_number":77,"context_line":"file (such as `/etc/os-brick/agent.conf` ?)"},{"line_number":78,"context_line":""},{"line_number":79,"context_line":"Vendor specific params will be used and prefixed by the vendor prefix, such as:"},{"line_number":80,"context_line":"`kioxia_provisioner_ip`"}],"source_content_type":"text/x-rst","patch_set":4,"id":"14d8ac76_8bf80e33","line":77,"range":{"start_line":77,"start_character":20,"end_line":77,"end_character":28},"in_reply_to":"68489f8d_14951b4f","updated":"2021-06-24 04:53:24.000000000","message":"Great point! I was also really not sure about this, I am currently leaning towards using `/etc/nvme-agent/` for this - will update this in the spec now, and if anyone has more input on this would love to hear it!","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"d46c723a9ddaabb32cc93425f3f8c84bf30a9bcf","unresolved":true,"context_lines":[{"line_number":96,"context_line":""},{"line_number":97,"context_line":"Typical self healing flow:"},{"line_number":98,"context_line":""},{"line_number":99,"context_line":"1. 
volume replica goes faulty"},{"line_number":100,"context_line":"2. agent notices faulty replica, reports to provisioner"},{"line_number":101,"context_line":"3. provisioner marks replica as bad (so it wont be used later unless synced)"},{"line_number":102,"context_line":"4. agent keeps pulling volume information from provisioner"}],"source_content_type":"text/x-rst","patch_set":4,"id":"96f05077_716badc8","line":99,"range":{"start_line":99,"start_character":3,"end_line":99,"end_character":29},"updated":"2021-06-23 14:11:12.000000000","message":"does the agent only monitor replicas? if the volume in use goes faulty, what will happen?","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":true,"context_lines":[{"line_number":96,"context_line":""},{"line_number":97,"context_line":"Typical self healing flow:"},{"line_number":98,"context_line":""},{"line_number":99,"context_line":"1. volume replica goes faulty"},{"line_number":100,"context_line":"2. agent notices faulty replica, reports to provisioner"},{"line_number":101,"context_line":"3. provisioner marks replica as bad (so it wont be used later unless synced)"},{"line_number":102,"context_line":"4. agent keeps pulling volume information from provisioner"}],"source_content_type":"text/x-rst","patch_set":4,"id":"efb81ab5_c87187a4","line":99,"range":{"start_line":99,"start_character":3,"end_line":99,"end_character":29},"in_reply_to":"96f05077_716badc8","updated":"2021-06-24 04:53:24.000000000","message":"Same as my response to your comment on line 59\n\nIn short, the only way for the entire (replicated raid1 array) volume to be faulty is if there are bits that are marked as out of sync on every one of its replicas. 
In this case, at least on the surface, there is no way of knowing what the correct bits are, so this is at least currently out of scope of this agent.\n\nThis is a great corner case to be aware of, but even with it it is still more resilient to use raid1 replicated volumes.","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"d46c723a9ddaabb32cc93425f3f8c84bf30a9bcf","unresolved":true,"context_lines":[{"line_number":127,"context_line":"Security impact"},{"line_number":128,"context_line":"---------------"},{"line_number":129,"context_line":""},{"line_number":130,"context_line":"Sudo executions of `nvme` and `mdadm`"},{"line_number":131,"context_line":"Needs access for reading of root filesystem paths such as:"},{"line_number":132,"context_line":"`/sys/class/nvme-fabrics/...`"},{"line_number":133,"context_line":"`/sys/class/block/...`"}],"source_content_type":"text/x-rst","patch_set":4,"id":"af284b96_da52263f","line":130,"range":{"start_line":130,"start_character":0,"end_line":130,"end_character":37},"updated":"2021-06-23 14:11:12.000000000","message":"do we need to mention this? 
nvme connector (and other connectors also) already does that with the help of rootwrap and are expected to do it.\nMaybe good to mention anyway but others can share their thoughts on this.\neg: (see run_as_root\u003dTrue) https://github.com/openstack/os-brick/blob/master/os_brick/initiator/connectors/nvmeof.py#L138-L140","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":true,"context_lines":[{"line_number":127,"context_line":"Security impact"},{"line_number":128,"context_line":"---------------"},{"line_number":129,"context_line":""},{"line_number":130,"context_line":"Sudo executions of `nvme` and `mdadm`"},{"line_number":131,"context_line":"Needs access for reading of root filesystem paths such as:"},{"line_number":132,"context_line":"`/sys/class/nvme-fabrics/...`"},{"line_number":133,"context_line":"`/sys/class/block/...`"}],"source_content_type":"text/x-rst","patch_set":4,"id":"1470c219_57f718a8","line":130,"range":{"start_line":130,"start_character":0,"end_line":130,"end_character":37},"in_reply_to":"af284b96_da52263f","updated":"2021-06-24 04:53:24.000000000","message":"I am leaning on the safe side of \"why not be explicit and keep this here\" - and of course lets see if we get some more thoughts from others","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"d46c723a9ddaabb32cc93425f3f8c84bf30a9bcf","unresolved":true,"context_lines":[{"line_number":128,"context_line":"---------------"},{"line_number":129,"context_line":""},{"line_number":130,"context_line":"Sudo executions of `nvme` and `mdadm`"},{"line_number":131,"context_line":"Needs access for reading of root filesystem paths such 
as:"},{"line_number":132,"context_line":"`/sys/class/nvme-fabrics/...`"},{"line_number":133,"context_line":"`/sys/class/block/...`"},{"line_number":134,"context_line":""},{"line_number":135,"context_line":""},{"line_number":136,"context_line":"Active/Active HA impact"}],"source_content_type":"text/x-rst","patch_set":4,"id":"bae507a0_c1eacef7","line":133,"range":{"start_line":131,"start_character":0,"end_line":133,"end_character":22},"updated":"2021-06-23 14:11:12.000000000","message":"same as above?","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":5314,"name":"Brian Rosmaita","email":"rosmaita.fossdev@gmail.com","username":"brian-rosmaita"},"change_message_id":"ec22aa7e00eca2f0dea2c355f4a8a9a4d12c0433","unresolved":true,"context_lines":[{"line_number":151,"context_line":"Performance Impact"},{"line_number":152,"context_line":"------------------"},{"line_number":153,"context_line":""},{"line_number":154,"context_line":"If configured to run by the operator, this will be a new process running on"},{"line_number":155,"context_line":"the compute node. 
Though it will spend most of its time sleeping, it will"},{"line_number":156,"context_line":"wake up every 30 seconds to do its period tasks: probe the storage provisioner"},{"line_number":157,"context_line":"and inspect nvme connections and mdraid states."}],"source_content_type":"text/x-rst","patch_set":4,"id":"eebd9ad6_c39e29eb","line":154,"range":{"start_line":154,"start_character":0,"end_line":154,"end_character":36},"updated":"2021-06-23 12:46:47.000000000","message":"You\u0027re right, it\u0027s good to point out that this will only impact operators who choose to use it.","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":false,"context_lines":[{"line_number":151,"context_line":"Performance Impact"},{"line_number":152,"context_line":"------------------"},{"line_number":153,"context_line":""},{"line_number":154,"context_line":"If configured to run by the operator, this will be a new process running on"},{"line_number":155,"context_line":"the compute node. 
Though it will spend most of its time sleeping, it will"},{"line_number":156,"context_line":"wake up every 30 seconds to do its period tasks: probe the storage provisioner"},{"line_number":157,"context_line":"and inspect nvme connections and mdraid states."}],"source_content_type":"text/x-rst","patch_set":4,"id":"c3a44386_cb0a62ce","line":154,"range":{"start_line":154,"start_character":0,"end_line":154,"end_character":36},"in_reply_to":"eebd9ad6_c39e29eb","updated":"2021-06-24 04:53:24.000000000","message":"Ack","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":5314,"name":"Brian Rosmaita","email":"rosmaita.fossdev@gmail.com","username":"brian-rosmaita"},"change_message_id":"ec22aa7e00eca2f0dea2c355f4a8a9a4d12c0433","unresolved":true,"context_lines":[{"line_number":153,"context_line":""},{"line_number":154,"context_line":"If configured to run by the operator, this will be a new process running on"},{"line_number":155,"context_line":"the compute node. Though it will spend most of its time sleeping, it will"},{"line_number":156,"context_line":"wake up every 30 seconds to do its period tasks: probe the storage provisioner"},{"line_number":157,"context_line":"and inspect nvme connections and mdraid states."},{"line_number":158,"context_line":""},{"line_number":159,"context_line":"These tasks are not compute intesive, with time mostly spent waiting for a"}],"source_content_type":"text/x-rst","patch_set":4,"id":"1cb9b5cf_6c9e2fa5","line":156,"updated":"2021-06-23 12:46:47.000000000","message":"nit: periodic","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":false,"context_lines":[{"line_number":153,"context_line":""},{"line_number":154,"context_line":"If configured to run by the operator, this will be a new process running 
on"},{"line_number":155,"context_line":"the compute node. Though it will spend most of its time sleeping, it will"},{"line_number":156,"context_line":"wake up every 30 seconds to do its period tasks: probe the storage provisioner"},{"line_number":157,"context_line":"and inspect nvme connections and mdraid states."},{"line_number":158,"context_line":""},{"line_number":159,"context_line":"These tasks are not compute intesive, with time mostly spent waiting for a"}],"source_content_type":"text/x-rst","patch_set":4,"id":"48e9e5f2_1f71299f","line":156,"in_reply_to":"1cb9b5cf_6c9e2fa5","updated":"2021-06-24 04:53:24.000000000","message":"Done","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":7198,"name":"Jay Bryant","email":"jungleboyj@electronicjungle.net","username":"jsbryant"},"change_message_id":"b35209d3d81e9c0df6c3df38cf5407875e89938a","unresolved":true,"context_lines":[{"line_number":156,"context_line":"wake up every 30 seconds to do its period tasks: probe the storage provisioner"},{"line_number":157,"context_line":"and inspect nvme connections and mdraid states."},{"line_number":158,"context_line":""},{"line_number":159,"context_line":"These tasks are not compute intesive, with time mostly spent waiting for a"},{"line_number":160,"context_line":"response from the storage provisioner, and the nvme and mdraid operations will"},{"line_number":161,"context_line":"only be time complexity linear to the number of devices under the control of"},{"line_number":162,"context_line":"the agent (which can be treated as constant due to a low upper limit per host.)"}],"source_content_type":"text/x-rst","patch_set":4,"id":"9392a644_2be6bd17","line":159,"range":{"start_line":159,"start_character":28,"end_line":159,"end_character":36},"updated":"2021-06-23 14:50:55.000000000","message":"nit: intensive","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar 
Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":false,"context_lines":[{"line_number":156,"context_line":"wake up every 30 seconds to do its period tasks: probe the storage provisioner"},{"line_number":157,"context_line":"and inspect nvme connections and mdraid states."},{"line_number":158,"context_line":""},{"line_number":159,"context_line":"These tasks are not compute intesive, with time mostly spent waiting for a"},{"line_number":160,"context_line":"response from the storage provisioner, and the nvme and mdraid operations will"},{"line_number":161,"context_line":"only be time complexity linear to the number of devices under the control of"},{"line_number":162,"context_line":"the agent (which can be treated as constant due to a low upper limit per host.)"}],"source_content_type":"text/x-rst","patch_set":4,"id":"a7a91fcc_27e18d20","line":159,"range":{"start_line":159,"start_character":28,"end_line":159,"end_character":36},"in_reply_to":"9392a644_2be6bd17","updated":"2021-06-24 04:53:24.000000000","message":"Done","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":5314,"name":"Brian Rosmaita","email":"rosmaita.fossdev@gmail.com","username":"brian-rosmaita"},"change_message_id":"ec22aa7e00eca2f0dea2c355f4a8a9a4d12c0433","unresolved":true,"context_lines":[{"line_number":159,"context_line":"These tasks are not compute intesive, with time mostly spent waiting for a"},{"line_number":160,"context_line":"response from the storage provisioner, and the nvme and mdraid operations will"},{"line_number":161,"context_line":"only be time complexity linear to the number of devices under the control of"},{"line_number":162,"context_line":"the agent (which can be treated as constant due to a low upper limit per host.)"},{"line_number":163,"context_line":"And finally, the performance effect on the network will also be small, since 
it"},{"line_number":164,"context_line":"will only be sending/receiving small amounts of (meta)data across the network."},{"line_number":165,"context_line":""}],"source_content_type":"text/x-rst","patch_set":4,"id":"c23648a6_22d60b1e","line":162,"range":{"start_line":162,"start_character":77,"end_line":162,"end_character":79},"updated":"2021-06-23 12:46:47.000000000","message":"nit: period should be outside the parentheses","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"d983a6ff4290a23c1e221b9715ef8c06acde5163","unresolved":false,"context_lines":[{"line_number":159,"context_line":"These tasks are not compute intesive, with time mostly spent waiting for a"},{"line_number":160,"context_line":"response from the storage provisioner, and the nvme and mdraid operations will"},{"line_number":161,"context_line":"only be time complexity linear to the number of devices under the control of"},{"line_number":162,"context_line":"the agent (which can be treated as constant due to a low upper limit per host.)"},{"line_number":163,"context_line":"And finally, the performance effect on the network will also be small, since it"},{"line_number":164,"context_line":"will only be sending/receiving small amounts of (meta)data across the network."},{"line_number":165,"context_line":""}],"source_content_type":"text/x-rst","patch_set":4,"id":"dfae1cab_b3ab4408","line":162,"range":{"start_line":162,"start_character":77,"end_line":162,"end_character":79},"in_reply_to":"c23648a6_22d60b1e","updated":"2021-06-24 04:53:24.000000000","message":"Done","commit_id":"1dae425b7ab769464082b11d507d2a761df455cd"},{"author":{"_account_id":27615,"name":"Rajat 
Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"86c849145752f158be3d8996bbf15f6a5a06463c","unresolved":true,"context_lines":[{"line_number":36,"context_line":""},{"line_number":37,"context_line":"So the monitoring and healing needs to be on the initiator / compute side."},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"With this approach, the agent It will monitor the NVMeoF connections and report"},{"line_number":40,"context_line":"changes to the storage orchestrator / provisioner. It will monitor MDRAID arrays"},{"line_number":41,"context_line":"and reconcile their physical state on the host with expected state from the"},{"line_number":42,"context_line":"volume provisioner, replacing broken legs."}],"source_content_type":"text/x-rst","patch_set":5,"id":"61b2c5f6_a04844cf","line":39,"range":{"start_line":39,"start_character":30,"end_line":39,"end_character":32},"updated":"2021-06-24 12:23:37.000000000","message":"nit: remove this","commit_id":"0dd6a51ec04097711c3091e5026add9ab3981fb2"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"082317cd7c9cacfa8f189fa1f0c089ce1d676d66","unresolved":false,"context_lines":[{"line_number":36,"context_line":""},{"line_number":37,"context_line":"So the monitoring and healing needs to be on the initiator / compute side."},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"With this approach, the agent It will monitor the NVMeoF connections and report"},{"line_number":40,"context_line":"changes to the storage orchestrator / provisioner. 
It will monitor MDRAID arrays"},{"line_number":41,"context_line":"and reconcile their physical state on the host with expected state from the"},{"line_number":42,"context_line":"volume provisioner, replacing broken legs."}],"source_content_type":"text/x-rst","patch_set":5,"id":"73823105_a402e7f0","line":39,"range":{"start_line":39,"start_character":30,"end_line":39,"end_character":32},"in_reply_to":"61b2c5f6_a04844cf","updated":"2021-06-24 15:06:06.000000000","message":"Done","commit_id":"0dd6a51ec04097711c3091e5026add9ab3981fb2"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"86c849145752f158be3d8996bbf15f6a5a06463c","unresolved":true,"context_lines":[{"line_number":159,"context_line":""},{"line_number":160,"context_line":"These tasks are not compute intensive, with time mostly spent waiting for a"},{"line_number":161,"context_line":"response from the storage provisioner, and the nvme and mdraid operations will"},{"line_number":162,"context_line":"only be time complexity linear to the number of devices under the control of"},{"line_number":163,"context_line":"the agent (which can be treated as constant due to a low upper limit per host)."},{"line_number":164,"context_line":"And finally, the performance effect on the network will also be small, since it"},{"line_number":165,"context_line":"will only be sending/receiving small amounts of (meta)data across the network."}],"source_content_type":"text/x-rst","patch_set":5,"id":"14f9090a_d1056cc4","line":162,"range":{"start_line":162,"start_character":5,"end_line":162,"end_character":7},"updated":"2021-06-24 12:23:37.000000000","message":"nit: have","commit_id":"0dd6a51ec04097711c3091e5026add9ab3981fb2"},{"author":{"_account_id":16721,"name":"Zohar 
Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"082317cd7c9cacfa8f189fa1f0c089ce1d676d66","unresolved":false,"context_lines":[{"line_number":159,"context_line":""},{"line_number":160,"context_line":"These tasks are not compute intensive, with time mostly spent waiting for a"},{"line_number":161,"context_line":"response from the storage provisioner, and the nvme and mdraid operations will"},{"line_number":162,"context_line":"only be time complexity linear to the number of devices under the control of"},{"line_number":163,"context_line":"the agent (which can be treated as constant due to a low upper limit per host)."},{"line_number":164,"context_line":"And finally, the performance effect on the network will also be small, since it"},{"line_number":165,"context_line":"will only be sending/receiving small amounts of (meta)data across the network."}],"source_content_type":"text/x-rst","patch_set":5,"id":"2bdd3401_377b492d","line":162,"range":{"start_line":162,"start_character":5,"end_line":162,"end_character":7},"in_reply_to":"14f9090a_d1056cc4","updated":"2021-06-24 15:06:06.000000000","message":"Done","commit_id":"0dd6a51ec04097711c3091e5026add9ab3981fb2"},{"author":{"_account_id":27615,"name":"Rajat Dhasmana","email":"rajatdhasmana@gmail.com","username":"whoami-rajat"},"change_message_id":"283a3c553e713566d361f6be180583a7f6181ca6","unresolved":true,"context_lines":[{"line_number":31,"context_line":"For target-side volume replication (traditional approach), it is the storage"},{"line_number":32,"context_line":"backend that takes care of monitoring and self healing."},{"line_number":33,"context_line":"The NVMe + MDRAID approach moves the data replication responsibility from the"},{"line_number":34,"context_line":"storage backend to the consuming initiator (ie. 
compute node)."},{"line_number":35,"context_line":""},{"line_number":36,"context_line":"So the monitoring and healing needs to be on the initiator / compute side."},{"line_number":37,"context_line":""}],"source_content_type":"text/x-rst","patch_set":6,"id":"ea068b07_73e41d1b","line":34,"range":{"start_line":34,"start_character":44,"end_line":34,"end_character":47},"updated":"2021-06-25 09:26:30.000000000","message":"nit: i.e.","commit_id":"46bfe10ef67279a6e265d5b7a4d25dfd6c701154"},{"author":{"_account_id":5314,"name":"Brian Rosmaita","email":"rosmaita.fossdev@gmail.com","username":"brian-rosmaita"},"change_message_id":"0935b354014d6ecbce4935802a9b310056f64536","unresolved":true,"context_lines":[{"line_number":183,"context_line":"a per vendor basis."},{"line_number":184,"context_line":""},{"line_number":185,"context_line":"The architecture is such that the agent will be a generic daemon that will"},{"line_number":186,"context_line":"define the interface, and the kioxia implementation will be the first"},{"line_number":187,"context_line":"example of a vendor-specific implementation."},{"line_number":188,"context_line":""},{"line_number":189,"context_line":""},{"line_number":190,"context_line":"Implementation"}],"source_content_type":"text/x-rst","patch_set":6,"id":"e7f6ed27_e9c5d499","line":187,"range":{"start_line":186,"start_character":26,"end_line":187,"end_character":43},"updated":"2021-06-25 20:50:55.000000000","message":"I\u0027m not going to hold the spec up over this, but iirc the issue of how to do code reuse from the kioxia cinder driver (for the vendor-specific part) came up at the PTG. 
Please make sure you run your ideas about that by the cinder team before you get too far into the implementation.","commit_id":"46bfe10ef67279a6e265d5b7a4d25dfd6c701154"},{"author":{"_account_id":16721,"name":"Zohar Mamedov","email":"zohar.cloud@gmail.com","username":"zohar"},"change_message_id":"afc5f440b7c474999bda319fdc91cfc74e7328de","unresolved":true,"context_lines":[{"line_number":183,"context_line":"a per vendor basis."},{"line_number":184,"context_line":""},{"line_number":185,"context_line":"The architecture is such that the agent will be a generic daemon that will"},{"line_number":186,"context_line":"define the interface, and the kioxia implementation will be the first"},{"line_number":187,"context_line":"example of a vendor-specific implementation."},{"line_number":188,"context_line":""},{"line_number":189,"context_line":""},{"line_number":190,"context_line":"Implementation"}],"source_content_type":"text/x-rst","patch_set":6,"id":"aa0c845c_b238725f","line":187,"range":{"start_line":186,"start_character":26,"end_line":187,"end_character":43},"in_reply_to":"e7f6ed27_e9c5d499","updated":"2021-06-26 13:55:50.000000000","message":"Sounds good, will do that. The first draft implementation will be with a copy-over of the REST API from the Cinder driver (since it is easiest to just duplicate it) - but I will bring this up with the Cinder team as I am implementing this (and hopefully more questions and insights arise in the process).\n\nThank you!","commit_id":"46bfe10ef67279a6e265d5b7a4d25dfd6c701154"}]}