)]}'
{"/PATCHSET_LEVEL":[{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7e2ed00f9dd99cecfde537f5ed0f1363c6fe8ba2","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":5,"id":"4f3e9b9c_0580a687","updated":"2024-06-25 17:32:42.000000000","message":"Thanks for you input. I\u0027ll have to come back to some of your points, and revise the document a bit.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"3a9baa2b737cac8f9cd93104e83c3d06ae52ad5d","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":7,"id":"1045817a_e5648454","updated":"2024-07-02 13:00:09.000000000","message":"Updated the spec taking cloud-init as a consumer into account.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"5bde13bbd0103827ebe7164f6423b972d3183700","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":8,"id":"16e6eae3_96a9f605","updated":"2024-07-02 16:02:39.000000000","message":"I hope, I have addressed at least some of your concerns.","commit_id":"877247c1ae34d56a91dfb6519d87f1e8cb5be81f"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"e3f6c2737c8b63d30d5b7bcd008b5c8e2789eb44","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":9,"id":"7d69b0bb_31fda236","updated":"2024-07-03 10:41:45.000000000","message":"Added dansmith and gibi, as we discussed this proposal on the vPTG.","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":4393,"name":"Dan 
Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"86ead97a471529c8853c7c0058a63f02ee4dc11b","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":9,"id":"5437a880_c162e169","updated":"2024-07-18 16:22:56.000000000","message":"I\u0027m skeptical of this plan and the claimed benefits. I don\u0027t really have time to dig into it very deeply to confirm or challenge some of the claims made here at this point. I don\u0027t want to fully -1 this because I don\u0027t really have time to spend digging into these claims to prove or disprove, so consider this a -0.9 :)","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7fa72575e0cd2111d080c11e0dcebe63fbe50251","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":10,"id":"6444f4f1_53895eb6","updated":"2024-07-19 09:13:53.000000000","message":"I hope, I addressed the points raised.","commit_id":"05cb62ca6a0173918258677f6ef71f1ed400e047"}],"specs/2024.2/approved/lazy-metadata-loading.rst":[{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"62027cfeeb7bd7eb8ce0ec5a33cf50e95bff5292","unresolved":true,"context_lines":[{"line_number":37,"context_line":""},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from its cells and from Neutron, thereby potentially"},{"line_number":41,"context_line":"causing unneccesary load on the Neutron and Nova services."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"The caching strategy implemented is caching the whole data and expiring it"}],"source_content_type":"text/x-rst","patch_set":5,"id":"3989d629_9c24dc7d","line":40,"updated":"2024-06-25 
16:29:18.000000000","message":"we do not collect info from Neutron, by the way, as far as I\u0027m aware.\n\nthe networking information is built from the network info cache content, which is stored in the nova db.\n\nwe also don\u0027t need to collect data from cells, plural.\n\nall data is either in the api db or in the specific cell db which contains the instance.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"e3f6c2737c8b63d30d5b7bcd008b5c8e2789eb44","unresolved":true,"context_lines":[{"line_number":37,"context_line":""},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from its cells and from Neutron, thereby potentially"},{"line_number":41,"context_line":"causing unneccesary load on the Neutron and Nova services."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"The caching strategy implemented is caching the whole data and expiring it"}],"source_content_type":"text/x-rst","patch_set":5,"id":"f568aa39_940d71a1","line":40,"in_reply_to":"1fd40050_c593955b","updated":"2024-07-03 10:41:45.000000000","message":"Up front: I do like saving security groups to info_cache, but in my mind, it only works with notifications from Neutron. Something that was discussed and rejected (maybe due to my limited use-case or bad arguments) in the vPTG. I\u0027ll try to get the other core members in on that. Maybe \"your\" use-case and points make a difference.\n\nI think we probably agree that the metadata server only needs to be \"eventually consistent\", but I have to give my users an upper-bound for that. 
\nThat upper-bound would then determine the lower-bound for the polling-frequency to poll *all* the instances (as `_heal_instance_info_cache` only polls one instance per interval per compute-host).\n\n\nSo, if I want to be able to provide my users somewhere around 5min upper-bound (which in my mind is already a hard sell), then I have to poll all the ports of all the instances within that time-frame, just to be able to provide the X% of users which actually have a use-case for that.\n\nI.e. in our largest region, we have around\n- 2M metadata requests/day (each of them resulting in the port-query) \n- 33.000 requests/d (on security-groups)\n- roughly 47-48k VMs.\n\nWith the (admittedly arbitrary) 5min, I end up then with 13M/day in port queries.\nSo, let\u0027s turn it around and use the current request budget which results in roughly one request every 1.75h per instance.\n\n\nNow on to the position, just use the Neutron API: My concern is there is same as yours with #128: https://docs.openstack.org/nova/latest/contributor/project-scope.html#don-t-break-existing-users\n\nCurrently, an instance does have no other means to query data only about itself in an authenticated fashion. 
I could provision read-only openstack credentials (which by itself is already a bit cumbersome to do so safely), but I cannot restrict the credentials to be only about itself.\nSo, we go from \"curl http://.../latest/security-groups\" to a whole ballpark of tools.\n\nI\u0027d personally would not feel comfortable to roll that out to our user-base.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7e2ed00f9dd99cecfde537f5ed0f1363c6fe8ba2","unresolved":true,"context_lines":[{"line_number":37,"context_line":""},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from its cells and from Neutron, thereby potentially"},{"line_number":41,"context_line":"causing unneccesary load on the Neutron and Nova services."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"The caching strategy implemented is caching the whole data and expiring it"}],"source_content_type":"text/x-rst","patch_set":5,"id":"76069cc7_5688783f","line":40,"in_reply_to":"3989d629_9c24dc7d","updated":"2024-06-25 17:32:42.000000000","message":"\u003e we do not clollect info form nuton by the way as far as im aware\n\nHere starts the call to neutron for the security groups:\n- https://opendev.org/openstack/nova/src/branch/master/nova/api/metadata/base.py#L150-L151\n\nwhich ends up here:\n\n- https://opendev.org/openstack/nova/src/branch/master/nova/network/security_group_api.py#L532-L574\n\n\u003e we also dont need to collect data form cells plural \n\nI\u0027ll change it to singular.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean 
mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"bf7719b072e3d58df675b3ef7a3fb4c745d36e04","unresolved":true,"context_lines":[{"line_number":37,"context_line":""},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from its cells and from Neutron, thereby potentially"},{"line_number":41,"context_line":"causing unneccesary load on the Neutron and Nova services."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"The caching strategy implemented is caching the whole data and expiring it"}],"source_content_type":"text/x-rst","patch_set":5,"id":"d61a31a1_51b9ae9d","line":40,"in_reply_to":"76069cc7_5688783f","updated":"2024-06-26 11:54:43.000000000","message":"ah so I thought we merged https://review.opendev.org/c/openstack/nova/+/786348\nto resolve https://bugs.launchpad.net/nova/+bug/1923560 some time ago but that stalled out so we are still not caching that.\n\nI would prefer to kill two birds with one stone and revive that, I think, to address the issue of load on Neutron.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"f6db79c0f74dfae7eb6456b44e081c6faf667e18","unresolved":true,"context_lines":[{"line_number":37,"context_line":""},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from its cells and from Neutron, thereby potentially"},{"line_number":41,"context_line":"causing unneccesary load on the Neutron and Nova services."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"The caching strategy implemented is caching the whole data and expiring 
it"}],"source_content_type":"text/x-rst","patch_set":5,"id":"1fd40050_c593955b","line":40,"in_reply_to":"d1778980_10c895be","updated":"2024-07-02 19:22:16.000000000","message":"you are not meant to ask nova for the security groups.\n\nthose are legacy deprecated proxy apis.\n\napplications can get a cached view via metadata, but if they want up to date info they should be querying neutron directly.\n\nthis is directly covered by \nhttps://docs.openstack.org/nova/latest/contributor/project-scope.html#no-more-api-proxies\n\n```\nThe next API to mention is the networking APIs, in particular the security groups API. \nMost of these APIs exist from when nova-network existed and the proxies were added during the transition. \nHowever, security groups has a much richer Neutron API, and if you use both Nova API and Neutron API,\nthe mismatch can lead to some very unexpected results, in certain cases.\n\nOur intention is to avoid adding to the problems we already have in this area.\n```\n\nany networking information available via nova apis should be considered a cached view and not relied on to be up to date.\nthat applies even more so to the metadata api than the main api.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"3a9baa2b737cac8f9cd93104e83c3d06ae52ad5d","unresolved":true,"context_lines":[{"line_number":37,"context_line":""},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from its cells and from Neutron, thereby potentially"},{"line_number":41,"context_line":"causing unneccesary load on the Neutron and Nova services."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"The caching strategy implemented is caching the whole data and expiring 
it"}],"source_content_type":"text/x-rst","patch_set":5,"id":"d1778980_10c895be","line":40,"in_reply_to":"d61a31a1_51b9ae9d","updated":"2024-07-02 13:00:09.000000000","message":"I think, the proposal doesn\u0027t work by itself or at least not without having a potentially breaking change. For the same reasons of the discussed alternative:\n\nNeutron doesn\u0027t seem to send any notification out to nova in the event the security group membership of the port out.\nWithout it we would have to wait for `_heal_instance_info_cache` to eventually fix the inconsistencies of the data.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"62027cfeeb7bd7eb8ce0ec5a33cf50e95bff5292","unresolved":true,"context_lines":[{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from its cells and from Neutron, thereby potentially"},{"line_number":41,"context_line":"causing unneccesary load on the Neutron and Nova services."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"The caching strategy implemented is caching the whole data and expiring it"},{"line_number":44,"context_line":"after a fixed time period. 
That helps only when the requests within"}],"source_content_type":"text/x-rst","patch_set":5,"id":"bc233660_efabc1b0","line":41,"updated":"2024-06-25 16:29:18.000000000","message":"so there are reasons for this.\n\ncloud-init, the primary consumer of this data, will only ever retry the first query to get metadata.\n\nit will not retry any subdirectories once the first query succeeds, and as far as I\u0027m aware you cannot easily alter the timeout that cloud-init uses without modifying the guest image.\n\nwe have had downstream customer bugs as a result of high load on the nova metadata api causing long response times to metadata requests.\n\nwe solve that by using memcache to cache the metadata object so that when the second query is made, even if that is load balanced to a separate api instance, the subsequent request will not time out.\n\nso we would only be able to make this change if we can guarantee that we won\u0027t time out generating the response. as such this would have to be an opt-in feature.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"3a9baa2b737cac8f9cd93104e83c3d06ae52ad5d","unresolved":false,"context_lines":[{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from its cells and from Neutron, thereby potentially"},{"line_number":41,"context_line":"causing unneccesary load on the Neutron and Nova services."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"The caching strategy implemented is caching the whole data and expiring it"},{"line_number":44,"context_line":"after a fixed time period. 
That helps only when the requests within"}],"source_content_type":"text/x-rst","patch_set":5,"id":"be2635b5_1bee1cc8","line":41,"in_reply_to":"b1d085aa_d3e385d5","updated":"2024-07-02 13:00:09.000000000","message":"Done","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7e2ed00f9dd99cecfde537f5ed0f1363c6fe8ba2","unresolved":true,"context_lines":[{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from its cells and from Neutron, thereby potentially"},{"line_number":41,"context_line":"causing unneccesary load on the Neutron and Nova services."},{"line_number":42,"context_line":""},{"line_number":43,"context_line":"The caching strategy implemented is caching the whole data and expiring it"},{"line_number":44,"context_line":"after a fixed time period. That helps only when the requests within"}],"source_content_type":"text/x-rst","patch_set":5,"id":"b1d085aa_d3e385d5","line":41,"in_reply_to":"bc233660_efabc1b0","updated":"2024-06-25 17:32:42.000000000","message":"I general, it would make sense to make it an opt-in feature. I\u0027ll add that.\n\n\nI\u0027ll also read the cloud-init code, and adopt the proposal to take that behaviour into consideration (i.e. lazy-load only fields not currently read by cloud-init).","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"62027cfeeb7bd7eb8ce0ec5a33cf50e95bff5292","unresolved":true,"context_lines":[{"line_number":44,"context_line":"after a fixed time period. That helps only when the requests within"},{"line_number":45,"context_line":"the expiration time, such as when the VM quickly walks the whole tree. 
But as"},{"line_number":46,"context_line":"there is no cache invalidation on chance, setting this value too high increases"},{"line_number":47,"context_line":"the risk of the user getting outdated data."},{"line_number":48,"context_line":""},{"line_number":49,"context_line":"Use Cases"},{"line_number":50,"context_line":"---------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"7d7c2120_422a4762","line":47,"updated":"2024-06-25 16:29:18.000000000","message":"right so I think the correct way to fix this would be to support a way for the guest to invalidate the cache and rebuild the metadata\n\ni.e. an HTTP DELETE or other call to the root of the metadata api endpoint.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":4393,"name":"Dan Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"86ead97a471529c8853c7c0058a63f02ee4dc11b","unresolved":true,"context_lines":[{"line_number":44,"context_line":"after a fixed time period. That helps only when the requests within"},{"line_number":45,"context_line":"the expiration time, such as when the VM quickly walks the whole tree. But as"},{"line_number":46,"context_line":"there is no cache invalidation on chance, setting this value too high increases"},{"line_number":47,"context_line":"the risk of the user getting outdated data."},{"line_number":48,"context_line":""},{"line_number":49,"context_line":"Use Cases"},{"line_number":50,"context_line":"---------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"b88b5b4a_1976a499","line":47,"in_reply_to":"0ebf0f95_f13d22d6","updated":"2024-07-18 16:22:56.000000000","message":"I don\u0027t think the DELETE method is the right way to ask for cache invalidation. 
That seems highly confusing to me.\n\nOne way would be to use `Cache-Control` (or similar) on the request to indicate when we really need the latest stuff.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"bf7719b072e3d58df675b3ef7a3fb4c745d36e04","unresolved":true,"context_lines":[{"line_number":44,"context_line":"after a fixed time period. That helps only when the requests within"},{"line_number":45,"context_line":"the expiration time, such as when the VM quickly walks the whole tree. But as"},{"line_number":46,"context_line":"there is no cache invalidation on chance, setting this value too high increases"},{"line_number":47,"context_line":"the risk of the user getting outdated data."},{"line_number":48,"context_line":""},{"line_number":49,"context_line":"Use Cases"},{"line_number":50,"context_line":"---------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"ee27403f_b05b90f9","line":47,"in_reply_to":"60a04393_b5d321ef","updated":"2024-06-26 11:54:43.000000000","message":"the idea is that if they want fresh data they can explicitly invalidate the cache by doing a delete. I know there are also ways to do this with, I think, an http options call. 
it\u0027s been a while since I looked at the standards around cache invalidation, although I know it\u0027s not standardised properly.\n\nI\u0027m open to the method but yes.\n\n\nthe metadata api can be cached today and I\u0027m not really in favor of changing that behavior. even if we were to lazy load, we would still cache the data in memcache if caching was enabled in the metadata api config.\n\nit would just change the size of the cached data, not the behavior.\n\nthe problem statement seems to revolve around an inability to refresh the cached data, so if that is what you want to actually solve then we should provide a way for the client to invalidate the cache.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7e2ed00f9dd99cecfde537f5ed0f1363c6fe8ba2","unresolved":true,"context_lines":[{"line_number":44,"context_line":"after a fixed time period. That helps only when the requests within"},{"line_number":45,"context_line":"the expiration time, such as when the VM quickly walks the whole tree. 
But as"},{"line_number":46,"context_line":"there is no cache invalidation on chance, setting this value too high increases"},{"line_number":47,"context_line":"the risk of the user getting outdated data."},{"line_number":48,"context_line":""},{"line_number":49,"context_line":"Use Cases"},{"line_number":50,"context_line":"---------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"60a04393_b5d321ef","line":47,"in_reply_to":"7d7c2120_422a4762","updated":"2024-06-25 17:32:42.000000000","message":"We do have limited control over the behaviour of clients.\nI assume, you want to suggest that as a way for the client to signal that is wants fresh data for a \"batch\" query?\n\nThe clients also have the option of using the metadata as json.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"3a9baa2b737cac8f9cd93104e83c3d06ae52ad5d","unresolved":true,"context_lines":[{"line_number":44,"context_line":"after a fixed time period. That helps only when the requests within"},{"line_number":45,"context_line":"the expiration time, such as when the VM quickly walks the whole tree. But as"},{"line_number":46,"context_line":"there is no cache invalidation on chance, setting this value too high increases"},{"line_number":47,"context_line":"the risk of the user getting outdated data."},{"line_number":48,"context_line":""},{"line_number":49,"context_line":"Use Cases"},{"line_number":50,"context_line":"---------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"0ebf0f95_f13d22d6","line":47,"in_reply_to":"ee27403f_b05b90f9","updated":"2024-07-02 13:00:09.000000000","message":"\u003e the problem statment seems to resovle aroudn an inablity to refersh the cached data so if that is what you want to actuly solve then we should provide a way for the client to invalidate the cache\n\nThat was not the message I wanted to convey. 
In my mind the problem is load created by the read amplification (primarily caused by polling clients).\n\nAssuming they would implement it, giving the client a way to force a read without caching would make that problem worse, as the client would then do that on every request.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"62027cfeeb7bd7eb8ce0ec5a33cf50e95bff5292","unresolved":true,"context_lines":[{"line_number":50,"context_line":"---------"},{"line_number":51,"context_line":"Additionally to the main use case of bootstrapping a VM with the metadata, End"},{"line_number":52,"context_line":"Users are also using the metadata service to poll various information. The"},{"line_number":53,"context_line":"software installed may be not specific to Openstack and poll metadata paths."},{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":""}],"source_content_type":"text/x-rst","patch_set":5,"id":"2f25260d_b1ea0509","line":53,"updated":"2024-06-25 16:29:18.000000000","message":"if they don\u0027t poll the metadata paths then there is no load on the metadata api\n\nunless you use config drive this data is only generated in response to an api request from the guest.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7e2ed00f9dd99cecfde537f5ed0f1363c6fe8ba2","unresolved":true,"context_lines":[{"line_number":50,"context_line":"---------"},{"line_number":51,"context_line":"Additionally to the main use case of bootstrapping a VM with the metadata, End"},{"line_number":52,"context_line":"Users are also using the metadata service to poll various information. 
The"},{"line_number":53,"context_line":"software installed may be not specific to Openstack and poll metadata paths."},{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":""}],"source_content_type":"text/x-rst","patch_set":5,"id":"882c8852_091c1f92","line":53,"in_reply_to":"2f25260d_b1ea0509","updated":"2024-06-25 17:32:42.000000000","message":"The point made here is that bootstrapping is not the only use-case. The requests come more often than once per boot.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"bf7719b072e3d58df675b3ef7a3fb4c745d36e04","unresolved":true,"context_lines":[{"line_number":50,"context_line":"---------"},{"line_number":51,"context_line":"Additionally to the main use case of bootstrapping a VM with the metadata, End"},{"line_number":52,"context_line":"Users are also using the metadata service to poll various information. The"},{"line_number":53,"context_line":"software installed may be not specific to Openstack and poll metadata paths."},{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":""}],"source_content_type":"text/x-rst","patch_set":5,"id":"be202c2c_145280ad","line":53,"in_reply_to":"882c8852_091c1f92","updated":"2024-06-26 11:54:43.000000000","message":"yes it can but in general this is not an api that sees heavy load.\n\nthis is not intended to be used as some kind of distributed key value store like etcd or similar to allow you to control the behaviour of running applications by updating metadata on the instance. 
so we do not expect workloads to poll this at a high frequency.\n\nthe per instance load should be very low outside of reboot/spawn/move operations.\n\nI would consider an application that is actively polling the metadata to be a bad actor and consider that a ddos.\n\nthat does not mean we can\u0027t mitigate that, but it\u0027s not how this is intended to be used and the application is not coded correctly from my point of view.\n\nadding rate limiting or similar would be another way to mitigate this load, but the caching approach with memcached is what most clouds do in production to avoid bad actors like this.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"3a9baa2b737cac8f9cd93104e83c3d06ae52ad5d","unresolved":true,"context_lines":[{"line_number":50,"context_line":"---------"},{"line_number":51,"context_line":"Additionally to the main use case of bootstrapping a VM with the metadata, End"},{"line_number":52,"context_line":"Users are also using the metadata service to poll various information. The"},{"line_number":53,"context_line":"software installed may be not specific to Openstack and poll metadata paths."},{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":""}],"source_content_type":"text/x-rst","patch_set":5,"id":"558f1668_4372fe3e","line":53,"in_reply_to":"be202c2c_145280ad","updated":"2024-07-02 13:00:09.000000000","message":"We are not talking here about requests per second per instance, but requests every X minutes per instance. We just happen to have many instances. I do not see that as an issue for the users of our infrastructure to fix.\nWe can (and do) scale up our server instances to serve these requests. 
We just wish fewer resources were used.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"62027cfeeb7bd7eb8ce0ec5a33cf50e95bff5292","unresolved":true,"context_lines":[{"line_number":52,"context_line":"Users are also using the metadata service to poll various information. The"},{"line_number":53,"context_line":"software installed may be not specific to Openstack and poll metadata paths."},{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":""},{"line_number":57,"context_line":"Proposed change"},{"line_number":58,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"}],"source_content_type":"text/x-rst","patch_set":5,"id":"8133293d_2ce9c77f","line":55,"updated":"2024-06-25 16:29:18.000000000","message":"on lower performing systems this is likely to break existing workloads, based on our previously observed behaviour with cloud-init.\n\nunless that has changed, cloud-init as shipped by Red Hat and Canonical only retries the first metadata api request.\nif any of the rest time out, such as when the api server is under load, then cloud-init will fail to bootstrap the vm.\n\ndepending on where that happens, that might require the vm to be rebooted or rebuilt to fix it.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"e3f6c2737c8b63d30d5b7bcd008b5c8e2789eb44","unresolved":false,"context_lines":[{"line_number":52,"context_line":"Users are also using the metadata service to poll various information. 
The"},{"line_number":53,"context_line":"software installed may be not specific to Openstack and poll metadata paths."},{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":""},{"line_number":57,"context_line":"Proposed change"},{"line_number":58,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"}],"source_content_type":"text/x-rst","patch_set":5,"id":"90f158ee_431789be","line":55,"in_reply_to":"017c8126_26e2180b","updated":"2024-07-03 10:41:45.000000000","message":"Done","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"bf7719b072e3d58df675b3ef7a3fb4c745d36e04","unresolved":true,"context_lines":[{"line_number":52,"context_line":"Users are also using the metadata service to poll various information. 
The"},{"line_number":53,"context_line":"software installed may be not specific to Openstack and poll metadata paths."},{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":""},{"line_number":57,"context_line":"Proposed change"},{"line_number":58,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"}],"source_content_type":"text/x-rst","patch_set":5,"id":"bc41f94a_cafd16dd","line":55,"in_reply_to":"08442bff_b92b03fa","updated":"2024-06-26 11:54:43.000000000","message":"I\u0027m saying the lazy loading you are proposing would likely break the primary existing client for this API (cloud-init) and would directly require end-user changes to work around.\n\nSo I commented here because I don\u0027t believe the spec achieves the goal of no end-user changes.\n\nThis to me is a semantic change that may be end-user visible depending on\nhow this is implemented and how nova is configured (with or without memcache).\n\nSo yes, it\u0027s related to 41.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7e2ed00f9dd99cecfde537f5ed0f1363c6fe8ba2","unresolved":true,"context_lines":[{"line_number":52,"context_line":"Users are also using the metadata service to poll various information. 
The"},{"line_number":53,"context_line":"software installed may be not specific to Openstack and poll metadata paths."},{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":""},{"line_number":57,"context_line":"Proposed change"},{"line_number":58,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"}],"source_content_type":"text/x-rst","patch_set":5,"id":"08442bff_b92b03fa","line":55,"in_reply_to":"8133293d_2ce9c77f","updated":"2024-06-25 17:32:42.000000000","message":"Am I missing something, or is it #41?","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"3a9baa2b737cac8f9cd93104e83c3d06ae52ad5d","unresolved":true,"context_lines":[{"line_number":52,"context_line":"Users are also using the metadata service to poll various information. 
The"},{"line_number":53,"context_line":"software installed may be not specific to Openstack and poll metadata paths."},{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":""},{"line_number":57,"context_line":"Proposed change"},{"line_number":58,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"}],"source_content_type":"text/x-rst","patch_set":5,"id":"017c8126_26e2180b","line":55,"in_reply_to":"bc41f94a_cafd16dd","updated":"2024-07-02 13:00:09.000000000","message":"So, I\u0027ve now read the cloud-init code (and updated the spec accordingly).\nThere is a function for waiting for the metadata server by sending a GET request to `/openstack`, but it is actually more sensitive to timeouts than any subsequent request for openstack by default:\n\nIt will do a single request with the same timeout (5s by default) as all other requests, with no retry.\nIt can be configured to retry for a time period, but the default for openstack is not to.\nIn contrast, all later requests are made with five retries.\n\n\nHere is the more detailed flow:\n\nDefaults: https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/__init__.py#L223-L225\n1. wait_for_metadata (https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOpenStack.py#L74C9-L74C20)\n      max_wait\u003dint(self.ds_cfg.get(\"max_wait\", self.url_max_wait\u003d-1)) See: https://cloudinit.readthedocs.io/en/latest/reference/datasources/openstack.html#max-wait\n      timeout\u003dint(self.ds_cfg.get(\"timeout\", self.url_timeout))\n  openstack\n2. 
read_v2  (https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/__init__.py#L223-L225)\n      retries\u003dint(self.ds_cfg.get(\"retries\", self.url_retries))\n      timeout\u003durl_params.timeout_seconds\n  paths:\n    openstack (MetadataReader._fetch_available_versions https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/helpers/openstack.py#L485-L492)\n    openstack/\u003cversion\u003e/meta_data.json\n    openstack/\u003cversion\u003e/user_data\n    openstack/\u003cversion\u003e/vendor_data.json\n    openstack/\u003cversion\u003e/vendor_data2.json\n    openstack/\u003cversion\u003e/network_data.json\n3.  MetadataReader._read_ec2_metadata https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/helpers/openstack.py#L346\n  latest/metadata/\n    recursive https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/helpers/ec2.py#L54","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"62027cfeeb7bd7eb8ce0ec5a33cf50e95bff5292","unresolved":true,"context_lines":[{"line_number":71,"context_line":"``/openstack/\u003cversion\u003e/meta_data.json``."},{"line_number":72,"context_line":""},{"line_number":73,"context_line":"But such a change would require broader changes, as such cache invalidation"},{"line_number":74,"context_line":"has to be handled across services."},{"line_number":75,"context_line":""},{"line_number":76,"context_line":"The proposed change is limited to a single process, and also doesn\u0027t exclude"},{"line_number":77,"context_line":"cache invalidation."}],"source_content_type":"text/x-rst","patch_set":5,"id":"4fb9622e_8dfc0516","line":74,"updated":"2024-06-25 16:29:18.000000000","message":"Honestly, I think this should be the first approach we consider and lazy loading should be the 
alternative.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7e2ed00f9dd99cecfde537f5ed0f1363c6fe8ba2","unresolved":false,"context_lines":[{"line_number":71,"context_line":"``/openstack/\u003cversion\u003e/meta_data.json``."},{"line_number":72,"context_line":""},{"line_number":73,"context_line":"But such a change would require broader changes, as such cache invalidation"},{"line_number":74,"context_line":"has to be handled across services."},{"line_number":75,"context_line":""},{"line_number":76,"context_line":"The proposed change is limited to a single process, and also doesn\u0027t exclude"},{"line_number":77,"context_line":"cache invalidation."}],"source_content_type":"text/x-rst","patch_set":5,"id":"35f0794c_5d773da9","line":74,"in_reply_to":"4fb9622e_8dfc0516","updated":"2024-06-25 17:32:42.000000000","message":"Acknowledged","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"62027cfeeb7bd7eb8ce0ec5a33cf50e95bff5292","unresolved":true,"context_lines":[{"line_number":278,"context_line":"     - Introduced"},{"line_number":279,"context_line":""},{"line_number":280,"context_line":""},{"line_number":281,"context_line":".. 
_Nova_Dalmatian_vPTG: https://etherpad.opendev.org/p/nova-dalmatian-ptg#L342"}],"source_content_type":"text/x-rst","patch_set":5,"id":"11e8eaa1_c2e721c1","line":281,"updated":"2024-06-25 16:29:18.000000000","message":"I was not present for that PTG discussion, but when I did join later I stated I was not in favor of this proposed direction.","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7e2ed00f9dd99cecfde537f5ed0f1363c6fe8ba2","unresolved":false,"context_lines":[{"line_number":278,"context_line":"     - Introduced"},{"line_number":279,"context_line":""},{"line_number":280,"context_line":""},{"line_number":281,"context_line":".. _Nova_Dalmatian_vPTG: https://etherpad.opendev.org/p/nova-dalmatian-ptg#L342"}],"source_content_type":"text/x-rst","patch_set":5,"id":"4fe48ae8_ab129fc1","line":281,"in_reply_to":"11e8eaa1_c2e721c1","updated":"2024-06-25 17:32:42.000000000","message":"Acknowledged","commit_id":"8526e465059b0418646fccc3d0454336670a1c68"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"f39b5df1a6940aadee90b1aaa8a5a332d97cc703","unresolved":true,"context_lines":[{"line_number":34,"context_line":""},{"line_number":35,"context_line":"  - AWS: ``/\u003cversion\u003e/dynamic/instance-identity/document``"},{"line_number":36,"context_line":"  - Azure: ``/metadata/instance/compute``"},{"line_number":37,"context_line":""},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from the respective cell and from Neutron, thereby"}],"source_content_type":"text/x-rst","patch_set":7,"id":"9f209816_ca193dca","line":37,"updated":"2024-07-02 12:41:05.000000000","message":"There is no standard for metadata by the 
way, so we do not expect to be compatible with Azure or AWS in general.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"5bde13bbd0103827ebe7164f6423b972d3183700","unresolved":false,"context_lines":[{"line_number":34,"context_line":""},{"line_number":35,"context_line":"  - AWS: ``/\u003cversion\u003e/dynamic/instance-identity/document``"},{"line_number":36,"context_line":"  - Azure: ``/metadata/instance/compute``"},{"line_number":37,"context_line":""},{"line_number":38,"context_line":""},{"line_number":39,"context_line":"Regardless of which data has been requested, the nova-metadata process will"},{"line_number":40,"context_line":"first collect all data from the respective cell and from Neutron, thereby"}],"source_content_type":"text/x-rst","patch_set":7,"id":"41e14255_bf4b4519","line":37,"in_reply_to":"9f209816_ca193dca","updated":"2024-07-02 16:02:39.000000000","message":"Acknowledged","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"f39b5df1a6940aadee90b1aaa8a5a332d97cc703","unresolved":true,"context_lines":[{"line_number":62,"context_line":""},{"line_number":63,"context_line":"Instead of loading all the data up front, we delay the loading of the data"},{"line_number":64,"context_line":"when it is needed. 
This will be a configurable opt-in feature."},{"line_number":65,"context_line":""},{"line_number":66,"context_line":""},{"line_number":67,"context_line":"Alternatives"},{"line_number":68,"context_line":"------------"}],"source_content_type":"text/x-rst","patch_set":7,"id":"d80d6356_c4564502","line":65,"updated":"2024-07-02 12:41:05.000000000","message":"You need to describe how you will do this.\nAlso, when this feature is enabled we still need to support config drive,\nso while this could be configurable at the metadata API we still need to be able to generate all the data when building the config drive.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"5bde13bbd0103827ebe7164f6423b972d3183700","unresolved":false,"context_lines":[{"line_number":62,"context_line":""},{"line_number":63,"context_line":"Instead of loading all the data up front, we delay the loading of the data"},{"line_number":64,"context_line":"when it is needed. 
This will be a configurable opt-in feature."},{"line_number":65,"context_line":""},{"line_number":66,"context_line":""},{"line_number":67,"context_line":"Alternatives"},{"line_number":68,"context_line":"------------"}],"source_content_type":"text/x-rst","patch_set":7,"id":"b5c7e81e_86ce04e2","line":65,"in_reply_to":"d80d6356_c4564502","updated":"2024-07-02 16:02:39.000000000","message":"Done","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"f39b5df1a6940aadee90b1aaa8a5a332d97cc703","unresolved":true,"context_lines":[{"line_number":121,"context_line":""},{"line_number":122,"context_line":"In its default configuration (_Cloud_Init_Defaults), Cloud-init starts with a"},{"line_number":123,"context_line":"single call to the ``/openstack`` path with a default ``timeout\u003d10``, with no"},{"line_number":124,"context_line":"retries as defined by ``max_wait\u003d-1``."},{"line_number":125,"context_line":""},{"line_number":126,"context_line":"Lazy loading will ensure that this initial wait call will be more likely to"},{"line_number":127,"context_line":"succeed as no external API calls are needed.  The timeout will also less likely"}],"source_content_type":"text/x-rst","patch_set":7,"id":"6c835536_3114ec66","line":124,"updated":"2024-07-02 12:41:05.000000000","message":"That is not the first call.\n\nThe first call to the metadata endpoint is to the root \u0027/\u0027 to figure out which data source is in use. 
That is the one that is retried; once that succeeds, it starts looking at the OpenStack-specific paths.\n\nBecause we pregenerate the entire document, the subsequent call to /openstack is more likely to succeed.\n\nIf we lazy-load /openstack after that first call, it will be less likely to succeed without a retry.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"5bde13bbd0103827ebe7164f6423b972d3183700","unresolved":true,"context_lines":[{"line_number":121,"context_line":""},{"line_number":122,"context_line":"In its default configuration (_Cloud_Init_Defaults), Cloud-init starts with a"},{"line_number":123,"context_line":"single call to the ``/openstack`` path with a default ``timeout\u003d10``, with no"},{"line_number":124,"context_line":"retries as defined by ``max_wait\u003d-1``."},{"line_number":125,"context_line":""},{"line_number":126,"context_line":"Lazy loading will ensure that this initial wait call will be more likely to"},{"line_number":127,"context_line":"succeed as no external API calls are needed.  
The timeout will also less likely"}],"source_content_type":"text/x-rst","patch_set":7,"id":"c0ba57ac_1cd1ff41","line":124,"in_reply_to":"6c835536_3114ec66","updated":"2024-07-02 16:02:39.000000000","message":"Can you point me to where that is supposed to happen?\n\nI can only find the `_check_and_get_data` method:\n   https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/__init__.py#L408-L420\nand the detection uses local information here:\n   https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOpenStack.py#L250-L265\nAnd from there on:\n  - _get_data: https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOpenStack.py#L150\n  - _crawl_metadata: https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOpenStack.py#L207C9-L207C24\n  - wait_for_metadata_service: https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOpenStack.py#L74\n  \nI do not see any call to `/` along the call graph.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"f39b5df1a6940aadee90b1aaa8a5a332d97cc703","unresolved":true,"context_lines":[{"line_number":125,"context_line":""},{"line_number":126,"context_line":"Lazy loading will ensure that this initial wait call will be more likely to"},{"line_number":127,"context_line":"succeed as no external API calls are needed.  
The timeout will also less likely"},{"line_number":128,"context_line":"be exceeded as the work to be done is significantly less."},{"line_number":129,"context_line":""},{"line_number":130,"context_line":"Subsequent data request are governed by the same ``timeout`` value per request,"},{"line_number":131,"context_line":"but also retry based on the ``retries\u003d5`` value."}],"source_content_type":"text/x-rst","patch_set":7,"id":"7e2a59ca_ae3abca2","line":128,"updated":"2024-07-02 12:41:05.000000000","message":"That is not consistent with what we see in practice.\n\nThe first call is not to /openstack.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"f6db79c0f74dfae7eb6456b44e081c6faf667e18","unresolved":true,"context_lines":[{"line_number":125,"context_line":""},{"line_number":126,"context_line":"Lazy loading will ensure that this initial wait call will be more likely to"},{"line_number":127,"context_line":"succeed as no external API calls are needed.  
The timeout will also less likely"},{"line_number":128,"context_line":"be exceeded as the work to be done is significantly less."},{"line_number":129,"context_line":""},{"line_number":130,"context_line":"Subsequent data request are governed by the same ``timeout`` value per request,"},{"line_number":131,"context_line":"but also retry based on the ``retries\u003d5`` value."}],"source_content_type":"text/x-rst","patch_set":7,"id":"2bb02fc2_15b00daa","line":128,"in_reply_to":"04b34d52_7ea32632","updated":"2024-07-02 19:22:16.000000000","message":"Actually, we normally use cirros in CI, which previously used cloud-init but now has its own implementation.\n\nIt supports a number of datasources including configdrive and ec2, but not openstack specifically; it uses the ec2 datasource to interact with nova\u0027s API.\n\nAs a result, it polls http://169.254.169.254/2009-04-04/instance-id as the first request.\n\nchecking http://169.254.169.254/2009-04-04/instance-id\nfailed 1/20: up 23.10. request failed\nfailed 2/20: up 25.57. request failed\nfailed 3/20: up 27.85. request failed\nfailed 4/20: up 30.14. request failed\nfailed 5/20: up 32.44. request failed\nsuccessful after 6/20 tries: up 34.73. 
iid\u003di-0000005c\n\n\nhttps://github.com/cirros-dev/cirros/blob/46a1162787f669ad8d6065cb6bbe477654b4327f/src/lib/cirros/ds/ec2#L39-L56\n\n\nThe DataSourceOpenStack in cloud-init may poll /openstack as you noted, but we can\u0027t break other clients that use our ec2 compat code.\n\nOpenDev also supports a similar lightweight alternative called glean\n\nhttps://opendev.org/opendev/glean\nalthough I believe that only supports config drive.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"e3f6c2737c8b63d30d5b7bcd008b5c8e2789eb44","unresolved":true,"context_lines":[{"line_number":125,"context_line":""},{"line_number":126,"context_line":"Lazy loading will ensure that this initial wait call will be more likely to"},{"line_number":127,"context_line":"succeed as no external API calls are needed.  The timeout will also less likely"},{"line_number":128,"context_line":"be exceeded as the work to be done is significantly less."},{"line_number":129,"context_line":""},{"line_number":130,"context_line":"Subsequent data request are governed by the same ``timeout`` value per request,"},{"line_number":131,"context_line":"but also retry based on the ``retries\u003d5`` value."}],"source_content_type":"text/x-rst","patch_set":7,"id":"ebc70112_48f97f19","line":128,"in_reply_to":"2bb02fc2_15b00daa","updated":"2024-07-03 10:41:45.000000000","message":"Okay, I can check those too. 
But from a first glance at cirros, I\u0027d say it would actually benefit from the change even more, since it only queries a subset of the data.\n\nCirros has a timeout of 10s and 3 retries for each individual request: https://github.com/cirros-dev/cirros/blob/46a1162787f669ad8d6065cb6bbe477654b4327f/src/usr/bin/ec2metadata#L5-L6\nThe initial wait seems to me to be more about waiting until we actually have a working route to the metadata service.\n\nI cannot find any HTTP requests in glean.\n\nI also checked https://github.com/cloudbase/cloudbase-init; they also have a multi-second timeout and retry for each of the request paths.\n\nBut considering we are now making it all opt-in, I am wondering where the need comes from to prove that the change is zero impact.\nIf it is just about the assertion that it is, I have changed it and listed the caveat of the changed timing behavior.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"5bde13bbd0103827ebe7164f6423b972d3183700","unresolved":true,"context_lines":[{"line_number":125,"context_line":""},{"line_number":126,"context_line":"Lazy loading will ensure that this initial wait call will be more likely to"},{"line_number":127,"context_line":"succeed as no external API calls are needed.  The timeout will also less likely"},{"line_number":128,"context_line":"be exceeded as the work to be done is significantly less."},{"line_number":129,"context_line":""},{"line_number":130,"context_line":"Subsequent data request are governed by the same ``timeout`` value per request,"},{"line_number":131,"context_line":"but also retry based on the ``retries\u003d5`` value."}],"source_content_type":"text/x-rst","patch_set":7,"id":"04b34d52_7ea32632","line":128,"in_reply_to":"7e2a59ca_ae3abca2","updated":"2024-07-02 16:02:39.000000000","message":"I\u0027ll try to reproduce it. 
But I think the code for the `DataSourceOpenStack` is quite clear.\nI suspect it comes from another datasource.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"f39b5df1a6940aadee90b1aaa8a5a332d97cc703","unresolved":true,"context_lines":[{"line_number":153,"context_line":""},{"line_number":154,"context_line":"The expected outcome of the optimization is a reduced load on the"},{"line_number":155,"context_line":"nova-metadata API service, and the neutron metadata service, if the feature"},{"line_number":156,"context_line":"is enabled."},{"line_number":157,"context_line":""},{"line_number":158,"context_line":"Other deployer impact"},{"line_number":159,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":7,"id":"6147e3d3_dd4df1e5","line":156,"updated":"2024-07-02 12:41:05.000000000","message":"It may, but I don\u0027t think we can claim that it will without benchmarks to prove it,\nso we should not claim that it will in any documentation or config options.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"5bde13bbd0103827ebe7164f6423b972d3183700","unresolved":false,"context_lines":[{"line_number":153,"context_line":""},{"line_number":154,"context_line":"The expected outcome of the optimization is a reduced load on the"},{"line_number":155,"context_line":"nova-metadata API service, and the neutron metadata service, if the feature"},{"line_number":156,"context_line":"is enabled."},{"line_number":157,"context_line":""},{"line_number":158,"context_line":"Other deployer impact"},{"line_number":159,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":7,"id":"2c1ede17_0673b753","line":156,"in_reply_to":"6147e3d3_dd4df1e5","updated":"2024-07-02 
16:02:39.000000000","message":"Acknowledged","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"f39b5df1a6940aadee90b1aaa8a5a332d97cc703","unresolved":true,"context_lines":[{"line_number":159,"context_line":"---------------------"},{"line_number":160,"context_line":""},{"line_number":161,"context_line":"The deployer would need to enable the setting for the nova-metadata API"},{"line_number":162,"context_line":"service, if they want to make use of the feature."},{"line_number":163,"context_line":""},{"line_number":164,"context_line":""},{"line_number":165,"context_line":"Developer impact"}],"source_content_type":"text/x-rst","patch_set":7,"id":"54d10ee4_8dd4ab57","line":162,"updated":"2024-07-02 12:41:05.000000000","message":"There will need to be a new config option added to enable the opt-in behavior.","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"5bde13bbd0103827ebe7164f6423b972d3183700","unresolved":false,"context_lines":[{"line_number":159,"context_line":"---------------------"},{"line_number":160,"context_line":""},{"line_number":161,"context_line":"The deployer would need to enable the setting for the nova-metadata API"},{"line_number":162,"context_line":"service, if they want to make use of the feature."},{"line_number":163,"context_line":""},{"line_number":164,"context_line":""},{"line_number":165,"context_line":"Developer impact"}],"source_content_type":"text/x-rst","patch_set":7,"id":"b50a84c8_ae24b7a6","line":162,"in_reply_to":"54d10ee4_8dd4ab57","updated":"2024-07-02 16:02:39.000000000","message":"Done","commit_id":"abadb7e52d2fb529c73303258d423be48802c436"},{"author":{"_account_id":4393,"name":"Dan 
Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"86ead97a471529c8853c7c0058a63f02ee4dc11b","unresolved":true,"context_lines":[{"line_number":15,"context_line":"Requesting data from the metadata server will query Neutron for all related"},{"line_number":16,"context_line":"ports and Nova internally all related data which potentially could be needed."},{"line_number":17,"context_line":"The request often requires only a subset of the data. The existing caching"},{"line_number":18,"context_line":"mechanism has no cache invalidation, therefor only allows a trade-off with"},{"line_number":19,"context_line":"accuracy."},{"line_number":20,"context_line":""},{"line_number":21,"context_line":""}],"source_content_type":"text/x-rst","patch_set":9,"id":"de9017b2_d4bc3b3c","line":18,"range":{"start_line":18,"start_character":37,"end_line":18,"end_character":45},"updated":"2024-07-18 16:22:56.000000000","message":"\"therefore\"","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7fa72575e0cd2111d080c11e0dcebe63fbe50251","unresolved":false,"context_lines":[{"line_number":15,"context_line":"Requesting data from the metadata server will query Neutron for all related"},{"line_number":16,"context_line":"ports and Nova internally all related data which potentially could be needed."},{"line_number":17,"context_line":"The request often requires only a subset of the data. 
The existing caching"},{"line_number":18,"context_line":"mechanism has no cache invalidation, therefor only allows a trade-off with"},{"line_number":19,"context_line":"accuracy."},{"line_number":20,"context_line":""},{"line_number":21,"context_line":""}],"source_content_type":"text/x-rst","patch_set":9,"id":"baec5047_1e2090e3","line":18,"range":{"start_line":18,"start_character":37,"end_line":18,"end_character":45},"in_reply_to":"de9017b2_d4bc3b3c","updated":"2024-07-19 09:13:53.000000000","message":"Done","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":4393,"name":"Dan Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"86ead97a471529c8853c7c0058a63f02ee4dc11b","unresolved":true,"context_lines":[{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":"In particular, attention has to be payed to not require any changes to"},{"line_number":57,"context_line":"cloud-init."},{"line_number":58,"context_line":""},{"line_number":59,"context_line":""},{"line_number":60,"context_line":"Proposed change"}],"source_content_type":"text/x-rst","patch_set":9,"id":"82ea1e46_c16b8978","line":57,"updated":"2024-07-18 16:22:56.000000000","message":"I mean, I guess, but that probably means that we\u0027re just going to be guessing about what it wants and when...","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7fa72575e0cd2111d080c11e0dcebe63fbe50251","unresolved":true,"context_lines":[{"line_number":54,"context_line":""},{"line_number":55,"context_line":"The change is intended to be without requiring any change to the End Users."},{"line_number":56,"context_line":"In particular, attention has to be payed to not require any changes 
to"},{"line_number":57,"context_line":"cloud-init."},{"line_number":58,"context_line":""},{"line_number":59,"context_line":""},{"line_number":60,"context_line":"Proposed change"}],"source_content_type":"text/x-rst","patch_set":9,"id":"61b348ff_511cbd30","line":57,"in_reply_to":"82ea1e46_c16b8978","updated":"2024-07-19 09:13:53.000000000","message":"I don\u0027t follow, how is it answering requests as they come \"guessing\"?\n\nThere was a concern raised that the proposal would break cloud-init due to relying on some implicit behavior. I went through the code, and read it and cross checked that it won\u0027t. I\u0027ve added a section to the document that details the behavior.","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":4393,"name":"Dan Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"86ead97a471529c8853c7c0058a63f02ee4dc11b","unresolved":true,"context_lines":[{"line_number":61,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":62,"context_line":""},{"line_number":63,"context_line":"Instead of loading all the data up front, we delay the loading of the data"},{"line_number":64,"context_line":"when it is needed. This will be a configurable opt-in feature."},{"line_number":65,"context_line":""},{"line_number":66,"context_line":"- Extend test coverage for metadata to ensure functional equivalency"},{"line_number":67,"context_line":"  (See Testing)"}],"source_content_type":"text/x-rst","patch_set":9,"id":"c6c0ba7d_ef0b296f","line":64,"updated":"2024-07-18 16:22:56.000000000","message":"That increases load (and the potential for DoS) though right? 
Feels like sacrificing one thing for the benefit of the other.","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7fa72575e0cd2111d080c11e0dcebe63fbe50251","unresolved":true,"context_lines":[{"line_number":61,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":62,"context_line":""},{"line_number":63,"context_line":"Instead of loading all the data up front, we delay the loading of the data"},{"line_number":64,"context_line":"when it is needed. This will be a configurable opt-in feature."},{"line_number":65,"context_line":""},{"line_number":66,"context_line":"- Extend test coverage for metadata to ensure functional equivalency"},{"line_number":67,"context_line":"  (See Testing)"}],"source_content_type":"text/x-rst","patch_set":9,"id":"fd5d5e4a_737f300c","line":64,"in_reply_to":"c6c0ba7d_ef0b296f","updated":"2024-07-19 09:13:53.000000000","message":"Well, there is a point of granularity where the overhead of the request will outweigh the benefit of splitting them up. But to get clear numbers on that, it would require implementing it, and my understanding is, this document should precede that.\n\nI can make the proposal \"less ambitious\", by delaying just the call for the security groups here:\nhttps://opendev.org/openstack/nova/src/branch/master/nova/api/metadata/base.py#L150-L151\n\nSo, in the most simple case, it is delaying two API calls to Neutron (neutron.list_ports, neutron.list_security_groups) until the data is actually being requested. 
\n\nI thought, I have to prove the value of the individual changes in the respective PRs anyway,\nand it might be worthwhile to look into if similar changes could reduce the load on Nova as well.","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":4393,"name":"Dan Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"86ead97a471529c8853c7c0058a63f02ee4dc11b","unresolved":true,"context_lines":[{"line_number":98,"context_line":"has to be handled across services."},{"line_number":99,"context_line":""},{"line_number":100,"context_line":"The proposed change is limited to a single process, and also doesn\u0027t exclude"},{"line_number":101,"context_line":"cache invalidation."},{"line_number":102,"context_line":""},{"line_number":103,"context_line":""},{"line_number":104,"context_line":"Data model impact"}],"source_content_type":"text/x-rst","patch_set":9,"id":"f308dc70_03c43122","line":101,"updated":"2024-07-18 16:22:56.000000000","message":"Is this true? If we store the cached version in memcache or similar it should be shared, no?","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7fa72575e0cd2111d080c11e0dcebe63fbe50251","unresolved":false,"context_lines":[{"line_number":98,"context_line":"has to be handled across services."},{"line_number":99,"context_line":""},{"line_number":100,"context_line":"The proposed change is limited to a single process, and also doesn\u0027t exclude"},{"line_number":101,"context_line":"cache invalidation."},{"line_number":102,"context_line":""},{"line_number":103,"context_line":""},{"line_number":104,"context_line":"Data model impact"}],"source_content_type":"text/x-rst","patch_set":9,"id":"2f4bb3ac_1cd5e8ce","line":101,"in_reply_to":"f308dc70_03c43122","updated":"2024-07-19 09:13:53.000000000","message":"Sorry, badly formulated. I mean a single service. 
I changed it to\n\n\u003e The proposed change is limited to the nova metadata service, and also doesn\u0027t\nexclude cache invalidation.","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":4393,"name":"Dan Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"86ead97a471529c8853c7c0058a63f02ee4dc11b","unresolved":true,"context_lines":[{"line_number":126,"context_line":""},{"line_number":127,"context_line":"It should also reduce the way to cause a resource exhaustion attack via the"},{"line_number":128,"context_line":"metadata API, but it does not eliminate it. The metadata API needs to be"},{"line_number":129,"context_line":"protected by rate-limiting either way."},{"line_number":130,"context_line":""},{"line_number":131,"context_line":""},{"line_number":132,"context_line":"Notifications impact"}],"source_content_type":"text/x-rst","patch_set":9,"id":"1b100262_432edb6f","line":129,"updated":"2024-07-18 16:22:56.000000000","message":"Doesn\u0027t this open the possibility to increase the load by walking the tree every $cache_interval and generating a lot of smaller loads?","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7fa72575e0cd2111d080c11e0dcebe63fbe50251","unresolved":true,"context_lines":[{"line_number":126,"context_line":""},{"line_number":127,"context_line":"It should also reduce the way to cause a resource exhaustion attack via the"},{"line_number":128,"context_line":"metadata API, but it does not eliminate it. 
The metadata API needs to be"},{"line_number":129,"context_line":"protected by rate-limiting either way."},{"line_number":130,"context_line":""},{"line_number":131,"context_line":""},{"line_number":132,"context_line":"Notifications impact"}],"source_content_type":"text/x-rst","patch_set":9,"id":"fdaaa5ba_4bfc637f","line":129,"in_reply_to":"1b100262_432edb6f","updated":"2024-07-19 09:13:53.000000000","message":"Well, \"lots\" depends on the granularity (See #64).\n\nUnless there is strong evidence for doing it differently, in the most simple form \"walking the tree\" should amount to the same number of requests, just spread differently.\n\nI suspect you got triggered by lazy-loading columns, where an oslo.db object load then becomes lots of SQL requests, but that would be the pathological case, which I agree we should avoid, but there are many steps in between loading all data across services up front, and loading every column for each object individually.","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":4393,"name":"Dan Smith","email":"dms@danplanet.com","username":"danms"},"change_message_id":"86ead97a471529c8853c7c0058a63f02ee4dc11b","unresolved":true,"context_lines":[{"line_number":181,"context_line":""},{"line_number":182,"context_line":"The expected outcome of the optimization is a reduced load on the"},{"line_number":183,"context_line":"nova-metadata API service, and the neutron metadata service, if the feature"},{"line_number":184,"context_line":"is enabled."},{"line_number":185,"context_line":""},{"line_number":186,"context_line":"Other deployer impact"},{"line_number":187,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":9,"id":"a57818c7_84956b46","line":184,"updated":"2024-07-18 16:22:56.000000000","message":"I really don\u0027t see how this is possible. Requests are expensive, be them to the database, mq, or another service. 
Further, being able to pull part of the metadata and not others because we\u0027re generating so many requests seems less ideal to me.","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"},{"author":{"_account_id":24434,"name":"Fabian Wiesel","email":"fabian.wiesel@sap.com","username":"fwiesel"},"change_message_id":"7fa72575e0cd2111d080c11e0dcebe63fbe50251","unresolved":true,"context_lines":[{"line_number":181,"context_line":""},{"line_number":182,"context_line":"The expected outcome of the optimization is a reduced load on the"},{"line_number":183,"context_line":"nova-metadata API service, and the neutron metadata service, if the feature"},{"line_number":184,"context_line":"is enabled."},{"line_number":185,"context_line":""},{"line_number":186,"context_line":"Other deployer impact"},{"line_number":187,"context_line":"---------------------"}],"source_content_type":"text/x-rst","patch_set":9,"id":"4e3448d9_0ad47b38","line":184,"in_reply_to":"a57818c7_84956b46","updated":"2024-07-19 09:13:53.000000000","message":"\u003e Requests are expensive, be them to the database, mq, or another service. \n\nBy not doing them, when they are not needed.\n\nCurrently all the data is loaded (and stored in the cache) regardless of whether the client requested it (or ever will).\nAnd it all has to happen now in the first request.\n\nDelaying the requests will also spread them more, so the first request will not take the sum of all requests to all other services.\n\nYes, Cloud-init will walk the tree, so it will require all data, so there won\u0027t be much of a change there.\nBut Cirros doesn\u0027t, it doesn\u0027t request the security-groups, and we would cut out two calls to neutron from the metadata on each boot.\nSome of our users apparently also query only certain paths, and that is where the load comes from for us.","commit_id":"362bf84f5f0f6f477b91e3fc92b5c7431f1bb7d5"}]}
