)]}'
{"doc/source/reference/developer/specs/tenant-resource-quota.rst":[{"author":{"_account_id":1,"name":"James E. Blair","email":"jim@acmegating.com","username":"corvus"},"change_message_id":"b3e2dc3082120aaf57cbec4b6b8cdc5dd8d5bcda","unresolved":false,"context_lines":[{"line_number":68,"context_line":""},{"line_number":69,"context_line":"   .. code-block:: yaml"},{"line_number":70,"context_line":""},{"line_number":71,"context_line":"      tenant-resources:"},{"line_number":72,"context_line":"        tenant1:"},{"line_number":73,"context_line":"          max-servers: 10"},{"line_number":74,"context_line":"          max-cores: 200"}],"source_content_type":"text/x-rst","patch_set":2,"id":"6962d56c_b34a51bf","line":71,"updated":"2021-05-10 15:14:19.000000000","message":"Should we call this \"tenant-resource-limits\"?","commit_id":"dbb7f305161b1f8cdba29eb618a570ad9eac04ce"},{"author":{"_account_id":1,"name":"James E. Blair","email":"jim@acmegating.com","username":"corvus"},"change_message_id":"b3e2dc3082120aaf57cbec4b6b8cdc5dd8d5bcda","unresolved":false,"context_lines":[{"line_number":93,"context_line":"   - if quota for current tenant would not be exceeded"},{"line_number":94,"context_line":"   "},{"line_number":95,"context_line":"     - proceed with normal process if tenant quota is not exceeded"},{"line_number":96,"context_line":""},{"line_number":97,"context_line":"3. for each node request that does not have the tenant attribute or a tenant"},{"line_number":98,"context_line":"   for which no ``tenant-resources`` config exists"},{"line_number":99,"context_line":""}],"source_content_type":"text/x-rst","patch_set":2,"id":"a74922c6_2e9d8c05","line":96,"updated":"2021-05-10 15:14:19.000000000","message":"I *think* that the intent is for the tenant-resources limit to be global across all providers, but it would be good to specify that.  The reason I think that is that you cited the use case of a number of openstack clouds, and, if we use the opendev system as an example, it doesn\u0027t make a lot of sense to me to have tenant quotas for each provider in the general case.\n\nHowever, it does make sense to me that we might want to give, say, airship extended use of special nodes.  We can already limit labels to tenants, so we can set up an all-or-none control.  But is there a use case for saying \"this tenant can have 200 cores from this provider, and another tenant can have 400 cores from the same provider\"?\n\nThat sounds complicated to describe and I sure don\u0027t want to set up a system that way.  So maybe if we don\u0027t have an immediate use case for that, we should just implement this as a global limit.  And if it comes up later, I think we could add it in as a provider-level config.\n\n-1: Assuming this all sounds good, let\u0027s state somewhere in here explicitly that the tenant-resources limit is global across all providers.","commit_id":"dbb7f305161b1f8cdba29eb618a570ad9eac04ce"},{"author":{"_account_id":1,"name":"James E. Blair","email":"jim@acmegating.com","username":"corvus"},"change_message_id":"b3e2dc3082120aaf57cbec4b6b8cdc5dd8d5bcda","unresolved":false,"context_lines":[{"line_number":115,"context_line":"planned resources. Ideally, we can extend this method to also return the"},{"line_number":116,"context_line":"resources currently allocated by each tenant without additional costs and"},{"line_number":117,"context_line":"account for this additional quota information as we already do for provider and"},{"line_number":118,"context_line":"pool quotas (cf. `SimpleTaskManagerHandler`_)"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":""},{"line_number":121,"context_line":".. _`Kubernetes Driver Doc`: https://zuul-ci.org/docs/nodepool/kubernetes.html#attr-providers.[kubernetes].pools.labels.cpu"}],"source_content_type":"text/x-rst","patch_set":2,"id":"826cadb9_53e66769","line":118,"updated":"2021-05-10 15:14:19.000000000","message":"When I was trying to determine whether this made sense as a global or per-provider config, I considered that right now the launchers effectively only handle nodes from their own providers.  In other words, the QuotaSupport class only looks at nodes from that provider.\n\nHowever, it does *read* all of the nodes from ZK, so the additionaly load of calculating a global quota won\u0027t be significant.  So that\u0027s fine.\n\nHowever (again), it establishes quota usage by calling quotaNeededByLabel, which can have side effects of calling provider methods.  That could mean one provider calling another provider\u0027s methods, and that won\u0027t work (it won\u0027t have started).  So we should fully adopt the method of putting the resource usage information in the ZK Node entries.  We should make sure all providers do that in all cases, and then remove the quotaNeededByLabel call in the QuotaSupport class, so it (and the new method) will rely only on data in ZK.","commit_id":"dbb7f305161b1f8cdba29eb618a570ad9eac04ce"},{"author":{"_account_id":29739,"name":"Thomas Zink","email":"thomas.tz.zink@bmw.de"},"change_message_id":"ab13829bd439333793dc5d58485cd521a09d64e6","unresolved":true,"context_lines":[{"line_number":34,"context_line":"thread."},{"line_number":35,"context_line":""},{"line_number":36,"context_line":"Also, in scenarios where Zuul and auxiliary services (e.g. GitHub or"},{"line_number":37,"context_line":"Aritfactory) are operated near or at their limits, the system can become"},{"line_number":38,"context_line":"unstable. In such a situation, a common measure is to lower Nodepools resource"},{"line_number":39,"context_line":"quota to limit the number of concurrent builds and thereby reduce the load on"},{"line_number":40,"context_line":"Zuul and other involved services. However, this can currently be done only on"}],"source_content_type":"text/x-rst","patch_set":3,"id":"22216b2e_b9127e37","line":37,"range":{"start_line":37,"start_character":0,"end_line":37,"end_character":11},"updated":"2021-05-25 16:01:28.000000000","message":"Artifactory","commit_id":"fbf63a5c756bcb6f1357ecaf43a9e41e4bff0209"},{"author":{"_account_id":29739,"name":"Thomas Zink","email":"thomas.tz.zink@bmw.de"},"change_message_id":"ef3e87e7a57105f90d1e48edf3e4a162032f94e1","unresolved":false,"context_lines":[{"line_number":34,"context_line":"thread."},{"line_number":35,"context_line":""},{"line_number":36,"context_line":"Also, in scenarios where Zuul and auxiliary services (e.g. GitHub or"},{"line_number":37,"context_line":"Aritfactory) are operated near or at their limits, the system can become"},{"line_number":38,"context_line":"unstable. In such a situation, a common measure is to lower Nodepools resource"},{"line_number":39,"context_line":"quota to limit the number of concurrent builds and thereby reduce the load on"},{"line_number":40,"context_line":"Zuul and other involved services. However, this can currently be done only on"}],"source_content_type":"text/x-rst","patch_set":3,"id":"239a92c1_e5937f5b","line":37,"range":{"start_line":37,"start_character":0,"end_line":37,"end_character":11},"in_reply_to":"22216b2e_b9127e37","updated":"2021-05-25 16:08:24.000000000","message":"Done","commit_id":"fbf63a5c756bcb6f1357ecaf43a9e41e4bff0209"},{"author":{"_account_id":29739,"name":"Thomas Zink","email":"thomas.tz.zink@bmw.de"},"change_message_id":"ab13829bd439333793dc5d58485cd521a09d64e6","unresolved":true,"context_lines":[{"line_number":57,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":58,"context_line":""},{"line_number":59,"context_line":"The proposed change consists of several parts in both, Zuul and Nodepool. As"},{"line_number":60,"context_line":"Zuul is the only source of through for tenants, it must pass the name of the"},{"line_number":61,"context_line":"tenant with each NodeRequest to Nodepool. The Nodepool side must consider this"},{"line_number":62,"context_line":"information and adhere to any resource limits configured for the corresponding"},{"line_number":63,"context_line":"tenant. However, this shall be backwards compatible, i.e., if no tenant name is"}],"source_content_type":"text/x-rst","patch_set":3,"id":"20012477_ee379a6e","line":60,"range":{"start_line":60,"start_character":27,"end_line":60,"end_character":34},"updated":"2021-05-25 16:01:28.000000000","message":"truth","commit_id":"fbf63a5c756bcb6f1357ecaf43a9e41e4bff0209"},{"author":{"_account_id":29739,"name":"Thomas Zink","email":"thomas.tz.zink@bmw.de"},"change_message_id":"ef3e87e7a57105f90d1e48edf3e4a162032f94e1","unresolved":false,"context_lines":[{"line_number":57,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":58,"context_line":""},{"line_number":59,"context_line":"The proposed change consists of several parts in both, Zuul and Nodepool. As"},{"line_number":60,"context_line":"Zuul is the only source of through for tenants, it must pass the name of the"},{"line_number":61,"context_line":"tenant with each NodeRequest to Nodepool. The Nodepool side must consider this"},{"line_number":62,"context_line":"information and adhere to any resource limits configured for the corresponding"},{"line_number":63,"context_line":"tenant. However, this shall be backwards compatible, i.e., if no tenant name is"}],"source_content_type":"text/x-rst","patch_set":3,"id":"8b1e7a6e_712ae275","line":60,"range":{"start_line":60,"start_character":27,"end_line":60,"end_character":34},"in_reply_to":"20012477_ee379a6e","updated":"2021-05-25 16:08:24.000000000","message":"Done","commit_id":"fbf63a5c756bcb6f1357ecaf43a9e41e4bff0209"},{"author":{"_account_id":29739,"name":"Thomas Zink","email":"thomas.tz.zink@bmw.de"},"change_message_id":"ab13829bd439333793dc5d58485cd521a09d64e6","unresolved":true,"context_lines":[{"line_number":112,"context_line":""},{"line_number":113,"context_line":"   - if quota for current tenant would not be exceeded"},{"line_number":114,"context_line":""},{"line_number":115,"context_line":"     - proceed with normal process if tenant quota is not exceeded"},{"line_number":116,"context_line":""},{"line_number":117,"context_line":"3. for each node request that does not have the tenant attribute or a tenant"},{"line_number":118,"context_line":"   for which no ``tenant-resource-limits`` config exists"}],"source_content_type":"text/x-rst","patch_set":3,"id":"503358c4_8209c29a","line":115,"range":{"start_line":115,"start_character":34,"end_line":115,"end_character":66},"updated":"2021-05-25 16:01:28.000000000","message":"nit: could be omitted","commit_id":"fbf63a5c756bcb6f1357ecaf43a9e41e4bff0209"},{"author":{"_account_id":29739,"name":"Thomas Zink","email":"thomas.tz.zink@bmw.de"},"change_message_id":"ef3e87e7a57105f90d1e48edf3e4a162032f94e1","unresolved":false,"context_lines":[{"line_number":112,"context_line":""},{"line_number":113,"context_line":"   - if quota for current tenant would not be exceeded"},{"line_number":114,"context_line":""},{"line_number":115,"context_line":"     - proceed with normal process if tenant quota is not exceeded"},{"line_number":116,"context_line":""},{"line_number":117,"context_line":"3. for each node request that does not have the tenant attribute or a tenant"},{"line_number":118,"context_line":"   for which no ``tenant-resource-limits`` config exists"}],"source_content_type":"text/x-rst","patch_set":3,"id":"798fb4d7_7fe6ae84","line":115,"range":{"start_line":115,"start_character":34,"end_line":115,"end_character":66},"in_reply_to":"503358c4_8209c29a","updated":"2021-05-25 16:08:24.000000000","message":"Done","commit_id":"fbf63a5c756bcb6f1357ecaf43a9e41e4bff0209"},{"author":{"_account_id":1,"name":"James E. Blair","email":"jim@acmegating.com","username":"corvus"},"change_message_id":"eb0ef67d628c39a282eca589c96f42999621a19c","unresolved":false,"context_lines":[{"line_number":77,"context_line":"1. Add \"tenant\" attribute to zk.NodeRequest (applies to Zuul and Nodepool)"},{"line_number":78,"context_line":"2. Add \"tenant\" attribute to zk.Node (applies to Nodepool)"},{"line_number":79,"context_line":"3. Add \"num_cores\" attribute to zk.Node (applies to Nodepool)"},{"line_number":80,"context_line":"4. Add \"num_ram\" attribute to zk.Node (applies to Nodepool)"},{"line_number":81,"context_line":""},{"line_number":82,"context_line":"Introduce Tenant Quotas in Nodepool"},{"line_number":83,"context_line":"-----------------------------------"}],"source_content_type":"text/x-rst","patch_set":5,"id":"28373742_50440742","line":80,"updated":"2021-05-26 00:04:23.000000000","message":"There\u0027s already a zk.Node.resources dictionary, which the OpenStack driver fills in with a dictionary with keys \u0027cores\u0027, \u0027instances\u0027, \u0027ram\u0027.  I think we shound standardize all drivers to use that.\n\nI would suggest you either make this more vague (\"Add resource information to zk.Node in a standardized way in all drivers\") or update it to say \"Standardize all drivers to supply a resources dictionary to zk.Node with keys: cores, instances, ram (as the OpenStack driver currently does)\").\n\nThat will improve Zuul\u0027s resource reporting as a side effect since it already reads that.","commit_id":"2868257ac2f80babc005ed4242db2ecf01538503"},{"author":{"_account_id":1,"name":"James E. Blair","email":"jim@acmegating.com","username":"corvus"},"change_message_id":"eb0ef67d628c39a282eca589c96f42999621a19c","unresolved":false,"context_lines":[{"line_number":132,"context_line":"account the number of CPU cores and memory allocated for a node. Only the"},{"line_number":133,"context_line":"number of servers are considered there. Therefore, nodes from a AWS or Azure"},{"line_number":134,"context_line":"provider cannot be fully taken into account when calculating a global resource"},{"line_number":135,"context_line":"limit besides of number of servers."},{"line_number":136,"context_line":""},{"line_number":137,"context_line":"In the `QuotaSupport`_ mixin class, we already query ZooKeeper for the used and"},{"line_number":138,"context_line":"planned resources. Ideally, we can extend this method to also return the"}],"source_content_type":"text/x-rst","patch_set":5,"id":"17ad4733_3d5400fb","line":135,"updated":"2021-05-26 00:04:23.000000000","message":"Well, that\u0027s about to stop being true for Azure at least, and should be easy to rectify for AWS.  That\u0027s why we standardizing on using the Node.resources dictionary.\n\nI might rephrase this to just say that since the AWS and Azure drivers currently (as of this writing) have incomplete quota support, the tenant limits will only be as good as the driver support for quota.","commit_id":"2868257ac2f80babc005ed4242db2ecf01538503"},{"author":{"_account_id":1,"name":"James E. Blair","email":"jim@acmegating.com","username":"corvus"},"change_message_id":"eb0ef67d628c39a282eca589c96f42999621a19c","unresolved":false,"context_lines":[{"line_number":141,"context_line":"pool quotas (cf. `SimpleTaskManagerHandler`_). However, calculation of"},{"line_number":142,"context_line":"currently consumed resources by a provider is done only for nodes of the same"},{"line_number":143,"context_line":"provider. This does not easily work for global limits as intended for tenant"},{"line_number":144,"context_line":"quotas. Therefore, this information (``num_cores``, ``num_ram``) will be stored"},{"line_number":145,"context_line":"in a generic way on ``zk.Node`` objects for any provider to evaluate these"},{"line_number":146,"context_line":"quotas upon an incoming node request."},{"line_number":147,"context_line":""}],"source_content_type":"text/x-rst","patch_set":5,"id":"da3aeabd_9417ad17","line":144,"updated":"2021-05-26 00:04:23.000000000","message":"Okay this sentence \"global limits as intended for tenant quotas\" seems to address my comment on line 96 of ps 2.","commit_id":"2868257ac2f80babc005ed4242db2ecf01538503"},{"author":{"_account_id":1,"name":"James E. Blair","email":"jim@acmegating.com","username":"corvus"},"change_message_id":"a302181da398f3aa354fd7f23bd7f8964f709787","unresolved":false,"context_lines":[{"line_number":89,"context_line":""},{"line_number":90,"context_line":"   .. code-block:: yaml"},{"line_number":91,"context_line":""},{"line_number":92,"context_line":"      tenant-resource-limits:"},{"line_number":93,"context_line":"        tenant1:"},{"line_number":94,"context_line":"          max-servers: 10"},{"line_number":95,"context_line":"          max-cores: 200"}],"source_content_type":"text/x-rst","patch_set":6,"id":"a681aa96_c2a386dd","line":92,"updated":"2021-06-07 20:53:14.000000000","message":"Good point; I was assuming it would be like min-ready, which is sort of \"caveat operator\".  The values would be used independently by each launcher.  I don\u0027t think this has the same race condition properties of min-ready that lead us to pick a \"primary\" launcher.  Instead, they should be kept the same on all of the launchers.  Technically they could be different though.  If they are different, I don\u0027t think it would cause a problem, just be confusing why some launchers could apparently exceed the capacity while others never get to launch anything.\n\nI think in the long run we might want to move to a single global state in ZK, but we need to work out the lifecycle for that.  I think the idea that they should be the same everywhere is workable enough (and matches the current design) that it doesn\u0027t need to be a pre-requisite for this.","commit_id":"c63867774273cd0e45df653fbbaa83ed9b94b064"},{"author":{"_account_id":4146,"name":"Clark Boylan","email":"cboylan@sapwetik.org","username":"cboylan"},"change_message_id":"d80130c0aba01576019e125f53b567e389d5ae42","unresolved":true,"context_lines":[{"line_number":89,"context_line":""},{"line_number":90,"context_line":"   .. code-block:: yaml"},{"line_number":91,"context_line":""},{"line_number":92,"context_line":"      tenant-resource-limits:"},{"line_number":93,"context_line":"        tenant1:"},{"line_number":94,"context_line":"          max-servers: 10"},{"line_number":95,"context_line":"          max-cores: 200"}],"source_content_type":"text/x-rst","patch_set":6,"id":"92dc92c0_c87b21a3","line":92,"updated":"2021-06-07 20:39:04.000000000","message":"It might be worth writing down how this global config should be handled in a distributed system. Currently OpenDev runs 4 nodepool launchers and 3 builders. Each of the launchers runs a different config. Would we be required to pick a primary launcher and define the limits there? Or would we need to set the same limits across the board? Maybe nodepool would take the most conservative config?\n\nI think we have options here. I think it would be good to think about what we want to have though.","commit_id":"c63867774273cd0e45df653fbbaa83ed9b94b064"},{"author":{"_account_id":1,"name":"James E. Blair","email":"jim@acmegating.com","username":"corvus"},"change_message_id":"a302181da398f3aa354fd7f23bd7f8964f709787","unresolved":false,"context_lines":[{"line_number":109,"context_line":"     - do not pause the pool (as opposed to exceeded pool quota)"},{"line_number":110,"context_line":"     - leave the node request unfulfilled (REQUESTED state)"},{"line_number":111,"context_line":"     - return from handler for another iteration to fulfill request when tenant"},{"line_number":112,"context_line":"       quota allows eventually"},{"line_number":113,"context_line":""},{"line_number":114,"context_line":"   - if quota for current tenant would not be exceeded"},{"line_number":115,"context_line":""}],"source_content_type":"text/x-rst","patch_set":6,"id":"f9a5e57e_601799b1","line":112,"updated":"2021-06-07 20:53:14.000000000","message":"It will effectively be a high priority since it\u0027s the head of the queue at this point; the next launcher to look at the queue (which could be this one), will see it first (or at least, after all the other similar requests it skipped).","commit_id":"c63867774273cd0e45df653fbbaa83ed9b94b064"},{"author":{"_account_id":4146,"name":"Clark Boylan","email":"cboylan@sapwetik.org","username":"cboylan"},"change_message_id":"d80130c0aba01576019e125f53b567e389d5ae42","unresolved":true,"context_lines":[{"line_number":109,"context_line":"     - do not pause the pool (as opposed to exceeded pool quota)"},{"line_number":110,"context_line":"     - leave the node request unfulfilled (REQUESTED state)"},{"line_number":111,"context_line":"     - return from handler for another iteration to fulfill request when tenant"},{"line_number":112,"context_line":"       quota allows eventually"},{"line_number":113,"context_line":""},{"line_number":114,"context_line":"   - if quota for current tenant would not be exceeded"},{"line_number":115,"context_line":""}],"source_content_type":"text/x-rst","patch_set":6,"id":"38753612_e48a94d1","line":112,"updated":"2021-06-07 20:39:04.000000000","message":"We might want to prioritize requests which were previously passed over due to tenant limits to avoid starving tenants when near our global resource limits.","commit_id":"c63867774273cd0e45df653fbbaa83ed9b94b064"},{"author":{"_account_id":4146,"name":"Clark Boylan","email":"cboylan@sapwetik.org","username":"cboylan"},"change_message_id":"d80130c0aba01576019e125f53b567e389d5ae42","unresolved":true,"context_lines":[{"line_number":142,"context_line":"set the corresponding ``zk.Node.resources`` attributes. As for now, only the"},{"line_number":143,"context_line":"OpenStack driver exports resource information about its nodes to ZooKeeper, but"},{"line_number":144,"context_line":"as other drivers get enhanced with this feature, they will inherently be"},{"line_number":145,"context_line":"considered for such global limits as well."},{"line_number":146,"context_line":""},{"line_number":147,"context_line":"In the `QuotaSupport`_ mixin class, we already query ZooKeeper for the used and"},{"line_number":148,"context_line":"planned resources. Ideally, we can extend this method to also return the"}],"source_content_type":"text/x-rst","patch_set":6,"id":"b086d4de_22d42ae3","line":145,"updated":"2021-06-07 20:39:04.000000000","message":"Could we do a simple max-servers limit for kubernetes and other drivers?","commit_id":"c63867774273cd0e45df653fbbaa83ed9b94b064"},{"author":{"_account_id":1,"name":"James E. Blair","email":"jim@acmegating.com","username":"corvus"},"change_message_id":"a302181da398f3aa354fd7f23bd7f8964f709787","unresolved":false,"context_lines":[{"line_number":142,"context_line":"set the corresponding ``zk.Node.resources`` attributes. As for now, only the"},{"line_number":143,"context_line":"OpenStack driver exports resource information about its nodes to ZooKeeper, but"},{"line_number":144,"context_line":"as other drivers get enhanced with this feature, they will inherently be"},{"line_number":145,"context_line":"considered for such global limits as well."},{"line_number":146,"context_line":""},{"line_number":147,"context_line":"In the `QuotaSupport`_ mixin class, we already query ZooKeeper for the used and"},{"line_number":148,"context_line":"planned resources. Ideally, we can extend this method to also return the"}],"source_content_type":"text/x-rst","patch_set":6,"id":"5f48028c_c5e39a30","line":145,"updated":"2021-06-07 20:53:14.000000000","message":"We could -- I feel like there are equally legitimate use cases for counting a k8s pod as a \"server\" and not doing so.  Either way, we pick a side.","commit_id":"c63867774273cd0e45df653fbbaa83ed9b94b064"}]}
