)]}'
{"/PATCHSET_LEVEL":[{"author":{"_account_id":36080,"name":"Erkin Mussurmankulov","display_name":"Eric","email":"mangust404@gmail.com","username":"mongoose404","status":"PS Cloud services employee"},"change_message_id":"3948e070addd079cf38b75b5c147c88625152723","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":11,"id":"56d87617_4888134c","updated":"2026-05-30 20:39:13.000000000","message":"db-backup-postgresql:18 reduced from 371MB to 270MB\ndb-backup-mysql:8.4 reduced from 525MB to 364MB\ndb-backup-mariadb:11.4 reduced from 356MB to 252MB\n\nI think that\u0027s a limit.\n\nWe can achieve further optimization if we\u0027ll get rid of backup images completely and implement streaming backups using database images instead.\n\nI wanted to make this patch small and clever, but it turned out that way 😰","commit_id":"6f25aa679cd5ea3bb9bdc77661cd9f445fe0294a"},{"author":{"_account_id":36080,"name":"Erkin Mussurmankulov","display_name":"Eric","email":"mangust404@gmail.com","username":"mongoose404","status":"PS Cloud services employee"},"change_message_id":"405aee7168e31f16c80f3c5fcf8631d3520aa710","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":96,"id":"8ba88a22_dc343b4e","updated":"2026-06-04 13:31:35.000000000","message":"Hello, Wu!\n\nDuring the investigation, I found the following:\n\n* The Zuul job runs on nodes with `32 GB RAM` and `8 CPU` cores. After all DevStack services are started, approximately `23 GB RAM` and `8 CPU` cores remain available for allocation.\n* Our default instance flavor is `d3`, which provides `2 GB RAM` and `2 vCPUs`. As a result, a significant amount of RAM remains unused.\n* The DevStack and Trove installation process takes approximately 25-30 minutes. This is the baseline duration to which the execution time of individual test scenarios is added.\n* Zuul job completion tasks (log uploads, etc.) 
take approximately 3 minutes and cannot be controlled from the Trove repository.\n* For test scenarios, the most time-consuming phase is instance creation (as expected), and the impact is multiplied by the number of instances. For example, an additional 2 minutes during the instance creation process for backup tests increases the scenario execution time by 2 * 3 \u003d 6 minutes, since three instances are created during the test.\n* The most time-consuming operation during instance creation is Docker image pulling and extraction.\n* The `trove-image-loader` service was running in parallel with the `guest-agent` service, which presumably slowed down Docker image pulling performed concurrently by the guest agent. I did not take measurements to quantify the impact, but it seemed unlikely that this behavior was achieving its intended purpose.\n* After modifying `trove-image-loader` to run before the guest agent, I was able to measure Docker image pull times accurately.\n* I also implemented an improvised caching mechanism: Docker images are saved to tarballs, embedded into the guest qcow2 image, and then loaded on guest startup by the `trove-image-loader` service script using the `docker load` command.\n* Below are the timing results:\n\nFor an instance without `TROVE_ENABLE_LOCAL_REGISTRY` (pulling from quay.io):\n```\ndocker load from cache:\n5m 57s db\n3m 54s db-backup\n\ndocker pulling from quay via internet:\n6m 30s db\n4m 21s db-backup\n```\n\nFor an instance with `TROVE_ENABLE_LOCAL_REGISTRY` (pulling from local registry):\n```\ndocker load from cache:\n5m 46s db\n2m 53s db-backup\n\ndocker pulling from local registry:\n6m 35s db\n3m 01s db-backup\n```\n\n* Most CPU and I/O resources are consumed by containerd while unpacking image layers. A single image can produce more than 20k files during extraction. 
Therefore, I believe the bottleneck is the overall VM performance, including CPU, memory, and storage throughput.\n\n\nResults and conclusions:\n\n* DevStack already provides an asynchronous execution library. We can build the registry and guest images asynchronously, reducing DevStack setup time from approximately 26 minutes to approximately 13 minutes (a significant improvement on its own).\n\nHere is an example of the Async library output from a Zuul job before optimization:\n\n```\n   Async summary\n  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n   Time spent in the background minus waits: 275 sec\n   Elapsed time: 1598 sec\n   Time if we did everything serially: 1873 sec\n   Speedup:  1.17209\n```\n\nAnd after optimization:\n\n```\n   Async summary\n  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\n   Time spent in the background minus waits: 543 sec\n   Elapsed time: 802 sec\n   Time if we did everything serially: 1345 sec\n   Speedup:  1.67706\n```\n\n* Tarball imports: `docker load` from a tarball is slightly faster than `docker pull` from a remote registry. We may keep this solution, as it provides a modest but measurable improvement.\n* Build and load only the required images: `trove-image-loader` currently fetches both `database` and `db-backup` images regardless of whether the `db-backup` image is required by the test. At the moment, this behavior is implicitly controlled by the `TROVE_ENABLE_LOCAL_REGISTRY` flag, which is ambiguous. We could improve this by introducing an explicit flag, `TROVE_DB_BACKUP_IMAGE_REQUIRED`.\n* Resource management: while CPU resources are limited, we have a substantial amount of available RAM. One idea is to use a RAM disk for `/var/lib/docker` inside database instances. 
This should be relatively straightforward to implement.\n* RAM caching configuration: to make the solution more flexible, we could introduce the following parameters in `devstack_localrc`:\n  * `TROVE_VCPUS` - control the flavor vCPU count.\n  * `TROVE_RAM_MB` - control the flavor RAM size.\n  * `TROVE_USE_DOCKER_CACHE` - enable or disable Docker caching.\n  * `TROVE_DOCKER_CACHE_SIZE` - control the RAM disk size for `/var/lib/docker`.\n* Docker image optimization: images should remain as small as possible so that they fit within the RAM disk. This applies both to the current images and to any future images.\n\nThe strategy for optimizing Docker images is:\n1. Reduce the number of layers (ideally to a single layer of each type; the `RUN` layer is the most important).\n2. Use smaller base images whenever possible (for example, alpine-based images).\n3. Remove unused packages, large binaries, manuals, documentation, and other unnecessary files.\n4. Always perform cleanup at the end of the RUN layer.\n\nMy image size optimization results are:\n* `db-backup-postgresql:18` reduced from `371 MB` to `270 MB` (unpacked size).\n* `db-backup-mysql:8.4` reduced from `525 MB` to `364 MB` (unpacked size).\n* `db-backup-mariadb:11.4` reduced from `356 MB` to `252 MB` (unpacked size).\n* `mysql:8.4` reduced from `227.2 MiB` to `92.9 MiB` (packed size).\n* `postgres:18` reduced from `154.8 MiB` to `23.3 MiB` (packed size).\nI was unable to significantly reduce the size of the MariaDB images (`99 MiB` packed size); there is very little left to remove, and the image is already well optimized.\n\n\nThis work required a considerable amount of time and patience, but the results are significant, especially considering that both the number and complexity of test scenarios are expected to grow over time.\n\nThis page is particularly useful:\nhttps://zuul.opendev.org/t/openstack/status\n\nThere is no need to wait for the entire job set to finish; log streaming is available in real 
time.\n\nP.S. Until these changes are merged into master, the `publish-trove-images-quay` periodic job will continue replacing the MySQL and PostgreSQL images with the original versions. As a result, they may no longer fit within a `1.5GB` RAM disk.","commit_id":"fc8775388c805f76e88d155a0f8242f44c105594"},{"author":{"_account_id":36080,"name":"Erkin Mussurmankulov","display_name":"Eric","email":"mangust404@gmail.com","username":"mongoose404","status":"PS Cloud services employee"},"change_message_id":"436952e52e9201253152a07b4892fbdd2441032c","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":97,"id":"10ad5f65_c846f94c","updated":"2026-06-04 15:40:32.000000000","message":"Hello, Hirotaka!\nCan you also take a look at this patch, please.\n\nI did a lot of testing. Everything seems to work fine in both zuul and my local DevStack, but maybe there are still some errors in the code or in the design itself.","commit_id":"f96ba259257149197456324c3186a12c15c18788"}]}
