)]}' {"/PATCHSET_LEVEL":[{"author":{"_account_id":17669,"name":"Doug Szumski","email":"doug@stackhpc.com","username":"DougSzumski"},"change_message_id":"ad03c5964a3c0e8698bd5e417be92f358a1b2381","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"fdfb61ce_f8d6c777","updated":"2024-12-02 12:11:29.000000000","message":"One of the motivations for adding this is to monitor the health status of the Kolla containers. However if I simulate an unhealthy container, it doesn\u0027t appear to show up in the metrics. Any idea what is going on there? \n\n```\n(kayobe) ubuntu@doug-aio:/$ sudo docker ps | grep unheal\nc8099ab07c0a quay.io/openstack.kolla/nova-compute:master-ubuntu-jammy \"dumb-init --single-…\" 4 months ago Up 11 minutes (unhealthy) nova_compute\n(kayobe) ubuntu@doug-aio:/$ curl 127.0.0.1:9323/metrics |grep heal\n % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\n# HELP engine_daemon_health_check_start_duration_seconds The number of seconds it takes to prepare to run health checks\n# TYPE engine_daemon_health_check_start_duration_seconds histogram\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"0.005\"} 0\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"0.01\"} 0\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"0.025\"} 0\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"0.05\"} 454\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"0.1\"} 1967\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"0.25\"} 2078\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"0.5\"} 2098\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"1\"} 2098\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"2.5\"} 2098\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"5\"} 2098\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"10\"} 2098\nengine_daemon_health_check_start_duration_seconds_bucket{le\u003d\"+Inf\"} 2098\nengine_daemon_health_check_start_duration_seconds_sum 148.4189253720002\nengine_daemon_health_check_start_duration_seconds_count 2098\n# HELP engine_daemon_health_checks_failed_total The total number of failed health checks\n# TYPE engine_daemon_health_checks_failed_total counter\nengine_daemon_health_checks_failed_total 0\n# HELP engine_daemon_health_checks_total The total number of health checks\n# TYPE engine_daemon_health_checks_total counter\nengine_daemon_health_checks_total 2098\n```","commit_id":"734ebed38efba5ca21796a7e3d6872ad599133ba"},{"author":{"_account_id":17669,"name":"Doug Szumski","email":"doug@stackhpc.com","username":"DougSzumski"},"change_message_id":"f2b0ce5f52b076b8801c23e3191f452b02f5200f","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"b647cccf_6de5e627","updated":"2024-12-02 11:49:07.000000000","message":"Thanks Alex, I think it looks good.\n\nPlease could you enable it in CI, specifically for the `prometheus-opensearch` scenario in tests/templates/globals-default.j2.\n\nIt would be even better if we could extend the test to get some health check metrics. That could be for a separate change as it needs doing in general.","commit_id":"734ebed38efba5ca21796a7e3d6872ad599133ba"},{"author":{"_account_id":14200,"name":"Maksim Malchuk","email":"maksim.malchuk@gmail.com","username":"mmalchuk"},"change_message_id":"0b436af64b9539448a15cf74a9f32c4bcd9ed9e6","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"3e753837_b1ae8f82","updated":"2024-11-30 17:40:42.000000000","message":"recheck depends-on CI passed","commit_id":"734ebed38efba5ca21796a7e3d6872ad599133ba"},{"author":{"_account_id":15197,"name":"Pierre Riteau","email":"pierre@stackhpc.com","username":"priteau","status":"StackHPC"},"change_message_id":"de86987ee469961260e81bc1217baedd1ae54e21","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"6f28ca36_451dde6a","in_reply_to":"796cc877_3a58f21e","updated":"2026-03-17 20:49:31.000000000","message":"After some testing, I discovered that the `engine_daemon_health_checks_failed_total` metric is reporting the number of health checks that have **failed to run** (for example due to resource constraints), not that have run but produced a non-zero exit code.\n\nStill, I think there are some other interesting metrics in this exporter which we could use.","commit_id":"734ebed38efba5ca21796a7e3d6872ad599133ba"},{"author":{"_account_id":35264,"name":"Alex Welsh","email":"alex@stackhpc.com","username":"alex-welsh"},"change_message_id":"913dd0d088c152a2f1d69fab4aab5d800718bc42","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"796cc877_3a58f21e","in_reply_to":"fdfb61ce_f8d6c777","updated":"2024-12-02 16:08:32.000000000","message":"I can replicate the issue, so I\u0027m not sure if it\u0027s a bug in the exporter or I\u0027ve misunderstood what it does. Might need to go back to the drawing board with this one","commit_id":"734ebed38efba5ca21796a7e3d6872ad599133ba"},{"author":{"_account_id":17669,"name":"Doug Szumski","email":"doug@stackhpc.com","username":"DougSzumski"},"change_message_id":"d0060616c96a1b4a89a402088c15c6ff984e0053","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"44719219_1b452671","updated":"2026-03-20 10:28:24.000000000","message":"Thanks for digging into it Pierre. \n\nThe linter is unhappy, but looks ok otherwise:\n\n```\nansible/roles/prometheus/templates/prometheus.yml.j2\n├── ansible/roles/prometheus/templates/prometheus.yml.j2:173 Bad Indentation, \n│ expected 5, got 1 (jinja-statements-indentation)\n├── ansible/roles/prometheus/templates/prometheus.yml.j2:176 Bad Indentation, \n│ expected 9, got 1 (jinja-statements-indentation)\n├── ansible/roles/prometheus/templates/prometheus.yml.j2:179 Bad Indentation, \n│ expected 9, got 1 (jinja-statements-indentation)\n└── ansible/roles/prometheus/templates/prometheus.yml.j2:180 Bad Indentation, \n expected 5, got 1 (jinja-statements-indentation)\n```","commit_id":"a328348ec39a2db6a661ac83b7ae1d368c17c805"},{"author":{"_account_id":17669,"name":"Doug Szumski","email":"doug@stackhpc.com","username":"DougSzumski"},"change_message_id":"9d8530edd32e397c57c64d9cdce9289afc8edf45","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":5,"id":"7ba030fb_3f922e78","updated":"2026-04-16 15:11:23.000000000","message":"Rebased to fix conflicts, waiting for CI","commit_id":"7512f7b93d6ef53395528eb4bd05b3592e32de72"}],"ansible/group_vars/all/prometheus.yml":[{"author":{"_account_id":17669,"name":"Doug Szumski","email":"doug@stackhpc.com","username":"DougSzumski"},"change_message_id":"d0060616c96a1b4a89a402088c15c6ff984e0053","unresolved":true,"context_lines":[{"line_number":23,"context_line":"enable_prometheus_etcd_integration: \"{{ enable_prometheus | bool and enable_etcd | bool }}\""},{"line_number":24,"context_line":"enable_prometheus_proxysql_exporter: \"{{ enable_prometheus | bool and enable_proxysql | bool }}\""},{"line_number":25,"context_line":"# NOTE(Alex-Welsh): the Docker Prometheus endpoint is currently in development"},{"line_number":26,"context_line":"# and subject to change"},{"line_number":27,"context_line":"enable_prometheus_docker_metrics: \"no\""},{"line_number":28,"context_line":""},{"line_number":29,"context_line":"prometheus_alertmanager_user: \"admin\""}],"source_content_type":"text/x-yaml","patch_set":3,"id":"c372e494_31190640","line":26,"updated":"2026-03-20 10:28:24.000000000","message":"It\u0027s not encouraging that this is still the case!","commit_id":"a328348ec39a2db6a661ac83b7ae1d368c17c805"}]}