)]}'
{"/PATCHSET_LEVEL":[{"author":{"_account_id":14826,"name":"Mark Goddard","email":"markgoddard86@gmail.com","username":"mgoddard"},"change_message_id":"9d6591dd1ec62f7bffb1e072e316a4ef46219e5d","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"8e6dbec1_88c4939d","updated":"2023-02-22 11:43:08.000000000","message":"I would suggest adding a new variable, neutron_l3_agent_graceful_timeout, with an increased default (5 minutes?), then using it for the l3 agent restart handler kolla_docker module and the oslo config option.","commit_id":"8926abd0d4cc7c18c24a27999332e08fbb8c68df"},{"author":{"_account_id":30911,"name":"Jan Horstmann","email":"horstmann@osism.tech","username":"jhorstmann"},"change_message_id":"1e29fcaab1b5f7225a184b14df63d7b0da331bb9","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"0d4333c4_00e9e631","updated":"2023-02-22 12:30:58.000000000","message":"Seems that we have been working in parallel on the same thing. I have pushed my work now just for comparison: https://review.opendev.org/c/openstack/kolla-ansible/+/874769\nMaybe there is some useful stuff in there :)\n\nThere is quite a lot of information in the commit message which might be useful as well","commit_id":"8926abd0d4cc7c18c24a27999332e08fbb8c68df"},{"author":{"_account_id":35264,"name":"Alex Welsh","email":"alex@stackhpc.com","username":"alex-welsh"},"change_message_id":"c914e724c02c37e2a5269d1b4ca21f3dbc43c515","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"66c570f9_69b51064","in_reply_to":"0d4333c4_00e9e631","updated":"2023-02-22 13:56:13.000000000","message":"Thanks for sharing your work Jan. 
I think the largest distinction between the two approaches is that I have not chosen to remove the wrapper and cleanup scripts.\n\nMy reasoning was that in the event the entire cleanup process is not completed (container crash, container manually stopped, timeout too short etc) the cleanup may still prove useful.\n\nI\u0027d be interested to hear your thoughts though, and any suggestions for other aspects you want to port over are more than welcome.","commit_id":"8926abd0d4cc7c18c24a27999332e08fbb8c68df"},{"author":{"_account_id":30911,"name":"Jan Horstmann","email":"horstmann@osism.tech","username":"jhorstmann"},"change_message_id":"6896771d531e0c3b66e77e4b492ae1e920941676","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"8229876b_239cdd1d","in_reply_to":"66c570f9_69b51064","updated":"2023-02-22 16:33:56.000000000","message":"My main motivation was to remove the cleanup script, because it introduces a delay in router reconciliation after the container was started. I think this complicates restarting the agents sequentially, which should be done to avoid disrupting any connections.\nI agree that there should be some guard against unclean shutdowns. At some point I tried introducing a stop-start procedure with the cleanup script running in between, depending on the exit code of the container. 
I could not get that to work properly and without any loss of connectivity however, so I gave up on that for now.\nMaybe keeping the wrapper is actually the best option\n\nDuring my tests I stumbled upon some caveats, which might be relevant\n* With DVR, l3 agents on compute nodes in `neutron_compute_dvr_mode` should not have `cleanup_on_shutdown \u003d True`, because they are not HA and cleanup will thus lead to loss of connectivity for instances with floating IPs on the agent\u0027s node\n* With DVR, l3 agents on compute nodes may be restarted in parallel to save time for the same reason as above\n* I could not get the restart handler to run sequentially without moving the restart into its own task using an `include_tasks` statement in the handler. Have you verified that this actually works?\n* When some l3 agents were stopped, there might still be loss of connectivity while one agent is gracefully shutting down and the other one is still reconciling after being started. For this scenario `neutron_l3_agent_failover_delay` is still necessary\n\nI am really interested in having kolla-ansible deployments without any connection loss, so if I can be of any help (e.g. testing) just let me know.","commit_id":"8926abd0d4cc7c18c24a27999332e08fbb8c68df"},{"author":{"_account_id":30911,"name":"Jan Horstmann","email":"horstmann@osism.tech","username":"jhorstmann"},"change_message_id":"f8b5fb7d1cbd69798b5a39980bbea1b240950f7b","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"9fbd943f_a996a2f5","in_reply_to":"8229876b_239cdd1d","updated":"2023-03-17 15:13:56.000000000","message":"I have uploaded another version of my patch which orchestrates l3 agent restarts by sequentially disabling, restarting and re-enabling them. 
This process is actually faster than using `cleanup_on_shutdown`.\nThe patch also implements a criterion to define when it is safe to restart an l3 agent.\n\n\nhttps://review.opendev.org/c/openstack/kolla-ansible/+/874769","commit_id":"8926abd0d4cc7c18c24a27999332e08fbb8c68df"}],"releasenotes/notes/l3-agent-graceful-shutdown-83926053b463aceb.yaml":[{"author":{"_account_id":14826,"name":"Mark Goddard","email":"markgoddard86@gmail.com","username":"mgoddard"},"change_message_id":"8160a4e237bb0e97b82521bfd7d4f49a731de07b","unresolved":true,"context_lines":[{"line_number":1,"context_line":"---"},{"line_number":2,"context_line":"fixes:"},{"line_number":3,"context_line":"  - |"},{"line_number":4,"context_line":"    Fixes the l3 agent graceful stop procedure, allowing HA routers to be"},{"line_number":5,"context_line":"    properly removed using the cleanup_on_shutdown neutron config option."},{"line_number":6,"context_line":"    Previously this option had no effect. It is worth noting that the shutdown"},{"line_number":7,"context_line":"    procedure can take a long time. 
The timeout can be up to 10 seconds per"}],"source_content_type":"text/x-yaml","patch_set":1,"id":"d749c607_e705160d","line":4,"range":{"start_line":4,"start_character":14,"end_line":4,"end_character":16},"updated":"2023-02-22 11:44:30.000000000","message":"Neutron L3","commit_id":"8926abd0d4cc7c18c24a27999332e08fbb8c68df"},{"author":{"_account_id":35264,"name":"Alex Welsh","email":"alex@stackhpc.com","username":"alex-welsh"},"change_message_id":"c914e724c02c37e2a5269d1b4ca21f3dbc43c515","unresolved":false,"context_lines":[{"line_number":1,"context_line":"---"},{"line_number":2,"context_line":"fixes:"},{"line_number":3,"context_line":"  - |"},{"line_number":4,"context_line":"    Fixes the l3 agent graceful stop procedure, allowing HA routers to be"},{"line_number":5,"context_line":"    properly removed using the cleanup_on_shutdown neutron config option."},{"line_number":6,"context_line":"    Previously this option had no effect. It is worth noting that the shutdown"},{"line_number":7,"context_line":"    procedure can take a long time. The timeout can be up to 10 seconds per"}],"source_content_type":"text/x-yaml","patch_set":1,"id":"e1b76dde_19ea4937","line":4,"range":{"start_line":4,"start_character":14,"end_line":4,"end_character":16},"in_reply_to":"d749c607_e705160d","updated":"2023-02-22 13:56:13.000000000","message":"Done","commit_id":"8926abd0d4cc7c18c24a27999332e08fbb8c68df"},{"author":{"_account_id":14826,"name":"Mark Goddard","email":"markgoddard86@gmail.com","username":"mgoddard"},"change_message_id":"8160a4e237bb0e97b82521bfd7d4f49a731de07b","unresolved":true,"context_lines":[{"line_number":2,"context_line":"fixes:"},{"line_number":3,"context_line":"  - |"},{"line_number":4,"context_line":"    Fixes the l3 agent graceful stop procedure, allowing HA routers to be"},{"line_number":5,"context_line":"    properly removed using the cleanup_on_shutdown neutron config option."},{"line_number":6,"context_line":"    Previously this option had no effect. 
It is worth noting that the shutdown"},{"line_number":7,"context_line":"    procedure can take a long time. The timeout can be up to 10 seconds per"},{"line_number":8,"context_line":"    HA router."}],"source_content_type":"text/x-yaml","patch_set":1,"id":"b744737a_f1224ff4","line":5,"range":{"start_line":5,"start_character":31,"end_line":5,"end_character":50},"updated":"2023-02-22 11:44:30.000000000","message":"``cleanup_on_shutdown``","commit_id":"8926abd0d4cc7c18c24a27999332e08fbb8c68df"},{"author":{"_account_id":35264,"name":"Alex Welsh","email":"alex@stackhpc.com","username":"alex-welsh"},"change_message_id":"c914e724c02c37e2a5269d1b4ca21f3dbc43c515","unresolved":false,"context_lines":[{"line_number":2,"context_line":"fixes:"},{"line_number":3,"context_line":"  - |"},{"line_number":4,"context_line":"    Fixes the l3 agent graceful stop procedure, allowing HA routers to be"},{"line_number":5,"context_line":"    properly removed using the cleanup_on_shutdown neutron config option."},{"line_number":6,"context_line":"    Previously this option had no effect. It is worth noting that the shutdown"},{"line_number":7,"context_line":"    procedure can take a long time. The timeout can be up to 10 seconds per"},{"line_number":8,"context_line":"    HA router."}],"source_content_type":"text/x-yaml","patch_set":1,"id":"2f256d49_bbfff609","line":5,"range":{"start_line":5,"start_character":31,"end_line":5,"end_character":50},"in_reply_to":"b744737a_f1224ff4","updated":"2023-02-22 13:56:13.000000000","message":"Done","commit_id":"8926abd0d4cc7c18c24a27999332e08fbb8c68df"}]}
