)]}' {"/PATCHSET_LEVEL":[{"author":{"_account_id":9531,"name":"liuyulong","display_name":"LIU Yulong","email":"i@liuyulong.me","username":"LIU-Yulong"},"change_message_id":"c21ba6340099449f027efef157f66a143e8f5383","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"fff57929_47152c3a","updated":"2025-01-07 09:25:31.000000000","message":"It\u0027s better to run real loads of routers to test this change. In real production environment, there may be many routers on one single l3-agent. And for l3-agent restart, it will do a full-sync and re-processing of all routers. So one by one processing will result a huge precessing time increase. If such full-sync and re-processing is not donw, any new/update/delete events may not get proceed either.","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"},{"author":{"_account_id":16688,"name":"Rodolfo Alonso","email":"ralonsoh@redhat.com","username":"rodolfo-alonso-hernandez"},"change_message_id":"826a1900fce42e1217f78ad16e7d09f6bb1e5269","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"986fdc26_7a22fceb","updated":"2025-01-07 11:19:17.000000000","message":"recheck neutron-functional","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"},{"author":{"_account_id":11583,"name":"Arnaud Morin","email":"arnaud.morin@gmail.com","username":"arnaudmorin"},"change_message_id":"250c46a06b48ae9e68241daeb12d5a0cf24bcbd4","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"c1015ac8_a0bf9dbf","in_reply_to":"2a85b5fd_b60c4584","updated":"2025-01-14 09:30:44.000000000","message":"Ok, thanks for explanation.\nI dont want to be a blocker here, I just want to make sure that this wont break large-scale scenario.\nI would love testing this is my env before going further, but I can\u0027t be sure that I will have time in the next coming days..","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"},{"author":{"_account_id":16688,"name":"Rodolfo Alonso","email":"ralonsoh@redhat.com","username":"rodolfo-alonso-hernandez"},"change_message_id":"c001a8266547d64b9614f3088a3723d68b089018","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"2a85b5fd_b60c4584","in_reply_to":"4071fb40_996f8716","updated":"2025-01-10 07:11:07.000000000","message":"First of all, the tests you are providing are unit tests. This is not testing at all in a real environment, that is the fair request made by Liu.\n\nYou tests are mocking the ``_process_router_update``, using a hand made method. This method has a time.sleep() in the middle. In an eventlet environment (that also happens in using kernel threads), this is trigger to switch to other thread. Thus when using multiple threads to execute the ``_process_update`` method, you are actually executing very little code and this code is quickly switching to another thread. The time.sleep() wait is executed in parallel.\n\nThis is not happening when processing real loads: the ``_process_update`` method will call the required method (router updated, router deleted, etc) and this thread will never release the GIL until finished.","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"},{"author":{"_account_id":11583,"name":"Arnaud Morin","email":"arnaud.morin@gmail.com","username":"arnaudmorin"},"change_message_id":"105818e9b204d202b5c6528d7c57cf67501c5a43","unresolved":true,"context_lines":[],"source_content_type":"","patch_set":3,"id":"4071fb40_996f8716","in_reply_to":"8a5751cb_f18c75a6","updated":"2025-01-08 22:32:10.000000000","message":"I did a small reproducer test here [1] and here [2].\n\n1 is above your patch with only one worker\n2 is above master (without your patch)\n\nThe test is pretty simple as it simulate router update processed through either one thread (1) or multiple threads (2).\nWhen using one thread, we see that the processing of updates is done sequentially, and thus taking much more time.\n\nI believe we must keep the fact that l3 agent process router updates in parallel or we will break some large-scale deployments.\n\n\n[1] https://review.opendev.org/c/openstack/neutron/+/938696\n[2] https://review.opendev.org/c/openstack/neutron/+/938694","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"},{"author":{"_account_id":16688,"name":"Rodolfo Alonso","email":"ralonsoh@redhat.com","username":"rodolfo-alonso-hernandez"},"change_message_id":"c39b2154a9bed27b32bbe1d150681ab41488fe0a","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"8a5751cb_f18c75a6","in_reply_to":"9d5d61d6_4c3ad6a4","updated":"2025-01-07 11:22:16.000000000","message":"So far I\u0027ve tested with 50 routers connected to a gateway network and a private network. I\u0027ve measured the time since the ``periodic_sync_routers_task`` is called, with the message \"Starting fullsync periodic_sync_routers_task\" until the last router is processed, with the message \"Finished a router update for xxxxx\". In my system, both implementations (1 thread, multiple threads) took the same time around 1:30 minutes to process all the 50 routers.\n\nAgain, there is no performance improvement in using multiple threads to process the events.","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"},{"author":{"_account_id":16688,"name":"Rodolfo Alonso","email":"ralonsoh@redhat.com","username":"rodolfo-alonso-hernandez"},"change_message_id":"c526cfbb0978e370052c0a4121da55b4889892a4","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":3,"id":"9d5d61d6_4c3ad6a4","in_reply_to":"fff57929_47152c3a","updated":"2025-01-07 10:31:11.000000000","message":"It is just a matter of reading the logs of the existing CI jobs. When a processing event is being executed, no other event is executed in parallel. Right after a ``Finished a router xxxxx`` log, there is a ``Starting processing xxxxx`` message for the next event processing. There is no improvement in creating several threads for the event processing as long as the threads are never executed in parallel.","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"}],"neutron/agent/l3/agent.py":[{"author":{"_account_id":11583,"name":"Arnaud Morin","email":"arnaud.morin@gmail.com","username":"arnaudmorin"},"change_message_id":"15db711cb522fc8525b6ec891e50b7d8a289e0df","unresolved":true,"context_lines":[{"line_number":468,"context_line":" len(self.router_info)])])"},{"line_number":469,"context_line":" if pool_size \u003d\u003d self._pool_size:"},{"line_number":470,"context_line":" return"},{"line_number":471,"context_line":" LOG.info(\"Resizing router processing queue green pool size to: %d\","},{"line_number":472,"context_line":" pool_size)"},{"line_number":473,"context_line":" self._pool.resize(pool_size)"},{"line_number":474,"context_line":" self._pool_size \u003d pool_size"}],"source_content_type":"text/x-python","patch_set":2,"id":"2b235cf0_d92b419a","side":"PARENT","line":471,"updated":"2025-01-07 08:14:11.000000000","message":"I know that I have this log line in my logs from time to time.\n\nI am totally fine with the final change (going through threading instead of eventlet), but I am wondering about the side effect if we switch from multiple greenthreads to on thread.\n\nAre we sure this is free, even at large scale?","commit_id":"0c29e730db2629c084de0c114a0d1e8e6939ac25"},{"author":{"_account_id":16688,"name":"Rodolfo Alonso","email":"ralonsoh@redhat.com","username":"rodolfo-alonso-hernandez"},"change_message_id":"e163f6bcb4f7db2b1ed19125f8f639506adff023","unresolved":false,"context_lines":[{"line_number":468,"context_line":" len(self.router_info)])])"},{"line_number":469,"context_line":" if pool_size \u003d\u003d self._pool_size:"},{"line_number":470,"context_line":" return"},{"line_number":471,"context_line":" LOG.info(\"Resizing router processing queue green pool size to: %d\","},{"line_number":472,"context_line":" pool_size)"},{"line_number":473,"context_line":" self._pool.resize(pool_size)"},{"line_number":474,"context_line":" self._pool_size \u003d pool_size"}],"source_content_type":"text/x-python","patch_set":2,"id":"98e9abf8_a7e09928","side":"PARENT","line":471,"in_reply_to":"2b235cf0_d92b419a","updated":"2025-01-07 08:53:25.000000000","message":"This change has the same rationale as [1]. The python code cannot be executed in parallel, there is no performance improvement on creating multiple threads (green, kernel) that will be executed serially.\n\nYou can compare the ``neutron-ovs-rally-task`` job in this patch with other patches, in particular the router related tasks. The times are very similar, there is not performance impact on this change.\n\n[1]https://review.opendev.org/c/openstack/neutron/+/923626","commit_id":"0c29e730db2629c084de0c114a0d1e8e6939ac25"},{"author":{"_account_id":9531,"name":"liuyulong","display_name":"LIU Yulong","email":"i@liuyulong.me","username":"LIU-Yulong"},"change_message_id":"c21ba6340099449f027efef157f66a143e8f5383","unresolved":false,"context_lines":[{"line_number":468,"context_line":" len(self.router_info)])])"},{"line_number":469,"context_line":" if pool_size \u003d\u003d self._pool_size:"},{"line_number":470,"context_line":" return"},{"line_number":471,"context_line":" LOG.info(\"Resizing router processing queue green pool size to: %d\","},{"line_number":472,"context_line":" pool_size)"},{"line_number":473,"context_line":" self._pool.resize(pool_size)"},{"line_number":474,"context_line":" self._pool_size \u003d pool_size"}],"source_content_type":"text/x-python","patch_set":2,"id":"a6ca1c3c_ba04682f","side":"PARENT","line":471,"in_reply_to":"6dfb5b09_e775c0d7","updated":"2025-01-07 09:25:31.000000000","message":"Single threading reminds me of many old bugs. If one of the multiple routers has a slow processing speed, for example, it has many routes, floating ips and port forwardings, then the subsequent routers will have to wait for this router to complete the processing before they can start processing, logs show such waits:\n\nhttps://22d7ebcac49e3be1b76c-c65fde0f0ae8487399d98fc75a829061.ssl.cf5.rackcdn.com/938406/2/check/neutron-tempest-plugin-openvswitch/c7896ec/controller/logs/screen-q-l3.txt\n\nJan 03 15:51:11.970233 np0039482942 neutron-l3-agent[64212]: INFO neutron.agent.l3.agent [-] Finished a router update for 1046aa2f-2ed2-40e1-aa6a-5d4ac22a920b, update_id 7e6be77f-725e-4c6a-9c48-b69b6774f66e. Time elapsed: 5.289\nJan 03 15:51:11.970479 np0039482942 neutron-l3-agent[64212]: INFO neutron.agent.l3.agent [-] Starting processing update 16b4518f-dfd3-4d9f-9674-f2a85ebb134d, action 3, priority 1, update_id 17766713-931a-42b1-9661-0e3f9bcf1a39. Wait time elapsed: 5.266\nJan 03 15:51:11.970551 np0039482942 neutron-l3-agent[64212]: INFO neutron.agent.l3.agent [-] Starting router update for 16b4518f-dfd3-4d9f-9674-f2a85ebb134d, action 3, priority 1, update_id 17766713-931a-42b1-9661-0e3f9bcf1a39. Wait time elapsed: 5.266","commit_id":"0c29e730db2629c084de0c114a0d1e8e6939ac25"},{"author":{"_account_id":9531,"name":"liuyulong","display_name":"LIU Yulong","email":"i@liuyulong.me","username":"LIU-Yulong"},"change_message_id":"eb16f584806aefc08f782c527aca64ddd89b7852","unresolved":false,"context_lines":[{"line_number":468,"context_line":" len(self.router_info)])])"},{"line_number":469,"context_line":" if pool_size \u003d\u003d self._pool_size:"},{"line_number":470,"context_line":" return"},{"line_number":471,"context_line":" LOG.info(\"Resizing router processing queue green pool size to: %d\","},{"line_number":472,"context_line":" pool_size)"},{"line_number":473,"context_line":" self._pool.resize(pool_size)"},{"line_number":474,"context_line":" self._pool_size \u003d pool_size"}],"source_content_type":"text/x-python","patch_set":2,"id":"6dfb5b09_e775c0d7","side":"PARENT","line":471,"in_reply_to":"98e9abf8_a7e09928","updated":"2025-01-07 09:02:21.000000000","message":"rally is used for API performance test. It\u0027s better to check the l3-agent logs. For instance, create 100-200 routers on one singe l3-agent, then verify the router processing total time.","commit_id":"0c29e730db2629c084de0c114a0d1e8e6939ac25"},{"author":{"_account_id":14525,"name":"Vasyl Saienko","email":"vsaienko@mirantis.com","username":"vsaienko"},"change_message_id":"da3d8554c680c4e8cf40f92880ceebf3359ee15f","unresolved":true,"context_lines":[{"line_number":323,"context_line":" self.metadata_driver)"},{"line_number":324,"context_line":""},{"line_number":325,"context_line":" # L3 agent router processing green pool"},{"line_number":326,"context_line":" self._pool \u003d eventlet.GreenPool(size\u003d1)"},{"line_number":327,"context_line":" self._queue \u003d queue.ResourceProcessingQueue()"},{"line_number":328,"context_line":" super().__init__(host\u003dself.conf.host)"},{"line_number":329,"context_line":""}],"source_content_type":"text/x-python","patch_set":3,"id":"6061ffea_be4a36de","line":326,"range":{"start_line":326,"start_character":40,"end_line":326,"end_character":47},"updated":"2025-01-14 12:23:18.000000000","message":"actually this patch as is will limit concurrency for router handling to 1. And it will deffinetly slow down keepalived spawn processing. But as far I see later in this chain concurrency is restored by https://review.opendev.org/c/openstack/neutron/+/938411 so if we land them together it should be fine.","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"},{"author":{"_account_id":16688,"name":"Rodolfo Alonso","email":"ralonsoh@redhat.com","username":"rodolfo-alonso-hernandez"},"change_message_id":"49259cbce1c4c6f91bc44b94f610c44ea3b64469","unresolved":false,"context_lines":[{"line_number":323,"context_line":" self.metadata_driver)"},{"line_number":324,"context_line":""},{"line_number":325,"context_line":" # L3 agent router processing green pool"},{"line_number":326,"context_line":" self._pool \u003d eventlet.GreenPool(size\u003d1)"},{"line_number":327,"context_line":" self._queue \u003d queue.ResourceProcessingQueue()"},{"line_number":328,"context_line":" super().__init__(host\u003dself.conf.host)"},{"line_number":329,"context_line":""}],"source_content_type":"text/x-python","patch_set":3,"id":"73242282_f630ef85","line":326,"range":{"start_line":326,"start_character":40,"end_line":326,"end_character":47},"in_reply_to":"5533c1eb_a24126f2","updated":"2025-02-05 21:50:28.000000000","message":"Perfect. I\u0027ll keep the concurrency. Any further problem with that once removed eventlet will be supported by you, I guess.","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"},{"author":{"_account_id":16688,"name":"Rodolfo Alonso","email":"ralonsoh@redhat.com","username":"rodolfo-alonso-hernandez"},"change_message_id":"5a159561e538257a7ed80e559a4af58b28fb1474","unresolved":false,"context_lines":[{"line_number":323,"context_line":" self.metadata_driver)"},{"line_number":324,"context_line":""},{"line_number":325,"context_line":" # L3 agent router processing green pool"},{"line_number":326,"context_line":" self._pool \u003d eventlet.GreenPool(size\u003d1)"},{"line_number":327,"context_line":" self._queue \u003d queue.ResourceProcessingQueue()"},{"line_number":328,"context_line":" super().__init__(host\u003dself.conf.host)"},{"line_number":329,"context_line":""}],"source_content_type":"text/x-python","patch_set":3,"id":"de422ab7_ab27a879","line":326,"range":{"start_line":326,"start_character":40,"end_line":326,"end_character":47},"in_reply_to":"6061ffea_be4a36de","updated":"2025-01-14 12:25:15.000000000","message":"Why \"it will deffinetly slow down keepalived spawn processing\"? When the router processing code was executed in parallel?","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"},{"author":{"_account_id":14525,"name":"Vasyl Saienko","email":"vsaienko@mirantis.com","username":"vsaienko"},"change_message_id":"46c177910c760dbf7edd20dd67c0b8f1b606d7d1","unresolved":false,"context_lines":[{"line_number":323,"context_line":" self.metadata_driver)"},{"line_number":324,"context_line":""},{"line_number":325,"context_line":" # L3 agent router processing green pool"},{"line_number":326,"context_line":" self._pool \u003d eventlet.GreenPool(size\u003d1)"},{"line_number":327,"context_line":" self._queue \u003d queue.ResourceProcessingQueue()"},{"line_number":328,"context_line":" super().__init__(host\u003dself.conf.host)"},{"line_number":329,"context_line":""}],"source_content_type":"text/x-python","patch_set":3,"id":"5533c1eb_a24126f2","line":326,"range":{"start_line":326,"start_character":40,"end_line":326,"end_character":47},"in_reply_to":"de422ab7_ab27a879","updated":"2025-01-14 13:32:25.000000000","message":"here https://review.opendev.org/c/openstack/neutron/+/938406/3/neutron/agent/l3/agent.py#849 we start more threads that process router updates from the queue.\n\nThere you can see the difference, with default 32 pools size and with 1. In second case unless we finished 1st router we will not start processing 2nd. In first case we start processing all routers in parallel. Even on 8 routers it shows significant degradation (the same node was used for tests)\n\nhttps://paste.openstack.org/show/bpKO6tVfWvS4fKy9h2w7/","commit_id":"c4549307c5553b2a043b78aa631630d3dbd7e5de"}]}