)]}'
{"/COMMIT_MSG":[{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"836fb4b754f93dd602dcbcf3e259754a1122a646","unresolved":true,"context_lines":[{"line_number":12,"context_line":"real threads would race and hit:"},{"line_number":13,"context_line":""},{"line_number":14,"context_line":"  sqlite3.OperationalError: cannot start a transaction within a"},{"line_number":15,"context_line":"  transaction"},{"line_number":16,"context_line":""},{"line_number":17,"context_line":"This patch ports the approach Cyborg took to fix the same problem:"},{"line_number":18,"context_line":""}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"b395d3fb_0d3b47d6","line":15,"updated":"2026-04-09 14:56:32.000000000","message":"I\u0027m building my understanding from scratch about our transaction within transaction problem in our tests with native threading. \n\nI use the test case nova.tests.unit.policies.test_availability_zone.AZScopeTypeNoLegacyPolicyTest.test_availability_zone_list_policy which can fail on its own (1 in 4 runs) locally for me. \n\nAs far as I see when this test case fails with the error in question it fails as two DB **read** operations overlap. Both trying to read the service list from a **different cell databases** (cell0, cell1).\n\nSo the premise of this fix here about the single write support is questionable. I don\u0027t say the fix itself is wrong, but at least the reasoning is not clear. 
Or the problem I see and the problem this change is trying to fix are different and we have two independent problems.\n\nUnfortunately, as the issue is timing sensitive, any change that changes the transaction timing could appear to fix the problem even though it is just making it not happen at the moment, or not that frequently.\n\nI will continue to dig into this test case to understand more but I wanted to document my current understanding and resulting concerns.","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":30002,"name":"Douglas Viroel","email":"viroel@gmail.com","username":"dviroel"},"change_message_id":"f6b18aea0cf1574f8dc75c6321fb70717bb1956e","unresolved":true,"context_lines":[{"line_number":12,"context_line":"real threads would race and hit:"},{"line_number":13,"context_line":""},{"line_number":14,"context_line":"  sqlite3.OperationalError: cannot start a transaction within a"},{"line_number":15,"context_line":"  transaction"},{"line_number":16,"context_line":""},{"line_number":17,"context_line":"This patch ports the approach Cyborg took to fix the same problem:"},{"line_number":18,"context_line":""}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"51d67046_37e15dc4","line":15,"in_reply_to":"338c2a7a_8b52732c","updated":"2026-04-09 17:31:33.000000000","message":"I generated with claude a small test that creates a threadpool, where each thread updates an instance state, 5 times, with 1 second sleep between updates. And it is possible to reproduce the \"(sqlite3.OperationalError) cannot start a transaction within a transaction\" error using in-memory SQLite. The test exercises concurrent writes, but for in-memory SQLite, both read/write would use the same transaction. By moving to a file-backed database it should still hit the issue because of the error on locking, triggered by multiple writers. 
The WAL and SQLite file db should reduce errors, but only serializing the access solved the problem in Watcher when using SQLite.\n\nHere is the code that I used just to reproduce the error in nova:\nhttps://paste.opendev.org/show/bpJTowzoAYTu3fSo1F7i/\n\nIt was easier to reproduce in Watcher since, as Sean mentioned, we have ThreadPools in the Applier that update/read multiple Action objects concurrently.","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"84c94b70579cf952e66ba3fd5b6154c32b73ee20","unresolved":true,"context_lines":[{"line_number":12,"context_line":"real threads would race and hit:"},{"line_number":13,"context_line":""},{"line_number":14,"context_line":"  sqlite3.OperationalError: cannot start a transaction within a"},{"line_number":15,"context_line":"  transaction"},{"line_number":16,"context_line":""},{"line_number":17,"context_line":"This patch ports the approach Cyborg took to fix the same problem:"},{"line_number":18,"context_line":""}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"c856530e_7c2cd628","line":15,"in_reply_to":"4164eae2_93412563","updated":"2026-04-10 12:54:08.000000000","message":"ah well i can perhaps partly answer that\nhttps://paste.opendev.org/show/832593/\n\n\nor i can have ai explain it\ni dug into this when we were originally debugging this but i just had claude update that to explicitly address why this commit works","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"7e78ca147def9f50109232f2c6eed994c0a6e563","unresolved":true,"context_lines":[{"line_number":12,"context_line":"real threads would race and hit:"},{"line_number":13,"context_line":""},{"line_number":14,"context_line":"  sqlite3.OperationalError: cannot start a 
transaction within a"},{"line_number":15,"context_line":"  transaction"},{"line_number":16,"context_line":""},{"line_number":17,"context_line":"This patch ports the approach Cyborg took to fix the same problem:"},{"line_number":18,"context_line":""}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"fa052e45_008f6d8a","line":15,"in_reply_to":"51d67046_37e15dc4","updated":"2026-04-10 08:03:41.000000000","message":"Yeah. What you all wrote makes sense. What bothers me is that a good set of the now re-enabled unit tests are not doing parallel write transactions at all; they do parallel reads, and even do them against two distinct DBs (cell0 and cell1) that were not created by the DatabaseFixture patched here, but by the CellDatabases fixture, which is unchanged here. So while I can confirm that this patch makes that problem disappear in these unit tests, I don\u0027t yet understand why and how. So I\u0027m a bit afraid of this magic. :)","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"44c8906679a0684f63a64ef0b41966787d8532be","unresolved":true,"context_lines":[{"line_number":12,"context_line":"real threads would race and hit:"},{"line_number":13,"context_line":""},{"line_number":14,"context_line":"  sqlite3.OperationalError: cannot start a transaction within a"},{"line_number":15,"context_line":"  transaction"},{"line_number":16,"context_line":""},{"line_number":17,"context_line":"This patch ports the approach Cyborg took to fix the same problem:"},{"line_number":18,"context_line":""}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"338c2a7a_8b52732c","line":15,"in_reply_to":"b395d3fb_0d3b47d6","updated":"2026-04-09 16:20:52.000000000","message":"so the basis of this fix was from errors we saw in watcher and cyborg when we disabled eventlet (segfaults in cyborg and sqlite3.OperationalError: cannot start a 
transaction within a transaction in watcher)\n\nin watcher we are using apscheduler and taskflow, which meant we have background write transactions happening in thread pools\n\ni\u0027m not saying this will fix the multi cell issue.\n\nthis is specifically to prevent a subset of multi threaded issues that are also present in nova, based on some learning from\n\nhttps://oldmoe.blog/2024/07/08/the-write-stuff-concurrent-write-transactions-in-sqlite/\nhttps://www.sqlite.org/wal.html\n\ni can\u0027t find the link but when we were investigating the issue to see if it happened in nova, doug reproduced it in nova with a simple example, so i was proposing this to fix those failures, not necessarily any multi cell issues.\n@viroel@gmail.com do you happen to have that still?","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"750b1e8cdd3d708a9bf79441c9321fdf8fc5d992","unresolved":true,"context_lines":[{"line_number":12,"context_line":"real threads would race and hit:"},{"line_number":13,"context_line":""},{"line_number":14,"context_line":"  sqlite3.OperationalError: cannot start a transaction within a"},{"line_number":15,"context_line":"  transaction"},{"line_number":16,"context_line":""},{"line_number":17,"context_line":"This patch ports the approach Cyborg took to fix the same problem:"},{"line_number":18,"context_line":""}],"source_content_type":"text/x-gerrit-commit-message","patch_set":1,"id":"4164eae2_93412563","line":15,"in_reply_to":"fa052e45_008f6d8a","updated":"2026-04-10 09:51:01.000000000","message":"So far I was able to boil the magic down to this minified change https://review.opendev.org/c/openstack/nova/+/983995 But it is still magic how one DB / Engine change affects the other DBs / Engines in the code. 
Something is global somewhere...","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"}],"/PATCHSET_LEVEL":[{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"fee6a55c25fe4f393e595ffe97900cb59c7b4700","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"3ffcb6d7_11898ddf","updated":"2026-04-01 17:40:25.000000000","message":"recheck ssl issues caused by load","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"1a38554eeca00fd72310fb90798dda221999af80","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"e16c6393_ce61274f","updated":"2026-04-01 20:59:31.000000000","message":"recheck unrelated heat failure on stack name collion in the functoinal sdk job","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"}],"nova/tests/fixtures/nova.py":[{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"750b1e8cdd3d708a9bf79441c9321fdf8fc5d992","unresolved":true,"context_lines":[{"line_number":687,"context_line":"                self.useFixture("},{"line_number":688,"context_line":"                    db_fixtures.ReplaceEngineFacadeFixture("},{"line_number":689,"context_line":"                        main_db_api.context_manager, new_engine))"},{"line_number":690,"context_line":"                main_db_api.configure(CONF)"},{"line_number":691,"context_line":""},{"line_number":692,"context_line":"                self.get_engine \u003d main_db_api.get_engine"},{"line_number":693,"context_line":"        elif self.database \u003d\u003d \u0027api\u0027:"}],"source_content_type":"text/x-python","patch_set":1,"id":"72b46453_fd013927","side":"PARENT","line":690,"updated":"2026-04-10 09:51:01.000000000","message":"I guess you 
needed to remove this to be able to manually configure the connection string. But this call did more than defining that. So I\u0027m wondering what DB config now we are not applying.","commit_id":"cfd5474c6438ad7cdb553357a37df5f2150d37ed"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"750b1e8cdd3d708a9bf79441c9321fdf8fc5d992","unresolved":true,"context_lines":[{"line_number":697,"context_line":"            self.useFixture("},{"line_number":698,"context_line":"                db_fixtures.ReplaceEngineFacadeFixture("},{"line_number":699,"context_line":"                    api_db_api.context_manager, new_engine))"},{"line_number":700,"context_line":"            api_db_api.configure(CONF)"},{"line_number":701,"context_line":""},{"line_number":702,"context_line":"            self.get_engine \u003d api_db_api.get_engine"},{"line_number":703,"context_line":""}],"source_content_type":"text/x-python","patch_set":1,"id":"788e4ded_b30e6f42","side":"PARENT","line":700,"updated":"2026-04-10 09:51:01.000000000","message":"ditto","commit_id":"cfd5474c6438ad7cdb553357a37df5f2150d37ed"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"8b35a8c7da18c8aaf28f4d1485154dcbaf976a69","unresolved":true,"context_lines":[{"line_number":579,"context_line":"            serializer\u003dserializer,"},{"line_number":580,"context_line":"            call_monitor_timeout\u003dcall_monitor_timeout)"},{"line_number":581,"context_line":""},{"line_number":582,"context_line":"    def add_cell_database(self, connection_str, default\u003dFalse):"},{"line_number":583,"context_line":"        \"\"\"Add a cell database to the fixture."},{"line_number":584,"context_line":""},{"line_number":585,"context_line":"        :param connection_str: An identifier used to represent the 
connection"}],"source_content_type":"text/x-python","patch_set":1,"id":"9593f800_c2fcf17f","line":582,"updated":"2026-04-09 16:26:17.000000000","message":"I think this codepath creates the cell0 and cellX main DBs for most of our functional tests.","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"750b1e8cdd3d708a9bf79441c9321fdf8fc5d992","unresolved":true,"context_lines":[{"line_number":579,"context_line":"            serializer\u003dserializer,"},{"line_number":580,"context_line":"            call_monitor_timeout\u003dcall_monitor_timeout)"},{"line_number":581,"context_line":""},{"line_number":582,"context_line":"    def add_cell_database(self, connection_str, default\u003dFalse):"},{"line_number":583,"context_line":"        \"\"\"Add a cell database to the fixture."},{"line_number":584,"context_line":""},{"line_number":585,"context_line":"        :param connection_str: An identifier used to represent the connection"}],"source_content_type":"text/x-python","patch_set":1,"id":"63026330_cf548c61","line":582,"in_reply_to":"6da69b41_6ae407a4","updated":"2026-04-10 09:51:01.000000000","message":"These DB fixtures are being used by both unit and functional tests so we need to sync the approach","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":11604,"name":"sean mooney","email":"smooney@redhat.com","username":"sean-k-mooney"},"change_message_id":"cd2ab2c0b2125e5f76c2726f278bdf85ac142c54","unresolved":true,"context_lines":[{"line_number":579,"context_line":"            serializer\u003dserializer,"},{"line_number":580,"context_line":"            call_monitor_timeout\u003dcall_monitor_timeout)"},{"line_number":581,"context_line":""},{"line_number":582,"context_line":"    def add_cell_database(self, connection_str, default\u003dFalse):"},{"line_number":583,"context_line":"        \"\"\"Add a 
cell database to the fixture."},{"line_number":584,"context_line":""},{"line_number":585,"context_line":"        :param connection_str: An identifier used to represent the connection"}],"source_content_type":"text/x-python","patch_set":1,"id":"6da69b41_6ae407a4","line":582,"in_reply_to":"9593f800_c2fcf17f","updated":"2026-04-09 18:04:39.000000000","message":"ack, i had looked only at the unit tests so far, so the functional tests will likely need additional work\n\ni just confirmed that the existing functional tests continued to pass but did not try turning off eventlet in them","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"8b35a8c7da18c8aaf28f4d1485154dcbaf976a69","unresolved":true,"context_lines":[{"line_number":690,"context_line":"            _locked_scope))"},{"line_number":691,"context_line":""},{"line_number":692,"context_line":""},{"line_number":693,"context_line":"class Database(fixtures.Fixture):"},{"line_number":694,"context_line":""},{"line_number":695,"context_line":"    # TODO(stephenfin): The \u0027version\u0027 argument is unused and can be removed"},{"line_number":696,"context_line":"    def __init__(self, database\u003d\u0027main\u0027, version\u003dNone, connection\u003dNone):"}],"source_content_type":"text/x-python","patch_set":1,"id":"55df0a44_5a9941dc","line":693,"updated":"2026-04-09 16:26:17.000000000","message":"As far as I understand so far, this fixture is not used for the main DB (i.e. cell DB) when the CellDatabases fixture is in use to provide a multicell (cell0, cell1) setup for the test case.\n\n@ashigupt@redhat.com has this patch https://review.opendev.org/c/openstack/nova/+/980179 that applies similar strategies, like file-based SQLite, for this CellDatabases fixture. I feel we need a common approach for both fixtures. Could you please join forces with Ashish to iron these things out? 
\n\n(Will keep building my understanding of the codebase to be more useful in these reviews)","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"750b1e8cdd3d708a9bf79441c9321fdf8fc5d992","unresolved":true,"context_lines":[{"line_number":732,"context_line":"                # NOTE(gibi): this injects a new factory for each test and"},{"line_number":733,"context_line":"                # cleans it up at then end of the test case. This way we can"},{"line_number":734,"context_line":"                # let each test configure the factory so we can avoid having a"},{"line_number":735,"context_line":"                # global flag guarding against factory re-configuration."},{"line_number":736,"context_line":"                # Use a file-backed SQLite database with WAL mode so that"},{"line_number":737,"context_line":"                # concurrent readers and a single writer can coexist under"},{"line_number":738,"context_line":"                # native threading."}],"source_content_type":"text/x-python","patch_set":1,"id":"f96f309c_c291de7e","line":735,"updated":"2026-04-10 09:51:01.000000000","message":"I think this comment belongs to L746 now","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"},{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"750b1e8cdd3d708a9bf79441c9321fdf8fc5d992","unresolved":true,"context_lines":[{"line_number":751,"context_line":""},{"line_number":752,"context_line":"                self.get_engine \u003d main_db_api.get_engine"},{"line_number":753,"context_line":"        elif self.database \u003d\u003d \u0027api\u0027:"},{"line_number":754,"context_line":"            # NOTE(gibi): similar note applies here as for the main_db_api"},{"line_number":755,"context_line":"            # 
above"},{"line_number":756,"context_line":"            fd, db_path \u003d tempfile.mkstemp("},{"line_number":757,"context_line":"                prefix\u003d\u0027nova_test_api_\u0027, suffix\u003d\u0027.db\u0027)"}],"source_content_type":"text/x-python","patch_set":1,"id":"1272b311_177f8308","line":754,"updated":"2026-04-10 09:51:01.000000000","message":"ditto, this is trying to explain L763","commit_id":"8000921b9acdaf644407aefe87c28170168418fc"}],"threading_unit_test_excludes.txt":[{"author":{"_account_id":9708,"name":"Balazs Gibizer","display_name":"gibi","email":"gibizer@gmail.com","username":"gibi"},"change_message_id":"688905eba91ace40807bc3f0bcca435e0de4e0ef","unresolved":true,"context_lines":[{"line_number":24,"context_line":"# both triggered at: nova.compute.api.HostAPI._service_get_all_cells"},{"line_number":25,"context_line":"nova.tests.unit.policies.test_availability_zone.AZScopeTypeNoLegacyPolicyTest.test_availability_zone_detail_policy"},{"line_number":26,"context_line":"nova.tests.unit.test_availability_zones.AvailabilityZoneTestCases.test_get_availability_zones"},{"line_number":27,"context_line":"nova.tests.unit.policies.test_availability_zone.AZScopeTypeNoLegacyPolicyTest.test_availability_zone_list_policy"},{"line_number":28,"context_line":"nova.tests.unit.policies.test_availability_zone.AvailabilityZone"},{"line_number":29,"context_line":"nova.tests.unit.compute.test_shelve.ShelveComputeAPITestCase.test_unshelve_without_az_to_newaz_and_host"},{"line_number":30,"context_line":"nova.tests.unit.compute.test_shelve.ShelveComputeAPITestCase.test_unshelve_without_az_to_newaz"}],"source_content_type":"text/plain","patch_set":1,"id":"32d761bb_0596de43","side":"PARENT","line":27,"updated":"2026-04-09 17:26:20.000000000","message":"An extra observation about this test case. This test case can pass while actually hitting the transaction within transaction error. Sometimes such an error causes the test case to fail, sometimes it does not. 
:/","commit_id":"cfd5474c6438ad7cdb553357a37df5f2150d37ed"}]}
