)]}'
{"doc/source/contributor/design/database_consistency.rst":[{"author":{"_account_id":23804,"name":"Daniel Alvarez","email":"dalvarez@redhat.com","username":"dalvarez"},"change_message_id":"d3edb22b66b837922797834b1be03e4b35dc596d","unresolved":false,"context_lines":[{"line_number":84,"context_line":"When a request to update/create/delete a certain resource arrives,"},{"line_number":85,"context_line":"the Neutron API worker that is handling this task should announce its"},{"line_number":86,"context_line":"*intents* to do it by creating a lock record based on that resource"},{"line_number":87,"context_line":"``UUID`` on a separated table in the Neutron database. While these locks"},{"line_number":88,"context_line":"are held, any other worker that attempts to modify the same resource"},{"line_number":89,"context_line":"will have to wait."},{"line_number":90,"context_line":""},{"line_number":91,"context_line":"This mechanism requires a new table to be created in the Neutron"},{"line_number":92,"context_line":"database. The proposed model is:"}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_9c48312d","line":89,"range":{"start_line":87,"start_character":55,"end_line":89,"end_character":18},"updated":"2017-08-04 12:07:50.000000000","message":"This could have a performance hit and reduce parallelism? This is already handled in Neutron using revision numbers right?","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"879898858ab56ef39ea7d903ee863e2e96c9916f","unresolved":false,"context_lines":[{"line_number":84,"context_line":"When a request to update/create/delete a certain resource arrives,"},{"line_number":85,"context_line":"the Neutron API worker that is handling this task should announce its"},{"line_number":86,"context_line":"*intents* to do it by creating a lock record based on that resource"},{"line_number":87,"context_line":"``UUID`` on a separated table in the Neutron database. While these locks"},{"line_number":88,"context_line":"are held, any other worker that attempts to modify the same resource"},{"line_number":89,"context_line":"will have to wait."},{"line_number":90,"context_line":""},{"line_number":91,"context_line":"This mechanism requires a new table to be created in the Neutron"},{"line_number":92,"context_line":"database. The proposed model is:"}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_73bff8a9","line":89,"range":{"start_line":87,"start_character":55,"end_line":89,"end_character":18},"in_reply_to":"df3967d1_0db76988","updated":"2017-08-04 15:14:31.000000000","message":"I agree, I will ask terry if he could review these bits of the spec for new ideas. My experience dealing with IDLs and knowing that it\u0027s yet another database level (3 dbs!) makes me think that having a layer which I can serialize the requests to these IDLs and not rely on them for consistency would be a good thing... But, even if we do it this way and later on we find out that the lock is not needed anymore it should be easy enough to just get rid of it by removing a decorator and deleting the code from the tree","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"f7682b99f893df867b4dcac3827a4a3d5b2bd1cd","unresolved":false,"context_lines":[{"line_number":84,"context_line":"When a request to update/create/delete a certain resource arrives,"},{"line_number":85,"context_line":"the Neutron API worker that is handling this task should announce its"},{"line_number":86,"context_line":"*intents* to do it by creating a lock record based on that resource"},{"line_number":87,"context_line":"``UUID`` on a separated table in the Neutron database. While these locks"},{"line_number":88,"context_line":"are held, any other worker that attempts to modify the same resource"},{"line_number":89,"context_line":"will have to wait."},{"line_number":90,"context_line":""},{"line_number":91,"context_line":"This mechanism requires a new table to be created in the Neutron"},{"line_number":92,"context_line":"database. The proposed model is:"}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_df10eb02","line":89,"range":{"start_line":87,"start_character":55,"end_line":89,"end_character":18},"in_reply_to":"df3967d1_9c48312d","updated":"2017-08-04 13:04:23.000000000","message":"That\u0027s right, it will reduce parallelism (and consequently it will have some performance impact) when updating the same row in the database, those will be serialized. But, we still allow concurrency for everything else.\n\nWe could consider have no lock at all and handle everything just using the revision_number maybe, I\u0027m just not sure how that would work with the IDL model that we have right now, because that\u0027s an extra level of database (in-memory replica) that we have with OVN. Maybe @Terry will have a better idea, but it seems possible to have two IDLs out of sync as per commit https://review.openstack.org/#/c/404791/","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":23804,"name":"Daniel Alvarez","email":"dalvarez@redhat.com","username":"dalvarez"},"change_message_id":"87ec8f2e80d3fdfb0b4c26247462693ec55db48e","unresolved":false,"context_lines":[{"line_number":84,"context_line":"When a request to update/create/delete a certain resource arrives,"},{"line_number":85,"context_line":"the Neutron API worker that is handling this task should announce its"},{"line_number":86,"context_line":"*intents* to do it by creating a lock record based on that resource"},{"line_number":87,"context_line":"``UUID`` on a separated table in the Neutron database. While these locks"},{"line_number":88,"context_line":"are held, any other worker that attempts to modify the same resource"},{"line_number":89,"context_line":"will have to wait."},{"line_number":90,"context_line":""},{"line_number":91,"context_line":"This mechanism requires a new table to be created in the Neutron"},{"line_number":92,"context_line":"database. The proposed model is:"}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_0db76988","line":89,"range":{"start_line":87,"start_character":55,"end_line":89,"end_character":18},"in_reply_to":"df3967d1_df10eb02","updated":"2017-08-04 14:22:34.000000000","message":"Yes, maybe only queries updating the same resources is not very representative out of the total of simultaneous queries we may have and it\u0027s not a big performance hit. If we could do it though just through the revision_number that\u0027d be great... maybe @Terry comes with some magic trick :)","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":23804,"name":"Daniel Alvarez","email":"dalvarez@redhat.com","username":"dalvarez"},"change_message_id":"d3edb22b66b837922797834b1be03e4b35dc596d","unresolved":false,"context_lines":[{"line_number":165,"context_line":"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"},{"line_number":166,"context_line":""},{"line_number":167,"context_line":"Another idea to make the database comparison between the"},{"line_number":168,"context_line":"``revivision_number``\u0027s more efficient would be to create an additional"},{"line_number":169,"context_line":"table in the Neutron database that keeps track of the OVN resources"},{"line_number":170,"context_line":"versions, something like a \"ovn_standardattributes\" table. Every time"},{"line_number":171,"context_line":"we successfully create or update a resource in OVN the corresponding"}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_bc4d353f","line":168,"range":{"start_line":168,"start_character":2,"end_line":168,"end_character":19},"updated":"2017-08-04 12:07:50.000000000","message":"nit: revision_number","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"f7682b99f893df867b4dcac3827a4a3d5b2bd1cd","unresolved":false,"context_lines":[{"line_number":165,"context_line":"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"},{"line_number":166,"context_line":""},{"line_number":167,"context_line":"Another idea to make the database comparison between the"},{"line_number":168,"context_line":"``revivision_number``\u0027s more efficient would be to create an additional"},{"line_number":169,"context_line":"table in the Neutron database that keeps track of the OVN resources"},{"line_number":170,"context_line":"versions, something like a \"ovn_standardattributes\" table. Every time"},{"line_number":171,"context_line":"we successfully create or update a resource in OVN the corresponding"}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_9ff6e3d6","line":168,"range":{"start_line":168,"start_character":2,"end_line":168,"end_character":19},"in_reply_to":"df3967d1_bc4d353f","updated":"2017-08-04 13:04:23.000000000","message":"Done","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":23804,"name":"Daniel Alvarez","email":"dalvarez@redhat.com","username":"dalvarez"},"change_message_id":"d3edb22b66b837922797834b1be03e4b35dc596d","unresolved":false,"context_lines":[{"line_number":180,"context_line":"in OVN. That will make it look like that both databases are inconsistent"},{"line_number":181,"context_line":"but, as the real version will also be saved into the ``external_ids``"},{"line_number":182,"context_line":"column of that resource in the OVN database, the thread responsible for"},{"line_number":183,"context_line":"fixing the issue could easily detect this problem."},{"line_number":184,"context_line":""},{"line_number":185,"context_line":"The additional table schema could be something like this:"},{"line_number":186,"context_line":""}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_fc261d68","line":183,"updated":"2017-08-04 12:07:50.000000000","message":"If we lose connectivity with ovsdb-server then we\u0027ll be unable to write the external_id right? Not sure how we\u0027re handling this right now, it would leave inconsistencies i think.","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"8e9bf43a232cc84900f544fbe77b4aef86d03c03","unresolved":false,"context_lines":[{"line_number":180,"context_line":"in OVN. That will make it look like that both databases are inconsistent"},{"line_number":181,"context_line":"but, as the real version will also be saved into the ``external_ids``"},{"line_number":182,"context_line":"column of that resource in the OVN database, the thread responsible for"},{"line_number":183,"context_line":"fixing the issue could easily detect this problem."},{"line_number":184,"context_line":""},{"line_number":185,"context_line":"The additional table schema could be something like this:"},{"line_number":186,"context_line":""}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_c2c810a7","line":183,"in_reply_to":"df3967d1_1f955333","updated":"2017-08-04 13:29:24.000000000","message":"Actually not, ignore my comment above. If we lose connectivity with the ovsdb-server we won\u0027t be able to update/create/delete the resource either (the external_id update will be part of the same transaction). Consequently we won\u0027t update the neutron database leaving it with the old revision number. And it will later be fixed by the thread doing the maintenance.","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":23804,"name":"Daniel Alvarez","email":"dalvarez@redhat.com","username":"dalvarez"},"change_message_id":"87ec8f2e80d3fdfb0b4c26247462693ec55db48e","unresolved":false,"context_lines":[{"line_number":180,"context_line":"in OVN. That will make it look like that both databases are inconsistent"},{"line_number":181,"context_line":"but, as the real version will also be saved into the ``external_ids``"},{"line_number":182,"context_line":"column of that resource in the OVN database, the thread responsible for"},{"line_number":183,"context_line":"fixing the issue could easily detect this problem."},{"line_number":184,"context_line":""},{"line_number":185,"context_line":"The additional table schema could be something like this:"},{"line_number":186,"context_line":""}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_ed6e65fe","line":183,"in_reply_to":"df3967d1_c2c810a7","updated":"2017-08-04 14:22:34.000000000","message":"Right!","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"f7682b99f893df867b4dcac3827a4a3d5b2bd1cd","unresolved":false,"context_lines":[{"line_number":180,"context_line":"in OVN. That will make it look like that both databases are inconsistent"},{"line_number":181,"context_line":"but, as the real version will also be saved into the ``external_ids``"},{"line_number":182,"context_line":"column of that resource in the OVN database, the thread responsible for"},{"line_number":183,"context_line":"fixing the issue could easily detect this problem."},{"line_number":184,"context_line":""},{"line_number":185,"context_line":"The additional table schema could be something like this:"},{"line_number":186,"context_line":""}],"source_content_type":"text/x-rst","patch_set":1,"id":"df3967d1_1f955333","line":183,"in_reply_to":"df3967d1_fc261d68","updated":"2017-08-04 13:04:23.000000000","message":"That\u0027s correct, if we lose connectivity to ovsdb-server and fail to update the neutron table we could possibly end up with some inconsistencies here.\n\nPerhaps, one way of solving this is having two levels of recovery:\n\n1) By looking at the new \"ovn_standardattributes\" table as proposed here, this would run more often since the operation to find out the inconsistencies is not heavy (it\u0027s a SELECT ... JOIN).\n\n2) From time to time we would still run a full comparison between both databases as per L152. That would run less often since it could be pretty costy depending on the size of the databases.\n\nWhat do you think ?","commit_id":"cdfcd637253563e81adc4369b0e08da0b3fee22d"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"1f07613413906c350705ea69da6a8882ef31e777","unresolved":false,"context_lines":[{"line_number":281,"context_line":"* Given that the 1 journal thread per Neutron API worker approach"},{"line_number":282,"context_line":"  is problematic, determining the right number of journal threads can"},{"line_number":283,"context_line":"  also be difficult. In my tests, I\u0027ve noticed that 3 journal threads on"},{"line_number":284,"context_line":"  each compute worked better but that number was pure based on ``trial"},{"line_number":285,"context_line":"  \u0026 error``. On a production environment this number should probably be"},{"line_number":286,"context_line":"  calculated automatically based on the environment itself (maybe something"},{"line_number":287,"context_line":"  like `TripleO \u003chttp://tripleo.org/\u003e`_ could do it)."}],"source_content_type":"text/x-rst","patch_set":4,"id":"9f436f4f_efbba20b","line":284,"range":{"start_line":284,"start_character":7,"end_line":284,"end_character":14},"updated":"2017-08-08 22:26:18.000000000","message":"controller","commit_id":"37c0364160b716673bac9a35d6d70658f62dc918"},{"author":{"_account_id":10237,"name":"Numan Siddique","email":"nusiddiq@redhat.com","username":"numansiddique"},"change_message_id":"60426413ae432dd450e3bf4f16c5b6e4fbdbf5ec","unresolved":false,"context_lines":[{"line_number":212,"context_line":"Other notes"},{"line_number":213,"context_line":"-----------"},{"line_number":214,"context_line":""},{"line_number":215,"context_line":"One could point out that having a `distributed lock \u003csolution_1_\u003e`_ is"},{"line_number":216,"context_line":"not needed since we already have the ``revision_number`` registered in the"},{"line_number":217,"context_line":"OVN database and even if the ``postcommit`` method is invoked in the wrong"},{"line_number":218,"context_line":"order we would still be able to to compare the ``revision_number``\u0027s and"}],"source_content_type":"text/x-rst","patch_set":5,"id":"9f436f4f_f2181082","line":215,"updated":"2017-08-09 15:55:18.000000000","message":"Thanks for this paragraph. I had exactly the same doubt.","commit_id":"bd35ab39d9e741e823dfac5fb8a400d2fa3b03ff"},{"author":{"_account_id":10237,"name":"Numan Siddique","email":"nusiddiq@redhat.com","username":"numansiddique"},"change_message_id":"60426413ae432dd450e3bf4f16c5b6e4fbdbf5ec","unresolved":false,"context_lines":[{"line_number":282,"context_line":"  also difficult. In my tests, I\u0027ve noticed that 3 journal threads"},{"line_number":283,"context_line":"  per controller worked better but that number was pure based on"},{"line_number":284,"context_line":"  ``trial \u0026 error``. In production this number should probably be"},{"line_number":285,"context_line":"  calculated based in the environment, perhaps sometihng like `TripleO"},{"line_number":286,"context_line":"  \u003chttp://tripleo.org\u003e`_ (or any upper layer) would be in a better"},{"line_number":287,"context_line":"  position to make that decision."},{"line_number":288,"context_line":""}],"source_content_type":"text/x-rst","patch_set":5,"id":"9f436f4f_e62615b8","line":285,"range":{"start_line":285,"start_character":47,"end_line":285,"end_character":56},"updated":"2017-08-09 15:55:18.000000000","message":"nit: typo","commit_id":"bd35ab39d9e741e823dfac5fb8a400d2fa3b03ff"},{"author":{"_account_id":11816,"name":"mike_mp@zzzcomputing.com","display_name":"Mike Bayer","email":"mike_mp@zzzcomputing.com","username":"zzzeek","status":"Red Hat"},"change_message_id":"d0fe06e82aa63b35becac1116ad0f2eec3d99a63","unresolved":false,"context_lines":[{"line_number":104,"context_line":"resource_uuid  String    Primary key"},{"line_number":105,"context_line":"request_uuid   String    Unique identifier for the request"},{"line_number":106,"context_line":"acquired       Boolean   True if it\u0027s locked. False otherwise"},{"line_number":107,"context_line":"created_at     DateTime  The time that the lock was created. Used for the TTL"},{"line_number":108,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":109,"context_line":""},{"line_number":110,"context_line":"The pseudo-code will look something like this:"}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_5134ecfb","line":107,"updated":"2017-08-15 15:05:44.000000000","message":"add an \"updated_at\" as well assuming you\u0027re going to flip that \"acquired\" flag to False, good for forensic work, performance monitoring (\"how long do locks take?\")\n\nalso I haven\u0027t read further yet I would assume that the state machine for \"acquired\" is:   (no row) -\u003e (insert row acquired\u003dTrue) -\u003e (update row acquired\u003dFalse) -\u003e (DONE)\n\nif so, maybe even name the timestamps \"lock_acquired_at\" / \"lock_released_at\"","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"5843033f4002fe05ed52c6e4eff276041c94f785","unresolved":false,"context_lines":[{"line_number":104,"context_line":"resource_uuid  String    Primary key"},{"line_number":105,"context_line":"request_uuid   String    Unique identifier for the request"},{"line_number":106,"context_line":"acquired       Boolean   True if it\u0027s locked. False otherwise"},{"line_number":107,"context_line":"created_at     DateTime  The time that the lock was created. Used for the TTL"},{"line_number":108,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":109,"context_line":""},{"line_number":110,"context_line":"The pseudo-code will look something like this:"}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_ab4152ab","line":107,"in_reply_to":"9f436f4f_5134ecfb","updated":"2017-08-16 08:30:48.000000000","message":"Great point about updated_at (or lock_{acquired, released}_at), I will extend the table.\n\nRegarding the second question as how the lock behaves. The spec proposes something different, in the ML2 driver in neutron each action to a resources has two phases: precommit and postcommit.\n\nWhat the spec proposes is to pre-create the lock at the precommit stage (within the same transaction as the neutron database). And on the postcommit we would acquire the lock in the same order that it was committed in the neutron database (based on the resource uuid and the created_at timestamp). So it would basically serialize the conflicting requests to the same resource.\n\nOne thing that I noticed now that I\u0027ve been working on a PoC for this spec is that this idea will only work for the ML2 resources. The L3 resources (routers, floating ips, etc...) are computed differently, there\u0027s no precommit() or postcommit() phase for those. So, I might change the spec to something like you have suggest there so it will be behave consistently across all resources.","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":11816,"name":"mike_mp@zzzcomputing.com","display_name":"Mike Bayer","email":"mike_mp@zzzcomputing.com","username":"zzzeek","status":"Red Hat"},"change_message_id":"d0fe06e82aa63b35becac1116ad0f2eec3d99a63","unresolved":false,"context_lines":[{"line_number":119,"context_line":"    def update_port_postcommit:"},{"line_number":120,"context_line":"        port \u003d neutron_db.get_port()"},{"line_number":121,"context_line":"        update_port_in_ovn(port)"},{"line_number":122,"context_line":""},{"line_number":123,"context_line":".. note::"},{"line_number":124,"context_line":"  This lock mechanism is quite similar to the `task queue"},{"line_number":125,"context_line":"  \u003chttps://review.openstack.org/#/c/358447\u003e`_ idea that was proposed"}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_ccb16349","line":122,"updated":"2017-08-15 15:05:44.000000000","message":"wait...doesn\u0027t neutron initiate this activity overall, and then calls into ml2?   why wouldn\u0027t neutron activate the db lock?   ML2 here would seek to participate in an existing lock initiated by neutron.","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"5843033f4002fe05ed52c6e4eff276041c94f785","unresolved":false,"context_lines":[{"line_number":119,"context_line":"    def update_port_postcommit:"},{"line_number":120,"context_line":"        port \u003d neutron_db.get_port()"},{"line_number":121,"context_line":"        update_port_in_ovn(port)"},{"line_number":122,"context_line":""},{"line_number":123,"context_line":".. note::"},{"line_number":124,"context_line":"  This lock mechanism is quite similar to the `task queue"},{"line_number":125,"context_line":"  \u003chttps://review.openstack.org/#/c/358447\u003e`_ idea that was proposed"}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_cbb93679","line":122,"in_reply_to":"9f436f4f_ccb16349","updated":"2017-08-16 08:30:48.000000000","message":"Not really, these are the methods already invoked by Neutron. AFAICT, there\u0027s not such generic locking mechanism in Neutron that we can use in the ML2 drivers for now. Each ML2 driver (odl, bigswitch, etc...) is solving this database sync problem differently and it\u0027s no different for OVN [1].\n\nIt\u0027s not a happy statement but, I hope that in the future when multiple ML2 drivers sees that this is a common enough problem we could come up with a better solution within neutron for all of them.\n\n[1] I\u0027ve attempted to implement something similar to what ODL does but I wasn\u0027t too happy with it. So, this spec is an alternative implementation.","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"9861bea39ee6d53586b9ff36450429a1334b5912","unresolved":false,"context_lines":[{"line_number":123,"context_line":".. note::"},{"line_number":124,"context_line":"  This lock mechanism is quite similar to the `task queue"},{"line_number":125,"context_line":"  \u003chttps://review.openstack.org/#/c/358447\u003e`_ idea that was proposed"},{"line_number":126,"context_line":"  in the past."},{"line_number":127,"context_line":""},{"line_number":128,"context_line":"The idea is to use a function in the ``precommit`` stage"},{"line_number":129,"context_line":"(``register_lock_for_resource``) to register in the database the order"}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_3033b8f8","line":126,"updated":"2017-08-14 21:03:52.000000000","message":"Lucas, thanks for mentioning it. I would like to point out that even with this approach, there is still a need for handling dependencies. An example of conflicting address-set update is reported in bug 1611852 [1]. And a RFC patch following the task queue patch was proposed here [2]. There could be other example of dependencies which are not yet reported, but we could use similar way to handle.\n\n[1] https://bugs.launchpad.net/networking-ovn/+bug/1611852\n\n[2] https://review.openstack.org/#/c/362494/1","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"0cf4d24b0b75838b2055329d113d1caa6aa22a26","unresolved":false,"context_lines":[{"line_number":123,"context_line":".. note::"},{"line_number":124,"context_line":"  This lock mechanism is quite similar to the `task queue"},{"line_number":125,"context_line":"  \u003chttps://review.openstack.org/#/c/358447\u003e`_ idea that was proposed"},{"line_number":126,"context_line":"  in the past."},{"line_number":127,"context_line":""},{"line_number":128,"context_line":"The idea is to use a function in the ``precommit`` stage"},{"line_number":129,"context_line":"(``register_lock_for_resource``) to register in the database the order"}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_aa204cad","line":126,"in_reply_to":"9f436f4f_3033b8f8","updated":"2017-08-15 08:48:51.000000000","message":"Thanks Han!\n\nI will take a look at the links and update the spec accordingly.","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":11816,"name":"mike_mp@zzzcomputing.com","display_name":"Mike Bayer","email":"mike_mp@zzzcomputing.com","username":"zzzeek","status":"Red Hat"},"change_message_id":"d0fe06e82aa63b35becac1116ad0f2eec3d99a63","unresolved":false,"context_lines":[{"line_number":208,"context_line":"resource_type    String    The type of the resource (e.g, Port, Router, ...)"},{"line_number":209,"context_line":"revision_number  Integer   The version of the object present in OVN"},{"line_number":210,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":211,"context_line":""},{"line_number":212,"context_line":"Other notes"},{"line_number":213,"context_line":"-----------"},{"line_number":214,"context_line":""}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_8c535be2","line":211,"updated":"2017-08-15 15:05:44.000000000","message":"is this document proposing *both* solution 1 and solution 2  be deployed?","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"5843033f4002fe05ed52c6e4eff276041c94f785","unresolved":false,"context_lines":[{"line_number":208,"context_line":"resource_type    String    The type of the resource (e.g, Port, Router, ...)"},{"line_number":209,"context_line":"revision_number  Integer   The version of the object present in OVN"},{"line_number":210,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":211,"context_line":""},{"line_number":212,"context_line":"Other notes"},{"line_number":213,"context_line":"-----------"},{"line_number":214,"context_line":""}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_eb399af1","line":211,"in_reply_to":"9f436f4f_8c535be2","updated":"2017-08-16 08:30:48.000000000","message":"Yes, what I tried to do in the spec was to show that there are two situations where the OVN and Neutron database can get out of sync. And solution 1 and 2 targets each one of the cases respectively.\n\nBut we need both to be deployed to solve the problem.\n\n(Maybe it\u0027s a bit confusing to separate it in two problems and two solutions... I thought it would be easier to see that the same problem could happen in more than one situation).","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":11816,"name":"mike_mp@zzzcomputing.com","display_name":"Mike Bayer","email":"mike_mp@zzzcomputing.com","username":"zzzeek","status":"Red Hat"},"change_message_id":"d0fe06e82aa63b35becac1116ad0f2eec3d99a63","unresolved":false,"context_lines":[{"line_number":217,"context_line":"OVN database and even if the ``postcommit`` method is invoked in the wrong"},{"line_number":218,"context_line":"order we would still be able to to compare the ``revision_number``\u0027s and"},{"line_number":219,"context_line":"discard older updates to the same resource. That is true, but, the model"},{"line_number":220,"context_line":"in which we currently access the ovsdb-server information introduces"},{"line_number":221,"context_line":"yet another database level. Each client (IDL) that communicates with"},{"line_number":222,"context_line":"the ovsdb-server also maintains an in-memory replica of the OVN database"},{"line_number":223,"context_line":"and introduces a window where different clients could have `inconsistent"}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_8c6c7b9c","line":220,"updated":"2017-08-15 15:05:44.000000000","message":"this sentence is unclear to me, do you mean, \"based on the current data access model for ovsdb-server, this approach would introduce yet another database level\" ?","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"5843033f4002fe05ed52c6e4eff276041c94f785","unresolved":false,"context_lines":[{"line_number":217,"context_line":"OVN database and even if the ``postcommit`` method is invoked in the wrong"},{"line_number":218,"context_line":"order we would still be able to to compare the ``revision_number``\u0027s and"},{"line_number":219,"context_line":"discard older updates to the same resource. That is true, but, the model"},{"line_number":220,"context_line":"in which we currently access the ovsdb-server information introduces"},{"line_number":221,"context_line":"yet another database level. Each client (IDL) that communicates with"},{"line_number":222,"context_line":"the ovsdb-server also maintains an in-memory replica of the OVN database"},{"line_number":223,"context_line":"and introduces a window where different clients could have `inconsistent"}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_c6580d32","line":220,"in_reply_to":"9f436f4f_8c6c7b9c","updated":"2017-08-16 08:30:48.000000000","message":"That\u0027s right.\n\nSo, the \"api\" to OVN is a database (northbound db), but the problem that it introduces is that OVN clients in reality talks to a in-memory replica for this database. Each client will have it\u0027s own replica and they can for a window of time be out of sync.\n\nSo, this spec is proposing using the Neutron database for the locks instead to not have to deal with yet another database sync problem.","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":23811,"name":"Oliver Walsh","email":"owalsh@redhat.com","username":"owalsh"},"change_message_id":"3e42363c09229e92d7ad0684d1db1137a08c4745","unresolved":false,"context_lines":[{"line_number":273,"context_line":"  threads could come to a halt (or really slowed down) while the"},{"line_number":274,"context_line":"  API workers were handling a lot of requests. This resulted in some"},{"line_number":275,"context_line":"  operations taking more than a minute to be processed. This behaviour"},{"line_number":276,"context_line":"  can be seem `in this screenshot \u003chttp://i.imgur.com/GDG8Mic.png\u003e`_."},{"line_number":277,"context_line":""},{"line_number":278,"context_line":".. TODO find a better place to host that image"},{"line_number":279,"context_line":""}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_c221a2fd","line":276,"range":{"start_line":276,"start_character":9,"end_line":276,"end_character":13},"updated":"2017-08-10 13:04:36.000000000","message":"nit: seen","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"5843033f4002fe05ed52c6e4eff276041c94f785","unresolved":false,"context_lines":[{"line_number":273,"context_line":"  threads could come to a halt (or really slowed down) while the"},{"line_number":274,"context_line":"  API workers were handling a lot of requests. This resulted in some"},{"line_number":275,"context_line":"  operations taking more than a minute to be processed. This behaviour"},{"line_number":276,"context_line":"  can be seem `in this screenshot \u003chttp://i.imgur.com/GDG8Mic.png\u003e`_."},{"line_number":277,"context_line":""},{"line_number":278,"context_line":".. TODO find a better place to host that image"},{"line_number":279,"context_line":""}],"source_content_type":"text/x-rst","patch_set":6,"id":"9f436f4f_c6712daa","line":276,"range":{"start_line":276,"start_character":9,"end_line":276,"end_character":13},"in_reply_to":"9f436f4f_c221a2fd","updated":"2017-08-16 08:30:48.000000000","message":"Done","commit_id":"e34e5acf27a86934bbb8ee47e59806c617099870"},{"author":{"_account_id":8788,"name":"Miguel Angel Ajo","email":"mangelajo@redhat.com","username":"mangelajo"},"change_message_id":"a37f5992bbb34bae0fff7ea5c7565bd5f6bf0d1e","unresolved":false,"context_lines":[{"line_number":108,"context_line":"lock_uuid      String    Unique identifier for the lock session"},{"line_number":109,"context_line":"acquired_at    DateTime  The time that the lock was acquired. Also used"},{"line_number":110,"context_line":"                         for the TTL"},{"line_number":111,"context_line":"released_at    DateTime  The time that the lock was released"},{"line_number":112,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":113,"context_line":""},{"line_number":114,"context_line":"The pseudo-code will look something like this:"}],"source_content_type":"text/x-rst","patch_set":7,"id":"5f4e5783_dc246c60","line":111,"updated":"2017-10-17 09:08:32.000000000","message":"shall we may be delete the lock once released instead?\n\nmy worry is that we will start accumulating old lock entries in the database making it increasingly slow.","commit_id":"38c03ea3495d3c88a64423a99a00b3537ea89328"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"33588933677f0056052ac0b63f1c7e142f8ef64a","unresolved":false,"context_lines":[{"line_number":108,"context_line":"lock_uuid      String    Unique identifier for the lock session"},{"line_number":109,"context_line":"acquired_at    DateTime  The time that the lock was acquired. Also used"},{"line_number":110,"context_line":"                         for the TTL"},{"line_number":111,"context_line":"released_at    DateTime  The time that the lock was released"},{"line_number":112,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":113,"context_line":""},{"line_number":114,"context_line":"The pseudo-code will look something like this:"}],"source_content_type":"text/x-rst","patch_set":7,"id":"5f4e5783_1cb7a488","line":111,"in_reply_to":"5f4e5783_dc246c60","updated":"2017-10-17 09:18:44.000000000","message":"Oh perhaps I should have mentioned it in the spec. I will put a new patch-set up.\n\nYou are totally correct, we definitely need to clean up the logs in the database eventually. In the PoC code version of this spec, there\u0027s a routine that will get rid of the old locks, see [0]. It\u0027s right now set to 15 minutes, but, we could make it configurable if people wants to.\n\nI would still retain the locks for _some time_ before deleting it for troubleshooting purposes, it really does help.\n\n[0] https://review.openstack.org/#/c/503705/3/networking_ovn/common/maintenance.py@88","commit_id":"38c03ea3495d3c88a64423a99a00b3537ea89328"},{"author":{"_account_id":5756,"name":"Terry Wilson","email":"twilson@redhat.com","username":"otherwiseguy"},"change_message_id":"8d9b2bc93ea50464928aa5e79dac8c16ab0a0032","unresolved":false,"context_lines":[{"line_number":85,"context_line":"When a request to *update*, *create* or *delete* a certain resource"},{"line_number":86,"context_line":"arrives, the Neutron API worker that is handling this task should announce"},{"line_number":87,"context_line":"its *intent* to do it by creating a lock record based on that resource"},{"line_number":88,"context_line":"``UUID`` on a separated table in the Neutron database. While these locks"},{"line_number":89,"context_line":"are held, any other worker that attempts to modify the same resource"},{"line_number":90,"context_line":"will have to wait."},{"line_number":91,"context_line":""}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_2471e7e1","line":88,"updated":"2017-10-18 13:20:14.000000000","message":"I haven\u0027t been thinking about this very long and I\u0027m a bit late to the party, so this might have been asked before, but since there is a 1:1 relationship between locks and resources and this seems like a global issue, what is the benefit of having a separate lock table instead of adding lock-related fields to all objects (or at least ones that we determine need locks)?\n\nWith respect to Han Zhou\u0027s comments below re: dependencies, would it make sense for resources with dependencies to be able to define a lock() method that control what lock they actually should acquire? That way you could just lock the highest-level object that makes sense to lock? Say you wanted to lock a Network when updating a Subnet for some reason, you\u0027d just have lock() grab the Network lock, etc.?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"ae95c72f3e0232c45a708723b6312e0537eb6da2","unresolved":false,"context_lines":[{"line_number":85,"context_line":"When a request to *update*, *create* or *delete* a certain resource"},{"line_number":86,"context_line":"arrives, the Neutron API worker that is handling this task should announce"},{"line_number":87,"context_line":"its *intent* to do it by creating a lock record based on that resource"},{"line_number":88,"context_line":"``UUID`` on a separated table in the Neutron database. While these locks"},{"line_number":89,"context_line":"are held, any other worker that attempts to modify the same resource"},{"line_number":90,"context_line":"will have to wait."},{"line_number":91,"context_line":""}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_1f94f443","line":88,"in_reply_to":"3f4b6375_2471e7e1","updated":"2017-10-18 13:36:06.000000000","message":"Hi Terry,\n\nIt\u0027s never late to the party :D\n\nRegarding the first paragraph, yes, the synchronization problem seems to be common among ML2 drivers. The problem is that right now each one might implement different solutions for it, for example ODL uses the journaling approach which doesn\u0027t depend on locks. So, I don\u0027t know, at least for now while there\u0027s no agreement on the \"right/best\" way to solve this problem we better keep the OVN implementation of things separated from the existing tables in Neutron.\n\n2nd paragraph, that\u0027s similar with what is being proposed here (if I didn\u0027t read you wrong). Cause, in order to acquire a lock for a certain resource you only need its ID and type. So, in the subnet update you would know the UUID of the network it belongs to, you could just that and grab a lock for that network there if needed.\n\nHan Zhou\u0027s comment at L134 is regarding indirect dependencies, basically the way we have addresses sets in OVN which every port shares. I think that only locking it wouldn\u0027t solve the problem see: https://bugs.launchpad.net/networking-ovn/+bug/1611852","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"509ccab1930bf28f36084f58b0f308d073202c17","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"5f4e5783_cab92c79","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"updated":"2017-10-17 18:24:24.000000000","message":"Hi Lucas, I see you changed the idea of registering lock in precommit. I think you might make this decision because the ordering can be preserved by the revision numbers in \u003csolution2\u003e. However, acquire_lock() here in postcommit() seems to be useless without registering the lock in precommit(). Could you explain your thoughts a little bit?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"04615f575ef2463fb7223942ecafc0309bebc628","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"1f485f77_6e00a768","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"1f485f77_05dd9b41","updated":"2017-11-13 15:16:31.000000000","message":"Btw Han, if we think that it\u0027s getting too complex (and it kinda sounds like that) we can go back to the task queue approach as well. That way we can serialize the requests to not rely on revision numbers (or just use rev numbers to optimize some aspects when detecting objects out of sync).","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"c8f8ee07ff350f981d032eb355732742f16830c6","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"1f485f77_36e2c6a1","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"1f485f77_071e886f","updated":"2017-11-15 13:41:55.000000000","message":"Hi Han,\n\nYeah I think you are correct here, in case there\u0027s a conflict the transaction is going to be retried and the revision numbers comparison will be executed again with what is latest in OVNDB (this is due to the verify(\u0027external_ids\u0027) + the comparision happening in a OVS command). So it should be safe.\n\nIt sounds like we have a workable plan here ? Just to summarize the steps, we will need:\n\n1. Consolidate the commands into one transaction for the resources that doesn\u0027t do that yet (e.g [0])\n\n2. On the relevant update_*() methods, use the information in OVNDB instead of the \"original\" parameter to calculate the deltas (e.g [1]).\n\n3. Create a new OVS command that (e.g [2]):\n\n 3a. add a verify(\u0027internal_ids\u0027) to retry the transaction in case the external_ids field changes while it\u0027s being executed.\n\n 3b. Compares the revision number from Neutron with what is latest in OVNDB (it will fetch the object from the DB within the command). If the revision number in OVNDB is higher than the one in the neutron object it will abort the transaction.\n\n 3c. Update the revision number in OVNDB internal_ids if all went well.\n\n4. Create the mechanisms (periodic task) to detect when things are out of sync and correct them. (Note that the point 2. is also important for this because that will allow this mechanism to sync the object by just passing what is latest in Neutron to those methods without the need of an \"original\" object that we won\u0027t have at that point).\n\n[0] https://review.openstack.org/#/c/515673/\n[1] https://review.openstack.org/#/c/518757/\n[2] https://review.openstack.org/#/c/517049/4/networking_ovn/ovsdb/commands.py@1018\n\nDoes it summarize the steps well ?\n\nIn the mean time I will work on the PoC for testing this using the ports objects as a referece since it\u0027s the most complex one.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"0e7ccf40b001b6312f1b8afac6e54e5962270c75","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"1f485f77_442e6792","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"1f485f77_36e2c6a1","updated":"2017-11-17 14:00:51.000000000","message":"Forgot to place it here yesterday, but here\u0027s the patch doing the above for ports: https://review.openstack.org/#/c/520631/","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"a15a5501b687a16412ec8170c59b0bcacd5c2708","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"1f485f77_b2b8f9e4","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"1f485f77_36e2c6a1","updated":"2017-12-05 23:10:12.000000000","message":"Sounds good to me!","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"e8fb6d183ebf61dcce79272a92b2e3ba094bfcd1","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"1f485f77_071e886f","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"1f485f77_6e00a768","updated":"2017-11-15 01:51:08.000000000","message":"Hi Lucas, in case of revision number conflict, the transaction shall be aborted and the caller should re-read from Neutron DB and re-execute the compare and the OVN transaction. If there is no revision number conflict, it means there were no updates related to the port, right? So if comparing with OVN DB, I didn\u0027t thought of any case that has problem yet, but maybe there are some corner cases I didn\u0027t cover?\n\nIf there is such case that requires us to put the comparison in special places, for your suggestion 1), it seems a little bit complex to put comparison logic to the OVS commands. For 2), I don\u0027t think a lock in post-commit would help, because with or without lock, the both cases I mentioned in the review for [0] can still happen, depending on which worker gets the lock first.\n\nRegarding going back to the task queue approach, yes I think it is still an option (if not too complex).","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"20591e16cc6dd77a70d00bb139ceb5226b5854da","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"1f485f77_8ff42b5d","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"1f485f77_82bd2f6a","updated":"2017-11-10 10:09:05.000000000","message":"Cool, in the meantime I\u0027ve been experimenting some other things. For example, I believe we could continue with the delta approach for performance but instead of creating the delta only comparing the data in the Neutron database (current vs original objects) we can replace the original one with the data in the OVNDB.\n\nFor ports, I had to include a new field for the security group ids that the port is belongs to the Logical_Switch_Port external_ids table, but it should be fine. The PoC patch is this one: https://review.openstack.org/#/c/518757/\n\nIgnore the unit and functional tests cause I haven\u0027t changed those (at the time I wrote this message) but, tempest tests seems happy with it. I\u0027m doing some more tests locally as well comparing the data to make sure I got it right.\n\nIf you can take a look at it as well it would be great.\n\nThanks Han!","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"1bae38998a1aa8244421688415306c2c78e60586","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"1f485f77_05dd9b41","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"1f485f77_8ff42b5d","updated":"2017-11-13 10:06:56.000000000","message":"Hi Han, all\n\nThanks for the review at [0]... I will bring the conversation back to this thread so it doesn\u0027t get scattered around many places.\n\nSo yeah, creating the delta based on the latest information in neutron against the data in the OVNDB seems feasible and doesn\u0027t required much changes [0]. Now we need to be able to create the delta in a place where we know the object won\u0027t be changed by other workers OR in a place where it gets regenerated in case the data in OVNDB is changed.\n\nI only see two possible places:\n\n1) Inside the OVS command. So that if the revision number is changed the transaction will be retried and the delta re-generated. So the transaction itself is what guarantee the correctness of the delta\n\n2) External lock in the postcommit() like this current spec was proposing.\n\nDo you see any other alternative ?\n\n[0] https://review.openstack.org/#/c/518757/\n\nThanks","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"1fd6c357ae1f9257c259599f55dc9ae73f73c294","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_a7bcc406","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_013dd2b9","updated":"2017-11-08 10:20:04.000000000","message":"\u003e For 3rd option, what I meant is: leave as is for current implementation, i.e. allowing inconsistent status for some period of time. Always rely on periodical sync to detect and correct. And the purpose of revision number is only for optimizing the periodical sync, so that it can be more efficient thus we can run it more frequently.\n\u003e Is this what you mean, too?\n\nHmm maybe ? My idea was to implement some of the things we have discussed in this thread. Here:\n\n0) Consolidating the transactions into one for methods that uses multiple transactions to change an object (e.g the update_router() we talked above).\n\n1) Saving the revision_number from neutron in OVNNB within that transaction.\n\n2) Creating the \"ovn_revision_numbers\" table in the Neutron database to optimize the detection of resources that are out of sync or that failed to be created or failed to be deleted.\n\n3) Having a periodic task to detect and correct the out of sync resources.\n\nThe main drop would be the ovs command that aborts the transaction here. Although we could still use it for certain resources that does not update those conflicting fields (e.g when updating networks)\n\nDoes it sound like a plan ?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"074ee6aadc41f8c98dbf75b41019f8126fa5fe35","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_96d091cb","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_1338763b","updated":"2017-10-31 12:00:37.000000000","message":"Hi Han, Terry\n\nSorry for the delay on the reply here, I was off work yesterday (national holiday).\n\nSo yeah good points there.\n\n\u003e We can implement a new IDL command just for revision number processing:\n\nI actually really like this idea. I think @terry proposed something similar when I was talking to him on IRC.\n\n\u003e With this, the initiator of the transaction needs to try-catch the BadRevision exception and ignore or retry, depends on the scenario. (Now I think it is better to always raise exception upon revision conflict and let the invoker decide how to deal with it).\n\nRight, one downside of raising the exception at the moment is that ovsdbapp will log any exception as an ERROR. But, we could possibly extend ovsdbapp and create a special exception type which will abort the transaction at [0].\n\nAlso, I\u0027ve started looking at consolidating multiple transactions into a single one for this to work, first patch is here: https://review.openstack.org/#/c/515673/\n\nAnyway, it does sound like we have plan.\n\n[0] https://github.com/openstack/ovsdbapp/blob/a87276864ee369ed5d04c333066791cd6fd79fa7/ovsdbapp/backend/ovs_idl/transaction.py#L87-L90","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"7a3c4e8f16b69e4afb895985de98e1241a3e94c9","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_e9300aeb","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_2452d914","updated":"2017-10-20 12:29:22.000000000","message":"Hi Han,\n\nThanks, that\u0027s some really good information! You are right about the difference between the objects dependencies there. I can for sure makes some experiments with it and see if it would solve the problem for us. Thanks again.\n\nI\u0027m still a tad concerned about having no locks, some methods can be quite long and do way more things than just updating/creating/deleting the resource in the OVNNB database. Some uses more than one transaction when creating/updating the resources. For example this method here is called as part of the update_subnet() [0], similarly this for update_router() [1]. Having no atomicity to protect all these [in]dependent operations sounds a bit problematic.\n\nWhat you think ?\n\nAlso, other reviewers. Do you guys/gals have opinion an opinion about it ?\n\n[0] https://github.com/openstack/networking-ovn/blob/3abbc1546054a5d159e166a857413aefe0c655cb/networking_ovn/common/ovn_client.py#L935\n\n[1] https://github.com/openstack/networking-ovn/blob/3abbc1546054a5d159e166a857413aefe0c655cb/networking_ovn/common/ovn_client.py#L554","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"326e117bcf19c972d2be4b6ac6f2e130eb89f081","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_8c1efdb8","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_25c73e15","updated":"2017-11-02 15:21:45.000000000","message":"\u003e \u003e Right, one downside of raising the exception at the moment is\n \u003e that ovsdbapp will log any exception as an ERROR. But, we could\n \u003e possibly extend ovsdbapp and create a special exception type which\n \u003e will abort the transaction at [0].\n \u003e \n \u003e One can also choose whether something logs an error by\n \u003e transaction/execute(..., log_errors\u003dFalse) on a case-by-case basis.\n\nYes and no, if we pass log_errors\u003dFalse it means no errors will be logged at all, right ?\n\nIn our case we still want to log the errors, but, when RevisionConflict() is raise we want to treat it as an special case and log it as an INFO/WARNING instead of an ERROR. Because that\u0027s an expected behavior.\n\nMakes sense ?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":5756,"name":"Terry Wilson","email":"twilson@redhat.com","username":"otherwiseguy"},"change_message_id":"5b1617cac5ceb380da5a188467d15ab09ac90b2e","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_662cf501","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_26c95dab","updated":"2017-11-02 19:16:51.000000000","message":"\u003e An assumption here is that each update to an OVN object (lport, lswitch, lrouter, etc.) should contain all attributes of that object rather than a partial update. Otherwise, we could lose an older update for a different column. We should check the current IDL implementation to avoid partial updates. For certain types of objects that is supposed to be partially updated, such as address set, I think we cannot use versioning mechanism to ensure consistency (in fact there is no revision number to use in Neutron for address set).\n\nThis is my point. Any two partial updates could cause the second to fail--and in the PoC code, fail with just a warning. No exception raised and no retry. If everyone is sure that this case can\u0027t happen, then great. It\u0027s just something to be aware of.\n\nNone of the IDL code assumes in any way that objects are being updated as a whole, so it is incumbent upon developers of networking-ovn code to make sure that doesn\u0027t happen.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"326e117bcf19c972d2be4b6ac6f2e130eb89f081","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_8fc6ef67","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_299b7b18","updated":"2017-11-02 15:21:45.000000000","message":"\u003e With the proposed solution wouldn\u0027t we still have an issue where if\n \u003e two different columns were updated by different workers (or two\n \u003e keys in the same map column), the second one processed would fail\n \u003e even though it is perfectly valid? As-is, the second transaction\n \u003e would fail, but the exception would be caught and we\u0027d just log.\n \u003e I\u0027m not sure object revision numbers are sufficient to provide\n \u003e correct behavior.\n\nI don\u0027t see how, but I could be missing something. The way I understand it is:\n\nLet\u0027s say a port object in OVNDB is at V1.\n\nNow two updates come in: V2 and V3.\n\nAn important point here is that, both of those versions have already been committed to the Neutron database. So the object at V3 already includes all changes done by V2.\n\nTwo things can happen here:\n\n1. V3 gets committed to OVNDB before V2. The transaction related to the V2 update will be retried. Since now we already have V3 committed to OVNDB the update will be dropped at the version comparison because V3 \u003e V2.\n\n2. If V2 gets committed to OVNDB before V3. The transaction related to the V3 update will be retried. But, this time it won\u0027t be dropped because the version we have in OVNDB (V2) is lower than the update, so it will go thru.\n\nMakes sense ?\n\nAs a note, what causes the transactions to be retried is the verify(\u0027external_ids\u0027) call that we will include in the new command that does the version comparison.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":23804,"name":"Daniel Alvarez","email":"dalvarez@redhat.com","username":"dalvarez"},"change_message_id":"83d17591c813524b9aa222a431620801991ac132","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_2937fb01","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_299b7b18","updated":"2017-11-02 14:35:39.000000000","message":"I was exactly thinking of what Terry mentions, I\u0027m not an expert but looks like this scenario would happen and we may drop valid updates. I can\u0027t clearly see how to get rid of the lock and still cover this.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"da7222f6096c8c186f2c45f245a0b4a7c4cfec31","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_013dd2b9","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_38612306","updated":"2017-11-07 19:04:27.000000000","message":"\u003e I\u0027m actually thinking about what you said, about having a 3rd practical option that uses revision numbers to alleviate the consistency (which does work for most of the resources) but still relying on a periodic sync mechanism to guarantee the consistency of those shared columns.\n\nFor 3rd option, what I meant is: leave as is for current implementation, i.e. allowing inconsistent status for some period of time. Always rely on periodical sync to detect and correct. And the purpose of revision number is only for optimizing the periodical sync, so that it can be more efficient thus we can run it more frequently.\n\nIs this what you mean, too?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"a8f1b7a36857bbe3d6397f74c6f705eb89b5951e","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_625456cb","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_484863cb","updated":"2017-10-19 08:58:40.000000000","message":"Hi Han,\n\nThat\u0027s true, but, I\u0027ve explicitly added the lock because I\u0027ve seem IDLs getting out of sync due to the in-memory replica cache mechanism of it. \n\nWith the journaling it was very apparent, sometimes we could see a journal thread complaining about a resource not existing in the OVSDB for few times even after it was created, then eventually (due to the retries) it succeeded. So there\u0027s definitely a window of time in which the in-replica caches for the IDLs are out of sync. \n\nHere\u0027s another patch showing it: https://review.openstack.org/#/c/404791\n\nI talked to @Terry about it at the time and he mentioned that we probably could get around it by using the \"wait\" operation from ovsdb [0]. But again, this is still not exposed in the ovsdbapp and all that.\n\nSo, the locking + having the revision numbers in the Neutron database comes to address this problem (we still will add the revision number to OVSDB external_ids, as a fallback).\n\nMaybe we can start like this and, if we find a better solution for not having a lock we can just get rid of it. What do you think ?\n\n[0] https://www.rfc-editor.org/rfc/rfc7047.txt (see 5.2.6.  Wait)","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"17fe8ba21fc2690c9b4aaad85e39c34071d4a1a8","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_cc64984c","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_4910a54e","updated":"2017-10-27 02:33:28.000000000","message":"Thank you. I think it is a great discussion!\n\nRegarding your questions:\n\n1) how to abort a transaction - I think we don\u0027t have to abort the transaction, but simply don\u0027t do the update operation for the resource if the version in OVN is newer. (in the code sample you provided it would just return, so the command end up as a no-op but not as failure). However, I think it is also valid to raise exception which causes the whole transaction got aborted, then the initiator of the transaction can just catch such kind of Exception and ignore. Maybe you will see which way is better when implementing it.\n\n2) how to connect different transactions - I think if different transactions needs to be atomic to ensure consistency, then we should just change the code to put them under a single transaction, because that is what transaction is for :)","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"a1ba4e18a7eb8b13d1e74635d1ca9faa6fae683f","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_1338763b","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_4d9e51b9","updated":"2017-10-27 18:59:42.000000000","message":"Terry, this is a very good point! I felt there is something not right but didn\u0027t figure out. Basically this means we can\u0027t have 2 updates on same object in same transaction, because the second update would always be no-op/fail because the revision number would be equal to the one in external-id which would have been incremented already by the first update.\n\nI think this is a drawback of a customized revision number mechanism versus revision number supported directly by DB backend. If ovsdb supports row level versioning, and automatically checks the version when processing the transaction, it would be much simpler.\n\nWithout build-in support of row level versioning in ovsdb, we need to have a wrapper to make it work properly. Here is what I think:\n\nWe can implement a new IDL command just for revision number processing:\n\n- check_revision(object, revision)\n\nIt will do following steps:\n\nCompare the revision from input with the one in external-id.\n\n- If the old revision number in external-id is equal or newer that the input, it will raise an BadRevision exception to abort the transaction.\n\n- Otherwise, it will add \"verify\" to external-id, and then set the new revision to external-id.\n\nThis command must be called once and only once for each object that could be updated in a transaction BEFORE the commands that update the object.\n\nWith this, the initiator of the transaction needs to try-catch the BadRevision exception and ignore or retry, depends on the scenario. (Now I think it is better to always raise exception upon revision conflict and let the invoker decide how to deal with it).\n\nWhat do you think?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"8841420634d6e13199d9cab43ee237c34a2b0546","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_fc584631","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_5e23e756","updated":"2017-10-25 22:56:13.000000000","message":"Lucas, thanks for clarifying in detail. You are right that I did misunderstand what revision number are you proposing. I thought a little more and I will try to clarify the mechanism from basic correctness to optimizations step by step. \n\nNow I think we are on the same page that the revision number for comparing should be generated in Neutron DB transaction (precommit), and compared and saved in OVN transaction to ensure the consistency. This is the basic part to ensure the correctness during race condition.\n\n\u003e Without a lock on that resource this could contain a race, no ? The time between the version comparison (V2 \u003e V0) and the update of the OVN port there\u0027s nothing that guarantees that a 3rd update will happen. And that would overwrite the changes of V3 (unless we use that verify() you mentioned and somehow there we compare V3 \u003c V2 and the transaction is aborted).\n\nThere should be no problem, because the version comparison and the update in OVN are happened in the same OVN transaction. If a 3rd update happens between the compare and commit, the OVN transaction will automatically retry (with the help of \"verify\" to detect the conflict) and the version comparison happening in retry will find out that the update is not needed any more, thus decide not to update OVN (this is different from a transaction abort).\n\nSo far, we don\u0027t need any extra lock. Right?\n\nNow let\u0027s address a second problem about failure in OVN update, which could result in inconsistency, i.e. revision number in Neutron is newer than in OVN. For this problem you propose a periodical check \u0026 sync, and we need an extra lock to ensure only one worker is doing this job. I think this lock is better to be global instead of per object, because we only need one such worker to be running globally.\n\nWhile doing the check \u0026 sync, if some object is found (or suspect to be) inconsistent, we can go ahead perform a compare \u0026 update operation just like the normal update operation: making sure the version compare and OVN update is in the same OVN transaction. This same mechanism would make sure there is no dirty update by the sync job.\n\nSo far, problems are solved, and the only extra lock needed is a global lock for check \u0026 sync job. (the lock can be either based on Neutron DB or ovsdb)\n\nNow let\u0027s discuss the optimization you proposed about storing the OVN version back in Neutron DB as a separate table. If I understand correctly, the purpose is NOT for the normal port update itself, but for efficiency of the periodical check \u0026 sync. I can\u0027t tell if this kind of optimization is really needed so far, considering that we already have sync workers running in similar ways (the difference is that it is comparing content rather than revision numbers). But I agree this is a very good idea if it is required! It works like a Neutron side cache of OVN side revision number. And the check operation in the \"check \u0026 sync\" is separate to 2 stages: a pre-check by JOIN the tables in Neutron to find all potentially inconsistent objects, and a real check per-oject only for the ones  suspect to be inconsistent (much less than the total number of objects). \n\nFor this optimization, I didn\u0027t see the need for a new lock.\n1. Saving the revision number back to Neutron should happen after the OVN transaction (we don\u0027t want to save it back before we are sure that OVN update is commited). \n2. The precheck is a global/batch read operation on all objects.\nAbove 2 steps are the extra steps introduced by the optimization, but they don\u0027t need any other locks, since this works just like a cache, and the real check will still happen in the OVN transaction when trying to do real update, so it is ok that the cached revision number is not 100% accurate. As you also pointed out, even with a lock there is still possibility that the cached revision number is not up to date. The only thing we need to make sure about the optimization is that it probably will find more inconsistent objects than there really are, but not less. With the steps above this is guaranteed.\n\nI think now we addressed both correctness and optimization. Please let me know if you think there is any other situation that a lock in postcommit would help.\n\nBTW, I would suggest that we ensure the correctness in the simplest way first, and then do the optimization incrementally.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"6ae4453cde18c2bb698cb79d7e3649467ea6a755","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_2452d914","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_625456cb","updated":"2017-10-19 20:57:58.000000000","message":"Hi Lucas, the problem you mentioned [1] is slightly different from the current one. [1] is about dependencies between objects, which has to be handled outside of ovsdb transaction. Here we are talking about updating a single object by multiple clients, which is handled properly by ovsdb IDL. The retry happens automatically in ovsdb_idl code [2], and the caller doesn\u0027t need to know anything. The only thing to ensure is that there is a \"verify\" operation included in the IDL transaction (see details in [3]), which is properly implemented in current networking-ovn ovsdb commands, e.g. [4]. To support the revision number mechanism, the commands such as SetLSwitchPortCommand need to add \"verify\" to external-ids accordingly.\n\n[1] https://review.openstack.org/#/c/404791\n[2] https://github.com/openstack/ovsdbapp/blob/master/ovsdbapp/backend/ovs_idl/transaction.py#L93-L97\n[3] https://github.com/openvswitch/ovs/blob/master/python/ovs/db/idl.py#L1014-L1055\n[4] https://github.com/openstack/networking-ovn/blob/master/networking_ovn/ovsdb/commands.py#L139","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"70970f7d146f066f4a0a661e239a30ecbdf52b9a","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_a1b53734","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_662cf501","updated":"2017-11-02 19:32:57.000000000","message":"\u003e None of the IDL code assumes in any way that objects are being updated as a whole, so it is incumbent upon developers of networking-ovn code to make sure that doesn\u0027t happen.\n\nGood point. If the assumption cannot be ensured, the initiator of the transaction should NOT simply skip the change when ConflictRevision is raised from the transaction, but retry the change by: 1. re-read Neutron data 2. retry the transaction. This will add a burden in the code. I am not sure which one is simpler: ensure not partial update, or always be prepared to retry the change.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"8e7ff365611b852cdb8c1291f9f63791974c6bb7","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_26c2885a","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_68e4edac","updated":"2017-11-02 13:48:05.000000000","message":"Thanks for taking a look at the patch Han, I will work a bit more on that PoC and test it properly. If all goes well I will update the spec to reflect the changes we will need for this approach.\n\nBtw Han, I\u0027m copying your reply on the patch here to keep all the context inline:\n\n\"Han Zhou\n\nPatch Set 8:\n\nLucas, I reviewed the POC. I think it exemplifies the idea very well. Thanks!\"","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":5756,"name":"Terry Wilson","email":"twilson@redhat.com","username":"otherwiseguy"},"change_message_id":"34cc61373085884db5e3841be46bd7f00b67a2c9","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_299b7b18","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_68e4edac","updated":"2017-11-02 14:32:49.000000000","message":"With the proposed solution wouldn\u0027t we still have an issue where if two different columns were updated by different workers (or two keys in the same map column), the second one processed would fail even though it is perfectly valid? As-is, the second transaction would fail, but the exception would be caught and we\u0027d just log. I\u0027m not sure object revision numbers are sufficient to provide correct behavior.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"d680a84195164baec66465a1c9129a54bc6c9006","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_a9540b54","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_69ce1d83","updated":"2017-10-24 10:17:55.000000000","message":"\u003e If the lock is acquired in the Neutron DB transaction, and released after the OVN transaction, then yes, we are connecting the 2 transactions with the lock, which was the initial proposal you had. However, now since you don\u0027t register the lock in precommit, which means the lock has nothing to do with the Neutron DB transaction, so it won\u0027t provide any extra guard.\n\nThanks for the reply, I can see now how much information this spec is missing about the current design.\n\nSo, in the create_*_precommit() I reuse the same transaction to create the \"initial review\" for that object (only in create, we don\u0027t need that for updates) which is set to -1. The -1 is not a valid revision number, it\u0027s just a placeholder to make sure we have that resource registered in the database and we can keep track of it. This placeholder is committed together in the same transaction that the neutron object is being created [0][1].\n\nNow, once we get to the postcommit we try to acquire a lock for that object, and within the lock we will do 2 other things:\n\n1) Do the operation in OVN (and also save the revision number to external_ids)\n\n2) Update the revision number (the initial revision in case of create) with the correct revision number in the Neutron database (the ovn_revision_number table). This revision number bump will only happens if no errors are raised in the OVN transaction [2]. And the bump also happens prior to the lock being released.\n\nThese are the two transactions that the lock will guard.\n\nThe item 2) is what is what being proposed in the \"An idea for a more efficient database comparison\" section of this spec. That allows us to be able to fast compare the revision numbers of the resources in Neutron and OVN with a single JOIN between the 2 tables in the Neutron database [3] and identify the inconsistent resources. Otherwise, if we only save the revision numbers in the external_ids of the OVNDB we will need to loop thru all resources in it and compare those numbers, which can be pretty slow depending on the number of resources we have. \n\nNow, once a resource is marked as inconsistent we can get a lock for it and compare the revision number in the OVNDB with what is in Neutron. If it\u0027s different we sync it. If it\u0027s the same we know that something happened with the db operation that was suppose to bump that revision number in the Neutron database (ovn_revision_number table) so we fix that table. Basically the revision number in OVNDB is the source of truth, and the verify() you proposed will come handy here.\n\nDoes it make more sense now ? Maybe we have some room for simplification, not sure.\n\n[0] https://review.openstack.org/#/c/503704/3/networking_ovn/ml2/mech_driver.py@241\n\n[1] https://review.openstack.org/#/c/503704/3/networking_ovn/db/revision.py@99 \n\n[2] https://review.openstack.org/#/c/503704/3/networking_ovn/db/lock.py@81\n\n[3] https://review.openstack.org/#/c/503705/3/networking_ovn/db/maintenance.py@27\n\n\u003e I think you mentioned at L183 about using OVSDB named lock mechanism for this purpose. But yes, it doesn\u0027t matter which DB lock to use, as long as it is distributed. If the lock is added only for this purpose I am completely ok, but we need to make it clear.\n\nYeah, but I found it a bit difficult to use the named locks in OVS due to the design of the IDLs. Right now, each IDL can only hold one named lock as far as I can tell by looking at the code [0] (it seems that the OVSDB protocol does support multiple locks per IDL, but it doesn\u0027t seem to have been designed in that way).\n\n[0] https://github.com/openvswitch/ovs/blob/f73d562fc0ee3ff43f65cc418f213a79a727cb19/python/ovs/db/idl.py#L136-L140","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"303f9482f7d2e0b91775d8c4dc97daffd424b69b","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_9c1d2e11","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_731ab87d","updated":"2017-10-23 09:03:20.000000000","message":"I Han, well, thank you for the patience as well!\n\nThanks for the explanation. I understand it but, the work on this spec goes a bit beyond having the numbers saved in the OVNDB. We also need some means to compare the versions in Neutron and OVN and recover the inconsistencies.\n\nFor example, in the solution 2 (as the text is organized right now) its proposing having the revision numbers also saved in a separated table in the Neutron database, that way we could just use a JOIN between both tables and readily find the inconsistencies between the versions of the resources in Neutron and the ones we have in OVN. Without it, if we only have the revision numbers as part of the external_ids in the OVNDB we would need to inspect those versions for all the resources in OVNDB and compare it with Neutron.\n\nThese are two separated transactions that can\u0027t be merged (different databases). And the lock is suppose to create a safe guard around then, so we a guarantee that the version on both dbs have been updated atomically.\n\nThe lock would also be used when recovering the resources, before we try to sync it when an inconsistency is found we would get a lock and be sure no other work will race with it mid-process.\n\nI\u0027m not sure how I would get around these things without an external lock to protect all the dependent database operations both for Neutron and OVN.\n\nIdeas ?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"2135fc06b055a398c8257f454f0f25c7853d74a3","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_484863cb","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_7ea091a6","updated":"2017-10-19 04:44:15.000000000","message":"I agree that precommit() should still be the point where we update the revision number. However, I think we don\u0027t need the lock \u003csolution1\u003e here any more, since it doesn\u0027t preserve ordering. Without ordering, the only purpose of the lock here is to provide atomic update to OVN, but that is already provided by ovsdb transaction itself. If there are more than one client is doing read-modify-write operation to same row, the later one would fail and redo the read-modify-write operation. The version comparison would be again cached data in the IDL, which is totally fine, as long as there is a \"verify\" operation included in the IDL command. I think this would solve the problem without introducing a new locking mechanism.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"7ee46598e8a1e3dd7f67c0ced42d7d45a8719131","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_5e23e756","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_7ecd952f","updated":"2017-10-25 10:03:42.000000000","message":"Hi Han,\n\nThanks for laying it down.\n\nI think the misunderstand here is that this spec is proposing to increment the revision numbers for Neutron. That\u0027s not the case, basically every time you create a resource in Neutron it will already have a revision number for it, for example:\n\n$ openstack port create --network nettest porttest\n...\n| id                    | 91c08021-ded3-4c5a-8d57-5b5c389f8e39 |\n...\n| revision_number       | 2                                    |\n...\n\nAnd every time that resource is updated the revision number is incremented by Neutron:\n\n$ openstack port set porttest --mac-address 11:22:33:44:55:66\n\n$ mysql -e \"use neutron; select standard_attr_id from ports where id\u003d\\\"91c08021-ded3-4c5a-8d57-5b5c389f8e39\\\";\"\n+------------------+\n| standard_attr_id |\n+------------------+\n|             1427 |\n+------------------+\n\n$ mysql -e \"use neutron; select revision_number from standardattributes where id\u003d1427;\"\n+-----------------+\n| revision_number |\n+-----------------+\n|               3 |\n+-----------------+\n\nSo, the proposal here is to re-use this revision number that is already part of Neutron and incremented by neutron (within the same precommit() transaction). But we are adding more controls over it.\n\nThe spec proposes a new table in the Neutron database that will **also** deal with revision numbers, but, the new table will reflect the revision number of the OVN resources (opposed to the Neutron ones). We are calling this table \"ovn_revision_numbers\" in this spec.\n\nBack to the problem you put there:\n\n\u003e Neutron Server 1 (N1): update port with address A (revision_number: 1)\n\u003e Neutron Server 2 (N2): update port with address B (revision_number: 2)\n\u003e T1: N1 precommit - Neutron DB has address A\n\u003e T2: N2 precommit - Neutron DB has address B\n\u003e T3: N2 postcommit - lock; update OVN port to address B, update revision to V1 in both Neutron and OVN; unlock\n\nHere, the revision_number of the update with address B will not be V1 but V2 (incremented in the precommit() by Neutron). But, the order in which postcommit() is called is inverted (that\u0027s what your example is exposing right?). In this case, we will update the resource and save the revision_number V2 in the OVNDB and also in the new table \"ovn_revision_numbers\".\n\n\u003e T4: N1 postcommit - lock; update OVN port to address A, update revision to V2 in both Neutron and OVN; unlock\n\nOnce it gets to the postcommit() for this change, first thing we do after locking would be to compare the revision_numbers and we will see that we already have V2 in the database. Since V2 \u003e V1 we will simply drop this change.\n\n\u003e Do the revision number increment in precommit, so the revision number reflects the latest change of Neutron DB (which is the desired state: what it should be). Then in postcommit, the revision number will be compared against what it really is in OVN. If the update comes with an older number than what is in OVN, the update should be skipped.\n\nThat\u0027s what happens, Neutron itself will bump the revision_number in the standardattibutes table in the precommit() and we will use that version (which is present in the context passed to postcommit()) in the comparasion with the version in OVN in the postcommit().\n\n...\n\nI believe the main confusion here is why this spec is proposing a second \"ovn_revision_numbers\" table in the Neutron database and for that reason we require a lock (to guard the OVNDB transaction + the update on this new table transaction in postcommit()).\n\nThe idea here is to make the detection of inconsistent resources simpler and faster. Although the new \"ovn_revision_numbers\" table lives in the Neutron DB, it\u0027s actually saving the revision_numbers referent to the OVN resources (not the Neutron ones).\n\nAfter we successfully update a resource in OVN (postcommit()) we will also update this new table with the revision_number referent to that change. That way, if we want to list the resources in OVN that have a different revision_number than the resources in Neutron we can simple do a JOIN between the \"standardattributes\" table and the new \"ovn_attributes_table\" and find out these differences.\n\nWithout this new table we would need to list all resources in OVN and compare their version with the version in the Neutron database in a loop, which can be quite slow depending on the number of resources we have.\n\nOne possible problem here is the case where we successfully update the resource in OVN but fail to update the new \"ovn_revision_numbers\" table (for whatever reason), that will make things look incosistent. But, it should be no problem because revision_number will be also saved in the external_id field in OVNDB as part of the transaction that is updating that resource in OVN. So, the code that is suppose to fix the inconsistency will look at the version in the OVNDB and will see that it does matches in Neutron, if the versions matches we know that something happened with updating the \"ovn_revision_numbers\" table and we will simply fix that table (obvisouly, if the resource is really out-of-sync - version in OVN doesn\u0027t match the Neutron version - we will first sync the resource with what is latest in Neutron and then update the \"ovn_revision_numbers\" table to reflect that).\n\nDoes it make more sense now ?\n\n...\n\nOn your second example where no locks in required:\n\n\u003e Same example as above:\n\u003e T1: N1 precommit - Neutron DB has address A, V1 (revision number incremented in DB transaction)\n\u003e T2: N2 precommit - Neutron DB has address B, V2\n\u003e T3: N2 postcommit - Compare: V2 \u003e V0 in OVN, so update OVN port to address B, V2\n\nWithout a lock on that resource this could contain a race, no ? The time between the version comparison (V2 \u003e V0) and the update of the OVN port there\u0027s nothing that guarantees that a 3rd update will happen. And that would overwrite the changes of V3 (unless we use that verify() you mentioned and somehow there we compare V3 \u003c V2 and the transaction is aborted).\n\nBut anyway, for the reasons above, I still think that having that new \"ovn_revision_numbers\" table to be able to fast detect the inconsistencies worth it, even if we need a lock for guarding both transcations (OVNDB + Neutron ovn_revision_numbers table). Since the lock only operates at a single resource level we still allows concurrency for everything else.\n\nHope that clarifies a bit more the idea, and yes, I need to update the spec to reflect it better (my bad, sorry).\n\nLet me know what you think, and again, thanks for the comments they are very helpful.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":5756,"name":"Terry Wilson","email":"twilson@redhat.com","username":"otherwiseguy"},"change_message_id":"3ab7b5e90d20588963f9b7c0d70612730eb2a33a","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_4d9e51b9","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_87ee721f","updated":"2017-10-27 17:10:15.000000000","message":"\u003e 1) how to abort a transaction - I think we don\u0027t have to abort the transaction, but simply don\u0027t do the update operation for the resource if the version in OVN is newer. (in the code sample you provided it would just return, so the command end up as a no-op but not as failure). However, I think it is also valid to raise exception which causes the whole transaction got aborted, then the initiator of the transaction can just catch such kind of Exception and ignore. Maybe you will see which way is better when implementing it.\n\n\u003e Indeed, for some operations, when the transaction has a single command it\u0027s easier to just return as you said and make it no-op.\n\nWhat about situations like two updates coming in at the same time that update different fields on the same object? In this case, one wouldn\u0027t want to abort and one wouldn\u0027t want to do a no-op and silently pass. Just having a revision number for the object doesn\u0027t really help us make the right decision there does it?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"a51aadf4596619d253b44ccb7c0ff9b3fe0be20e","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_26c95dab","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_8fc6ef67","updated":"2017-11-02 19:02:15.000000000","message":"\u003e With the proposed solution wouldn\u0027t we still have an issue where if two different columns were updated by different workers (or two keys in the same map column), the second one processed would fail even though it is perfectly valid? As-is, the second transaction would fail, but the exception would be caught and we\u0027d just log. I\u0027m not sure object revision numbers are sufficient to provide correct behavior.\n\nBasically I agree with Lucas that there is no problem in this case. If two different workers updates same object, no matter if it is for same column or different column, they will carry different revision number of that Neutron object, and the versioning mechanism will make sure the latest version is updated to OVN.\n\nAn assumption here is that each update to an OVN object (lport, lswitch, lrouter, etc.) should contain all attributes of that object rather than a partial update. Otherwise, we could lose an older update for a different column. We should check the current IDL implementation to avoid partial updates. For certain types of objects that is supposed to be partially updated, such as address set, I think we cannot use versioning mechanism to ensure consistency (in fact there is no revision number to use in Neutron for address set).\n\nBesides, this is not a problem that can be solved by another lock in postcommit, which will face the same problem as using OVN transaction.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":5756,"name":"Terry Wilson","email":"twilson@redhat.com","username":"otherwiseguy"},"change_message_id":"34cc61373085884db5e3841be46bd7f00b67a2c9","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_25c73e15","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_96d091cb","updated":"2017-11-02 14:32:49.000000000","message":"\u003e Right, one downside of raising the exception at the moment is that ovsdbapp will log any exception as an ERROR. But, we could possibly extend ovsdbapp and create a special exception type which will abort the transaction at [0].\n\nOne can also choose whether something logs an error by transaction/execute(..., log_errors\u003dFalse) on a case-by-case basis.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"58e875a954cfdaf134f83f771870f11fe60e3efa","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_68e4edac","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_96d091cb","updated":"2017-11-01 17:02:20.000000000","message":"Hi Han,\n\nI\u0027ve uploaded a PoC for this lock-free idea at https://review.openstack.org/#/c/517049/. Mind taking a look at it ?\n\nIt doesn\u0027t contain the optimization bits nor the part that fixes the inconsistencies. I just want to see if the way we are saving the revision number in the OVN database and working with the transactions is aligned with what you proposed.\n\nCheers,\nLucas","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"ab8fc3b43da4042d4127c76293ce084ee7579244","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_69ce1d83","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_9c1d2e11","updated":"2017-10-24 00:26:14.000000000","message":"\u003e These are two separated transactions that can\u0027t be merged (different databases). And the lock is suppose to create a safe guard around then, so we a guarantee that the version on both dbs have been updated atomically.\n\nIf the lock is acquired in the Neutron DB transaction, and released after the OVN transaction, then yes, we are connecting the 2 transactions with the lock, which was the initial proposal you had. However, now since you don\u0027t register the lock in precommit, which means the lock has nothing to do with the Neutron DB transaction, so it won\u0027t provide any extra guard. But that is ok, since your new proposal is relying on revision numbers to ensure the consistency. Revision number is generated in Neutron DB transaction (and stored in Neutron DB, which I didn\u0027t disagree), and it is compared against the one in OVN (stored in external_ids) in OVN transaction, which I think is enough to ensure the consistency. I\u0027d like to emphasize that doing the version compare in OVN transaction is critical to the correctness (i.e. add the compare as part of IDL command for port update, together with the \"verify\" operation). Do you have any example that the Neutron DB + OVN DB transactions + the revision number check won\u0027t work for the consistency of each individual object, but an extra lock would work? With such an example I think it will be more clear to discuss.\n\n\u003e The lock would also be used when recovering the resources, before we try to sync it when an inconsistency is found we would get a lock and be sure no other work will race with it mid-process.\n\nI think you mentioned at L183 about using OVSDB named lock mechanism for this purpose. But yes, it doesn\u0027t matter which DB lock to use, as long as it is distributed. If the lock is added only for this purpose I am completely ok, but we need to make it clear.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"94c099f38b355a017f781fd1dbd4a6652e39ae73","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_b38a1b61","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_a1b53734","updated":"2017-11-03 10:12:27.000000000","message":"I have been looking at the code and thinking about what @Terry said and I think that we might indeed have a problem with that scenario for some resources.\n\nFor example, the same port scenario that we used as an example before:\n\nThe port object in OVNDB is at V1.\n\nNow two updates two other updates for that port comes in parallel: V2 and V3.\n\nLet\u0027s say V3 arrived first at update_port_postcommit(). Once there we have two parameters available with it: \"port\" (V3) and \"original_port\" which I believe would be V2 [0]\n\nIn some places at update_port() from OVNClient we are computing the differences between \"port\" and \"original_port\" [1]. Which basically means we are computing the differences from V3 and V2, not V3 and V1 (which is the current version in OVNDB.\n\nThose changes in V2 hasn\u0027t been applied yet, so the delta could be different and, if we drop V2 later one we indeed could lose some data.\n\nDoes it make sense ?\n\nWe could fix it by ignoring the \"original_port\" and instead compare with what we currently have in OVNDB, similar to what the ovn_db_sync.py script does [2] (but only that specific port instead of the whole db, ofc).\n\nAt the long run it\u0027s even better we would need to do it regardless because if a port gets out of sync the code that is suppose to fix the problem will not have access to the \"original_port\" parameter anyway. It will only be able to fetch the latest version in the Neutron database and give it to the update_port() method in OVNClient. Like I did here in the old PoC [3] (the one with the locking mechanism).\n\nWhat you folk think ?\n\n[0] https://github.com/openstack/networking-ovn/blob/be72525b5e2676111432aa75ce79183e757c1d51/networking_ovn/ml2/mech_driver.py#L429-L430\n\n[1] https://github.com/openstack/networking-ovn/blob/be72525b5e2676111432aa75ce79183e757c1d51/networking_ovn/common/ovn_client.py#L345-L354\n\n[2] https://github.com/openstack/networking-ovn/blob/be72525b5e2676111432aa75ce79183e757c1d51/networking_ovn/ovn_db_sync.py#L131-L133\n\n[3] https://review.openstack.org/#/c/503705/3/networking_ovn/common/ovn_client.py@835","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"adfdbbaee4bba5319c658319e32d98a4d4c34cfa","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_f7edf824","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_a53c0393","updated":"2017-11-06 11:49:49.000000000","message":"\u003e 1) don\u0027t do delta based update. This is similar as your suggestion. It means we need to always calculate whatever should be in OVN according to current state of Neutron. For port-update, it is related to ACL/address set update, which can be huge amount of data. A full recalculation may cause scalability problem. (There were actually lots of effort spent on the optimizations).\n\nRight, yeah I\u0027m leaning towards trying this out. Also, if we re-implement address sets similar to what we\u0027ve discussed at line 134 I believe we could help with this calculation.\n\nI\u0027ve enhanced the PoC patch to ignore the \"original_network\" parameter as we discussed it here, this is how it looks now:  https://review.openstack.org/#/c/517049/3\n\nMind taking a quick look ?\n\n2) if we find out that delta based update is still the right/efficient way to go, we can\u0027t rely on revision number to maintain consistency. We have to introduce the original idea of registering a lock in pre-commit the ensure the ordering.\n\nIndeed, let\u0027s keep this option open.\n\n\u003e Not sure if we have any other options to ensure 100% correctness. Maybe a third practical option is to alleviate the consistency requirement to eventual consistency, which means we may avoid the complexity but rely on the sync mechanism to make it eventually consistent, and the optimizations about using revision number as a hint to tell the inconsistent part quickly can be helpful.\n\nThat\u0027s another option too. When I look at it, it seems kinda inevitable to have a periodic mechanism to make sure things are in sync from time to time.\n\nThe journal approach does serialize the requests but still runs a \"full sync\" task from time to time [0] to make sure things are in sync.\n\n[0] https://github.com/openstack/networking-odl/blob/4791075c0307614401a5c41c648f1729160097dc/networking_odl/journal/full_sync.py","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"49712528ef5769e2301a690f75745599ffa054b2","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"1f485f77_82bd2f6a","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_a7bcc406","updated":"2017-11-09 19:30:43.000000000","message":"Sounds good.\n\n\u003e we could still use it for certain resources that does not update those conflicting fields (e.g when updating networks)\n\nI\u0027d prefer a unified approach for all resource types so that it is easier to maintain. Before we find that solution, optimizing re-sync efficiency for eventual consistency sounds good to me.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"7ed25f312712dd1872367e99ea44f43f13cbf59a","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_7ecd952f","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_a9540b54","updated":"2017-10-24 19:01:52.000000000","message":"I thing the revision number should be increased in precommit(), but it seems you are suggesting bumping it in postcommit() instead? I am sorry I didn\u0027t catch this point earlier. This way, how would it protect against race condition, such as:\n\nNeutron Server 1 (N1): update port with address A\nNeutron Server 2 (N2): update port with address B\n\nT1: N1 precommit - Neutron DB has address A\n\nT2: N2 precommit - Neutron DB has address B\n\nT3: N2 postcommit - lock; update OVN port to address B, update revision to V1 in both Neutron and OVN; unlock\n\nT4: N1 postcommit - lock; update OVN port to address A, update revision to V2 in both Neutron and OVN; unlock\n\nAs you see, when N1 do the compare and T4, everything is consistent, and it will go ahead to do the update, right? So the final result will be: Neutron DB has address B, OVN has address A.\n\nCould you describe how the above will be solved?\n\nI understand that revision number will be stored in both Neutron DB and OVN. My suggestion is:\n\nDo the revision number increment in precommit, so the revision number reflects the latest change of Neutron DB (which is the desired state: what it should be). Then in postcommit, the revision number will be compared against what it really is in OVN. If the update comes with an older number than what is in OVN, the update should be skipped.\n\nSame example as above:\n\nT1: N1 precommit - Neutron DB has address A, V1 (revision number incremented in DB transaction)\n\nT2: N2 precommit - Neutron DB has address B, V2\n\nT3: N2 postcommit - Compare: V2 \u003e V0 in OVN, so update OVN port to address B, V2\n\nT4: N1 postcommit - Compare: V1 \u003c V2 in OVN, skip the update in OVN\n\nAs you might see, this doesn\u0027t need an extra lock. This seems to be simpler and clear to me. But I may misunderstood something, please correct me if so.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"062df83eaad31ae1137ff9d326fa8a5587b75fab","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_a53c0393","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_b38a1b61","updated":"2017-11-03 21:48:39.000000000","message":"Lucas, thanks for the careful check!\n\nThe code in [1] is about getting delta, and based on the delta we will calculate what should be update for OVN ACLs and Address Sets.\n\nI think this kind of problem is hard to be solved by the revision number based mechanism. If we ignore upon version conflict, we will lose changes; but if we want to retry the whole logic rather than ignore, we don\u0027t know what should be the original to compare with, which means we lose the delta information.\n\nSo we have 2 options:\n1) don\u0027t do delta based update. This is similar as your suggestion. It means we need to always calculate whatever should be in OVN according to current state of Neutron. For port-update, it is related to ACL/address set update, which can be huge amount of data. A full recalculation may cause scalability problem. (There were actually lots of effort spent on the optimizations).\n\n2) if we find out that delta based update is still the right/efficient way to go, we can\u0027t rely on revision number to maintain consistency. We have to introduce the original idea of registering a lock in pre-commit the ensure the ordering.\n\nNot sure if we have any other options to ensure 100% correctness. Maybe a third practical option is to alleviate the consistency requirement to eventual consistency, which means we may avoid the complexity but rely on the sync mechanism to make it eventually consistent, and the optimizations about using revision number as a hint to tell the inconsistent part quickly can be helpful.\n\nThoughts?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"344b90f7eacd2d44274929a6303c6fee0dc0b93b","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_87ee721f","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_cc64984c","updated":"2017-10-27 08:46:21.000000000","message":"Hi Han,\n\n\u003e Thank you. I think it is a great discussion!\n\nCan\u0027t agree more!\n\n\u003e 1) how to abort a transaction - I think we don\u0027t have to abort the transaction, but simply don\u0027t do the update operation for the resource if the version in OVN is newer. (in the code sample you provided it would just return, so the command end up as a no-op but not as failure). However, I think it is also valid to raise exception which causes the whole transaction got aborted, then the initiator of the transaction can just catch such kind of Exception and ignore. Maybe you will see which way is better when implementing it.\n\nIndeed, for some operations, when the transaction has a single command it\u0027s easier to just return as you said and make it no-op.\n\nBut, there are places where we have many commands within the same transaction, that would be better to have some sort of exception that we raise and abort the whole chain. We can already do it with the current ovsdbapp model but the problem is that right now it logs all exceptions as they were errors. It would be nice to have a \"special\" exception that ovsdbapp would understand as an skip/abort operation and log it accordingly (perhaps as INFO). It could be implemented here I believe: https://github.com/openstack/ovsdbapp/blob/a87276864ee369ed5d04c333066791cd6fd79fa7/ovsdbapp/backend/ovs_idl/transaction.py#L87-L90\n\nI will look into that.\n\n2) how to connect different transactions - I think if different transactions needs to be atomic to ensure consistency, then we should just change the code to put them under a single transaction, because that is what transaction is for :)\n\nRight, yeah let\u0027s look into that. Some places this might be a bit tricky because the second transaction might depend on the result of the first one, here for example: https://github.com/openstack/networking-ovn/blob/edfd2001c6d2590b1c49821a9664c2abe23e6719/networking_ovn/common/ovn_client.py#L995-L1008\n\nBut since it happens at the creation (adding an entry to the DHCP_Options table) it won\u0027t contain any revision number yet, so I guess that would never be skipped anyway.\n\nI will investigate and see what I can find.\n\nSo far, thanks a lot for the ideas Han. Great discussion.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"efadc1629b79a5c73b96ee5ea4abbf4e189c5066","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_38612306","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_df3e8a89","updated":"2017-11-07 17:02:34.000000000","message":"Hi Han,\n\nGreat pointers (as always), thanks!\n\nThe OVNNB schema is a funny one, usually the elements that belongs to a \"container\" are associated with it the other way around. It should be something like: Instead of having a list of ports in the Logical_Switch\u0027s table (as we do now), we would have each element in Logical_Switch_Port to hold an id (foreign key) of the switch that port belongs to. That way, if we want to know which ports belongs to a certain switch we just need to select all elements that matches the that switch id in the Logical_Switch_Port table.\n\nThat would remove that point of conflict which is that shared list and make adding/deleting new ports/acls/etc... much simpler.\n\nBut there\u0027s a catch. Apparently that how it was done before [0] but changed in OVS due to a race in Neutron when deleting a network and creating new ports to it (see commit message).\n\nIt\u0027s arguable that while [0] fixed one problem it also created others and that feels like we are digging into more and more deeper architectural problems which regardless of the solution we pick here in this spec it won\u0027t be able to solve them all.\n\nI\u0027m actually thinking about what you said, about having a 3rd practical option that uses revision numbers to alleviate   the consistency (which does work for most of the resources) but still relying on a periodic sync mechanism to guarantee the consistency of those shared columns.\n\nAt the beginning the periodic sync will work as a band-aid solution but, it will be a known issue that could be tackled separated and fixed properly after this work (like the solution we briefly discussed regarding address sets).\n\nHow does that sounds to you ?\n\n[0] https://github.com/openvswitch/ovs/commit/445a266a5fe0220088210d556880f2939309d5e0","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"ba8974e0c5f46312900dc53cdddad044142c6f95","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_4910a54e","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_e37e5ae2","updated":"2017-10-26 15:55:29.000000000","message":"Hi Han,\n\nQuick question: Does the lock-free design assume that for each operation, only one transaction to OVSDB is used ?\n\nI\u0027m asking because there are some operations where it\u0027s not true (maybe that could be achieved with some refactoring?). For example, the update_router() method [0] does use multiple transactions for its operation. Take a look at other methods like _add_router_ext_gw(), _delete_router_ext_gw(), update_nat_rules() that are invoked as part of the update_router() routine even before the transaction containing the idl.update_lrouter() command is initiated.\n\nI wonder how can we make sure that all these different transactions are linked up together without the use of an external lock. Ideas ?\n\n[0] https://github.com/openstack/networking-ovn/blob/edfd2001c6d2590b1c49821a9664c2abe23e6719/networking_ovn/common/ovn_client.py#L689-L758","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"2613af4ed074bac7ea3afa30d57ad3e0c63033d4","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_731ab87d","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_e9300aeb","updated":"2017-10-20 22:53:49.000000000","message":"Lucas, thanks for your patience and your carefulness. Let me rephrase the principle in this design, transaction is the mechanism to ensure atomicity. For Neutron DB, it is mysql transaction (happens in precommit); for OVNNB, it is ovsdb transaction (happens in postcommit. The ordering problem comes out because there is no connection between Neutron DB transaction and the ovsdb transation, and the solution for that is revision number as you proposed (this is for ordering of operations on same resource only. For dependent resources it is out of scope of this spec).\n\nSo, in the examples you gave, if some operations are performed to OVNNB in different ovsdb transactions, it may be because they don\u0027t need to be atomic. If we think they should be atomic, then the most reasonable solution is just put them to same transaction, rather than introducing a new lock mechanism, unless there are limitations/flaws in the transaction support. Does this make sense?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"679926c03e2663d4686dbb250a36a70ca0060a82","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_df3e8a89","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_f7edf824","updated":"2017-11-06 20:51:10.000000000","message":"Lucas, if you want to go with 1), please make sure the scale issue is taken care of, and I think that\u0027s hard. See the history discussions [1] (the second issue) for the scale problem, and [2] which is the patch that switches to the current mechanism based on the delta (using mutation). If you want to work with a POC I would suggest using port-update with IP addresses as example, which would capture the tricky points.\n\n[1] https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/077083.html\n\n[2] https://review.openstack.org/#/c/351861/18","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"bc14ad100746a9a5329986256884834fa36680ac","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_e37e5ae2","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"3f4b6375_fc584631","updated":"2017-10-26 09:52:58.000000000","message":"Hi Han,\n\nThat\u0027s some great stuff in here, thanks a lot!\n\n\u003e There should be no problem, because the version comparison and the update in OVN are happened in the same OVN transaction. If a 3rd update happens between the compare and commit, the OVN transaction will automatically retry (with the help of \"verify\" to detect the conflict) and the version comparison happening in retry will find out that the update is not needed any more, thus decide not to update OVN (this is different from a transaction abort).\n\nI think I\u0027m now understanding better your lock-less approach now. There are still some bits that I will need some more investigation to see how it works, specially the parts like \"this is different from a transaction abort\" and also how the code will look like. I\u0027m searching for some examples and I can see some use of verify() at https://github.com/openstack/networking-ovn/blob/master/networking_ovn/ovsdb/commands.py already.\n\nTaking the SetLSwitchPortCommand as an example, in my head (without much investigation) the verification will look like the code below ? \n\n class SetLSwitchPortCommand(command.BaseCommand):\n\n     def __init__(self, ..., revision_number):\n         ....\n         self.revision_number \u003d revision_number\n\n     def run_idl(self, txn):\n        try:\n            port \u003d idl.get_port(...)\n        except idlutils.RowNotFound:\n            ...\n\n        # Check the revision number for consistency\n        verify(\u0027external_ids\u0027)\n        ext_ids \u003d getattr(port, \u0027external_ids\u0027, {})\n        if ext_ids[\u0027neutron:revision_number\u0027] \u003e self.revision_number:\n            # How to abort the transaction ?!\n            raise \n        ext_ids[\u0027neutron:revision_number\u0027] \u003d self.revision_number\n        port.external_ids \u003d ext_ids\n\nI guess what I\u0027m missing the most is where the comparison happens (e.g V2 \u003e V1) and how skip the update.\n\nBut other than that, and this is detail/investigation I think we are on the same page and I will work on a PoC for the lock-less approach.\n\nContinuing replying to the previous comment...\n\n\u003e So far, problems are solved, and the only extra lock needed is a global lock for check \u0026 sync job. (the lock can be either based on Neutron DB or ovsdb)\n\nThat\u0027s right, I have used a ovsdb named lock for this in my PoC: https://review.openstack.org/#/c/503705/3/networking_ovn/common/maintenance.py@100\n\n\u003e BTW, I would suggest that we ensure the correctness in the simplest way first, and then do the optimization incrementally.\n\nHonestly, I feel like the optimization is simpler to implement than starting without it. It\u0027s just a new table to the database and a JOIN operation like we have in here: https://review.openstack.org/#/c/503705/3/networking_ovn/db/maintenance.py@27\n\nAt least in my view, it\u0027s not a big deal and the performance improvement can be quite significant depending on the number of resources we have (I want to do a scale/stress test on this approach after having a PoC of it, so it will probably count).\n\nSo that\u0027s all good, thanks again for laying this new approach down. I will do some investigation about the verify() and as I said before, I think we are on the same page now.\n\nI will also re-work this spec to reflect the changes needed after I understand them all.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"a08358add566243bd3889f64f81bfa1b38428ef1","unresolved":false,"context_lines":[{"line_number":117,"context_line":""},{"line_number":118,"context_line":"  In the ML2 driver:"},{"line_number":119,"context_line":""},{"line_number":120,"context_line":"    def update_port_postcommit():"},{"line_number":121,"context_line":"        with acquire_lock(port_id, RESOURCE_TYPE_PORT):"},{"line_number":122,"context_line":"            update_port_in_ovn(port)"},{"line_number":123,"context_line":""},{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_7ea091a6","line":122,"range":{"start_line":120,"start_character":0,"end_line":122,"end_character":36},"in_reply_to":"5f4e5783_cab92c79","updated":"2017-10-18 13:26:55.000000000","message":"Hi Han,\n\nThanks for pointing this out, yes I will push a new patch-set for this. So, in the PoC code I still use the create_*_precommit() to create the initial revision [0] of the resource and make sure we have it logged within the same transaction that neutron used to commit it. We just don\u0027t care about the ordering anymore because updates with old revision number can now be dropped instead in case the object already has a newer version.\n\nThere\u0027s also the problem with the L3 driver which does not contain the {pre, pos}_commit() methods for its resources. For that, I\u0027m planning on using the Neutron callbacks mechanism [1] which allows me to create such methods.\n\nWhat do you think ? I will update the spec a bit more\n\n[0] https://review.openstack.org/#/c/503704/3/networking_ovn/ml2/mech_driver.py@241\n\n[1] https://docs.openstack.org/neutron-lib/latest/contributor/callbacks.html","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"509ccab1930bf28f36084f58b0f308d073202c17","unresolved":false,"context_lines":[{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"},{"line_number":126,"context_line":"  \u003chttps://review.openstack.org/#/c/358447\u003e`_ idea that was proposed"},{"line_number":127,"context_line":"  in the past."},{"line_number":128,"context_line":""},{"line_number":129,"context_line":"While the lock prevents multiple updates to the same resource from"},{"line_number":130,"context_line":"happening at the same time, it does not solve the problem of ordering"}],"source_content_type":"text/x-rst","patch_set":8,"id":"5f4e5783_ad972a48","line":127,"updated":"2017-10-17 18:24:24.000000000","message":"This may be not true any more. If there is no register_lock_for_resource() in precommit(), then this lock mechanism is different from the \"task queue\" idea. The \"task queue\" idea was to use DB to preserve ordering of operations, including indirect dependencies (e.g. [1] [2]). \n\n[1] https://bugs.launchpad.net/networking-ovn/+bug/1611852\n\n[2] https://review.openstack.org/#/c/362494/1","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"a08358add566243bd3889f64f81bfa1b38428ef1","unresolved":false,"context_lines":[{"line_number":124,"context_line":".. note::"},{"line_number":125,"context_line":"  This lock mechanism is quite similar to the `task queue"},{"line_number":126,"context_line":"  \u003chttps://review.openstack.org/#/c/358447\u003e`_ idea that was proposed"},{"line_number":127,"context_line":"  in the past."},{"line_number":128,"context_line":""},{"line_number":129,"context_line":"While the lock prevents multiple updates to the same resource from"},{"line_number":130,"context_line":"happening at the same time, it does not solve the problem of ordering"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_3e833907","line":127,"in_reply_to":"5f4e5783_ad972a48","updated":"2017-10-18 13:26:55.000000000","message":"Yeah, we still use the precommit() for registering things (as in comment at L122) but, we don\u0027t rely on ordering things anymore (since we can simple drop old updates now instead of computing them).","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"509ccab1930bf28f36084f58b0f308d073202c17","unresolved":false,"context_lines":[{"line_number":131,"context_line":"these updates. This problem will be handlded later on by the `revision"},{"line_number":132,"context_line":"numbers \u003csolution_2_\u003e`_. It\u0027s also important to note that the lock is"},{"line_number":133,"context_line":"not a global lock, its scope is limited to one resource in particular"},{"line_number":134,"context_line":"and concurrency will still be allowed for everything else."},{"line_number":135,"context_line":""},{"line_number":136,"context_line":"Things to keep in mind:"},{"line_number":137,"context_line":""}],"source_content_type":"text/x-rst","patch_set":8,"id":"5f4e5783_6d47f2bd","line":134,"range":{"start_line":134,"start_character":4,"end_line":134,"end_character":57},"updated":"2017-10-17 18:24:24.000000000","message":"As mentioned in the links of my previous comment, different resources may have indirect dependencies. Do you have an idea for that, too? Revision number seems won\u0027t help in that case either.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"a8f1b7a36857bbe3d6397f74c6f705eb89b5951e","unresolved":false,"context_lines":[{"line_number":131,"context_line":"these updates. This problem will be handlded later on by the `revision"},{"line_number":132,"context_line":"numbers \u003csolution_2_\u003e`_. It\u0027s also important to note that the lock is"},{"line_number":133,"context_line":"not a global lock, its scope is limited to one resource in particular"},{"line_number":134,"context_line":"and concurrency will still be allowed for everything else."},{"line_number":135,"context_line":""},{"line_number":136,"context_line":"Things to keep in mind:"},{"line_number":137,"context_line":""}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_c2b66ada","line":134,"range":{"start_line":134,"start_character":4,"end_line":134,"end_character":57},"in_reply_to":"3f4b6375_081a6b7c","updated":"2017-10-19 08:58:40.000000000","message":"Brillant idea regarding the counter. Yeah definitely we could minimize the redundancy by having a counter and delete the address only when it gets to 0.\n\nOh right, I didn\u0027t know. I basically had skimmed that code to see what it was doing and looked a bit cost the fact that it had to convert all elements from that set from \"datum\" to python, then append the new element, then convert everything back to json [0]. But yeah, I would need to look to it in more details. Thanks for the context about this btw.\n\n(It\u0027s off-topic as well, sorry for bringing it up).\n\n[0] https://github.com/openvswitch/ovs/blob/f73d562fc0ee3ff43f65cc418f213a79a727cb19/python/ovs/db/idl.py#L1330-L1338","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"2135fc06b055a398c8257f454f0f25c7853d74a3","unresolved":false,"context_lines":[{"line_number":131,"context_line":"these updates. This problem will be handlded later on by the `revision"},{"line_number":132,"context_line":"numbers \u003csolution_2_\u003e`_. It\u0027s also important to note that the lock is"},{"line_number":133,"context_line":"not a global lock, its scope is limited to one resource in particular"},{"line_number":134,"context_line":"and concurrency will still be allowed for everything else."},{"line_number":135,"context_line":""},{"line_number":136,"context_line":"Things to keep in mind:"},{"line_number":137,"context_line":""}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_081a6b7c","line":134,"range":{"start_line":134,"start_character":4,"end_line":134,"end_character":57},"in_reply_to":"3f4b6375_6be1f2d0","updated":"2017-10-19 04:44:15.000000000","message":"I agree the address set update race condition is a separate problem which can be taken care by a separate spec. In fact the probability should be lower than the generic race condition on the same resource.\n\nSince we are already discussing, your proposal of new set of Neutron tables to manage the address sets should be able to solve the problem, but it seems too heavy to me - there are redundant data and extra cost. An efficient solution in that case may be just allow duplicated addresses in one set, or some way to maintain a counter. We don\u0027t really care which IP belongs to which port, since they are completely equal.\n\nRegarding the cost you mentioned in [2][3], I didn\u0027t get it. It was in fact an optimization earlier, which changed the original implementation of full address set update to OVN-NB to just the delta with the help of a new ovsdb feature: mutation on sets. Without this change, the cost of each port operation would increase linearly when SG size increases.","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"a08358add566243bd3889f64f81bfa1b38428ef1","unresolved":false,"context_lines":[{"line_number":131,"context_line":"these updates. This problem will be handlded later on by the `revision"},{"line_number":132,"context_line":"numbers \u003csolution_2_\u003e`_. It\u0027s also important to note that the lock is"},{"line_number":133,"context_line":"not a global lock, its scope is limited to one resource in particular"},{"line_number":134,"context_line":"and concurrency will still be allowed for everything else."},{"line_number":135,"context_line":""},{"line_number":136,"context_line":"Things to keep in mind:"},{"line_number":137,"context_line":""}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_6be1f2d0","line":134,"range":{"start_line":134,"start_character":4,"end_line":134,"end_character":57},"in_reply_to":"5f4e5783_6d47f2bd","updated":"2017-10-18 13:26:55.000000000","message":"Right, so, as-is this spec does not offer any solution for that problem, since address sets is an internal resource in OVN and not present in Neutron the revision numbers, as you pointed out, won\u0027t work for it.\n\nI have looked at the bug [0] and also the PoC code [1] and, while it\u0027s grand, I still wonder if we could implement address sets in a way that wouldn\u0027t require that amount of computation every time we touch a port resource (just like journaling it could impact on the overall performance).\n\nThe implementation of address sets right now seems to be \"too simple\": A table in the NB database with a nested list of addresses in a column. That would be fine *if* the resource wasn\u0027t shared.\n\nThe nested list doesn\u0027t really help preventing a port from removing elements that was added (and is used) by another port because it doesn\u0027t have any indicators about which port that element belongs to. Causing the problem that was reported at [0].\n\nOne possible solution would be to re-implement address sets differently. We could use the neutron database and two new tables, one to represent the \"address_set\" resource and another one to represent the list of \"addresses\". The advantage here is that, for each address element I could have a column to indicate which port that element belongs to.\n\nFor example, two new tables would be added:\n\novn_address_sets:\n- id\n- name\n\novn_addresses:\n- address_set_id\n- port_id\n- address\n\nThe \"address_set_id\" and \"port_id\" would be a composite key, that way, following the example of [0], when the port delete occurs only the address which belongs to that port (via port_id) will be removed from the table \"ovn_addresses\". The same address but with a different port_id that was added prior to the deletion won\u0027t be removed.\n\nNow we could simple transform this data into a set of addresses and swap the list in OVNDB, we don\u0027t need to keep appending/deleting individual elements from the list in OVNDB anymore which btw, seems to be costy in the python module of OVS because it unpacks the whole list and convert each element before doing anything with it, see [2][3].\n\nWhat do you think of this approach ?\n\n[0] https://bugs.launchpad.net/networking-ovn/+bug/1611852\n\n[1] https://review.openstack.org/#/c/362494\n\n[2] https://github.com/openvswitch/ovs/blob/f73d562fc0ee3ff43f65cc418f213a79a727cb19/python/ovs/db/idl.py#L851-L863\n\n[3] https://github.com/openvswitch/ovs/blob/f73d562fc0ee3ff43f65cc418f213a79a727cb19/python/ovs/db/idl.py#L1319-L1344","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":11762,"name":"Han Zhou","email":"zhouhan@gmail.com","username":"hanzhou"},"change_message_id":"2135fc06b055a398c8257f454f0f25c7853d74a3","unresolved":false,"context_lines":[{"line_number":152,"context_line":"  retain the locks for a certain time for troubleshooting purposes or"},{"line_number":153,"context_line":"  delete the lock entry upon it\u0027s release."},{"line_number":154,"context_line":""},{"line_number":155,"context_line":".. _solution_2:"},{"line_number":156,"context_line":""},{"line_number":157,"context_line":"Solution 2: Backend failures"},{"line_number":158,"context_line":"----------------------------"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_eb514910","line":155,"updated":"2017-10-19 04:44:15.000000000","message":"Although we do have 2 problems to be addressed in this spec, the current solutions (1 and 2) are not solving them individually. It seems now you are proposing a combined solution for both problems. (and I am suggesting that solution 1 can be removed). So I think we\u0027d better re-organize the document to avoid confusion. What do you think?","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"a8f1b7a36857bbe3d6397f74c6f705eb89b5951e","unresolved":false,"context_lines":[{"line_number":152,"context_line":"  retain the locks for a certain time for troubleshooting purposes or"},{"line_number":153,"context_line":"  delete the lock entry upon it\u0027s release."},{"line_number":154,"context_line":""},{"line_number":155,"context_line":".. _solution_2:"},{"line_number":156,"context_line":""},{"line_number":157,"context_line":"Solution 2: Backend failures"},{"line_number":158,"context_line":"----------------------------"}],"source_content_type":"text/x-rst","patch_set":8,"id":"3f4b6375_c239ca41","line":155,"in_reply_to":"3f4b6375_eb514910","updated":"2017-10-19 08:58:40.000000000","message":"That\u0027s right, the \"full solution\" is a combination of the two. Which, as you pointed out, can/is causing some confusion.\n\nI will wait for your answer regarding the lock (as per comment @ L 122) and then, whatever we decide, I will try to reorganize this document a bit better and also include the missing details that you have pointed out on the previous comments as well.\n\nThanks a lot!","commit_id":"a04aa9954122a8a2183465978d687b6dc6757589"},{"author":{"_account_id":6854,"name":"YAMAMOTO Takashi","email":"yamamoto@midokura.com","username":"yamamoto"},"change_message_id":"7f12ca37d9709824ecb6f3fb05ecfab432c096e7","unresolved":false,"context_lines":[{"line_number":110,"context_line":""},{"line_number":111,"context_line":""},{"line_number":112,"context_line":"#1 - Store the revision_number referent to a change in OVNDB"},{"line_number":113,"context_line":"------------------------------------------------------------"},{"line_number":114,"context_line":""},{"line_number":115,"context_line":"To be able to compare the version of the resource in Neutron against"},{"line_number":116,"context_line":"the version in OVN we first need to know which version the OVN resource"}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_fb067a1c","line":113,"updated":"2017-11-27 03:10:41.000000000","message":"do you have any plan for resources which don\u0027t have revision_number right now?\neg. many of neutron-xxxaas resources.","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":6854,"name":"YAMAMOTO Takashi","email":"yamamoto@midokura.com","username":"yamamoto"},"change_message_id":"dc9797748b2d040e464ed4fe030f310e38252089","unresolved":false,"context_lines":[{"line_number":110,"context_line":""},{"line_number":111,"context_line":""},{"line_number":112,"context_line":"#1 - Store the revision_number referent to a change in OVNDB"},{"line_number":113,"context_line":"------------------------------------------------------------"},{"line_number":114,"context_line":""},{"line_number":115,"context_line":"To be able to compare the version of the resource in Neutron against"},{"line_number":116,"context_line":"the version in OVN we first need to know which version the OVN resource"}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_f73942d8","line":113,"in_reply_to":"ff82abbf_0b573ecc","updated":"2017-11-30 14:46:32.000000000","message":"ok, i understood the scope of this. thank you.","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"0c0574ffe7fa67144d11c5570fe717c18b42e518","unresolved":false,"context_lines":[{"line_number":110,"context_line":""},{"line_number":111,"context_line":""},{"line_number":112,"context_line":"#1 - Store the revision_number referent to a change in OVNDB"},{"line_number":113,"context_line":"------------------------------------------------------------"},{"line_number":114,"context_line":""},{"line_number":115,"context_line":"To be able to compare the version of the resource in Neutron against"},{"line_number":116,"context_line":"the version in OVN we first need to know which version the OVN resource"}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_0b573ecc","line":113,"in_reply_to":"ff82abbf_fb067a1c","updated":"2017-11-27 16:23:13.000000000","message":"This spec only covers the resources that the networking-ovn project relies on such as: Networks, Ports, Routers, Floating IPs, SecurityGroups etc... Basically everything that we current care when syncing Neutron/OVN using the ovn_db_sync.py script [0].\n\nI\u0027m not too familiar with the neutron-*aas plugins so it\u0027s hard to me to have an opinion right now whether this solution with the revision_numbers does or does not work for them. Or even if adding revision numbers for those missing resources is something that we can do. I would prefer to leave it to future work if possible, would that be OK for you ?\n\n[0] https://github.com/openstack/networking-ovn/blob/763f4c3aa1f031c47222d7c80992b5453c7ebc8d/networking_ovn/ovn_db_sync.py","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":10237,"name":"Numan Siddique","email":"nusiddiq@redhat.com","username":"numansiddique"},"change_message_id":"3037eed4c19d32c17f81aa3f54d32791ecf580f9","unresolved":false,"context_lines":[{"line_number":163,"context_line":""},{"line_number":164,"context_line":"2. Compare the ``revision_number`` from the update against what is"},{"line_number":165,"context_line":"presently stored in OVNDB. If the version in OVNDB is already higher"},{"line_number":166,"context_line":"than the version in the update, abort the transaction."},{"line_number":167,"context_line":""},{"line_number":168,"context_line":"So basically this new command is responsible for guarding the OVN resource"},{"line_number":169,"context_line":"by not allowing old changes to be applied on top of new ones. Here\u0027s a"}],"source_content_type":"text/x-rst","patch_set":10,"id":"df87a7cf_ae51f9da","line":166,"updated":"2017-12-04 15:19:17.000000000","message":"A small question  - When the transaction is aborted, what will be the HTTP return code sent to the client ?\n\nFrom a user perspective, it should be successful right ?","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"bbfddf4d096a4aecf9e5186ee82bab45d9747daa","unresolved":false,"context_lines":[{"line_number":163,"context_line":""},{"line_number":164,"context_line":"2. Compare the ``revision_number`` from the update against what is"},{"line_number":165,"context_line":"presently stored in OVNDB. If the version in OVNDB is already higher"},{"line_number":166,"context_line":"than the version in the update, abort the transaction."},{"line_number":167,"context_line":""},{"line_number":168,"context_line":"So basically this new command is responsible for guarding the OVN resource"},{"line_number":169,"context_line":"by not allowing old changes to be applied on top of new ones. Here\u0027s a"}],"source_content_type":"text/x-rst","patch_set":10,"id":"df87a7cf_fa349b4c","line":166,"in_reply_to":"df87a7cf_ae51f9da","updated":"2017-12-04 17:24:35.000000000","message":"That\u0027s right, he will see it as HTTP 200 and I believe that the code is not misleading because, the fact that the OVN transaction was aborted means that the change was successfully committed in the neutron database. Networking-ovn just took care of updating the backend database in the best way it could to avoid older changes from overwritting new ones.","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"7fb75f79751169ff1a51005c806f407624854592","unresolved":false,"context_lines":[{"line_number":185,"context_line":"There\u0027s a bit more for the above to work with the current networking-ovn"},{"line_number":186,"context_line":"code, basically we need to tidy up the code to do two more things."},{"line_number":187,"context_line":""},{"line_number":188,"context_line":"1. Consolidate changes to a resource in a single transaction."},{"line_number":189,"context_line":""},{"line_number":190,"context_line":"This is important regardless of this spec, having all changes to a"},{"line_number":191,"context_line":"resource done in a single transaction minimizes the risk of having"}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_a8f754d3","line":188,"updated":"2017-11-24 16:00:59.000000000","message":"Apparently there\u0027s a strange indentation problem in this line [0]\n\n[0] http://logs.openstack.org/34/490834/10/check/build-openstack-sphinx-docs/7ee2be9/html/contributor/design/database_consistency.html#ensure-correctness-when-updating-ovn","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":8788,"name":"Miguel Angel Ajo","email":"mangelajo@redhat.com","username":"mangelajo"},"change_message_id":"ac4eecf2435cebc00fb2546b23021c0b79098183","unresolved":false,"context_lines":[{"line_number":212,"context_line":"using the *original* object for it and instead we should create the"},{"line_number":213,"context_line":"delta based on the *current* version of the NeutronDB against the data"},{"line_number":214,"context_line":"stored in the OVNDB to be able to detect the real differences between"},{"line_number":215,"context_line":"the two databases."},{"line_number":216,"context_line":""},{"line_number":217,"context_line":"So in summary, to guarantee the correctness of the updates this documents"},{"line_number":218,"context_line":"proposes to:"}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_286a64ed","line":215,"range":{"start_line":215,"start_character":3,"end_line":215,"end_character":4},"updated":"2017-11-24 16:34:58.000000000","message":"very good point, I was about to comment on that, but I see you nailed it before :D","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"7fb75f79751169ff1a51005c806f407624854592","unresolved":false,"context_lines":[{"line_number":264,"context_line":"                            troubleshooting purposes"},{"line_number":265,"context_line":"\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d  \u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d\u003d"},{"line_number":266,"context_line":""},{"line_number":267,"context_line":"For the different actions: Create, update and delete; these table will be"},{"line_number":268,"context_line":"used as:"},{"line_number":269,"context_line":""},{"line_number":270,"context_line":""}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_8875b066","line":267,"range":{"start_line":267,"start_character":54,"end_line":267,"end_character":59},"updated":"2017-11-24 16:00:59.000000000","message":"this","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"7fb75f79751169ff1a51005c806f407624854592","unresolved":false,"context_lines":[{"line_number":336,"context_line":"by random backend failures."},{"line_number":337,"context_line":""},{"line_number":338,"context_line":".. note::"},{"line_number":339,"context_line":"   There\u0027s not lock linking both database updates in the postcommit()"},{"line_number":340,"context_line":"   methods. So, it\u0027s true that the method bumping the revision_number"},{"line_number":341,"context_line":"   column in the new table in NeutronDB could still race but, that"},{"line_number":342,"context_line":"   should be fine because this table acts like a cache and the real"}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_a872f479","line":339,"range":{"start_line":339,"start_character":11,"end_line":339,"end_character":14},"updated":"2017-11-24 16:00:59.000000000","message":"no","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":6854,"name":"YAMAMOTO Takashi","email":"yamamoto@midokura.com","username":"yamamoto"},"change_message_id":"7f12ca37d9709824ecb6f3fb05ecfab432c096e7","unresolved":false,"context_lines":[{"line_number":379,"context_line":""},{"line_number":380,"context_line":"Some things to keep in mind about this approach:"},{"line_number":381,"context_line":""},{"line_number":382,"context_line":"* The code can get quite complex as this approach is not only about"},{"line_number":383,"context_line":"  applying the changes to the SDN backend asynchronously. The dependencies"},{"line_number":384,"context_line":"  between each resource as well as their operations also needs to be"},{"line_number":385,"context_line":"  computed. For example, before attempting to create a router port the"},{"line_number":386,"context_line":"  router that this port belongs to needs to be created. Or, before"},{"line_number":387,"context_line":"  attempting to delete a network all the dependent resources on it"},{"line_number":388,"context_line":"  (subnets, ports, etc...) needs to be processed first."},{"line_number":389,"context_line":""},{"line_number":390,"context_line":"* The number of journal threads running can cause problems. In my tests"},{"line_number":391,"context_line":"  I had three controllers, each one with 24 CPU cores (Intel Xeon E5-2620"}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_7baa6a71","line":388,"range":{"start_line":382,"start_character":0,"end_line":388,"end_character":55},"updated":"2017-11-27 03:10:41.000000000","message":"the proposed solution also needs something similar, right?  otherwise, can you explain why it isn\u0027t necessary?","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":6854,"name":"YAMAMOTO Takashi","email":"yamamoto@midokura.com","username":"yamamoto"},"change_message_id":"dc9797748b2d040e464ed4fe030f310e38252089","unresolved":false,"context_lines":[{"line_number":379,"context_line":""},{"line_number":380,"context_line":"Some things to keep in mind about this approach:"},{"line_number":381,"context_line":""},{"line_number":382,"context_line":"* The code can get quite complex as this approach is not only about"},{"line_number":383,"context_line":"  applying the changes to the SDN backend asynchronously. The dependencies"},{"line_number":384,"context_line":"  between each resource as well as their operations also needs to be"},{"line_number":385,"context_line":"  computed. For example, before attempting to create a router port the"},{"line_number":386,"context_line":"  router that this port belongs to needs to be created. Or, before"},{"line_number":387,"context_line":"  attempting to delete a network all the dependent resources on it"},{"line_number":388,"context_line":"  (subnets, ports, etc...) needs to be processed first."},{"line_number":389,"context_line":""},{"line_number":390,"context_line":"* The number of journal threads running can cause problems. In my tests"},{"line_number":391,"context_line":"  I had three controllers, each one with 24 CPU cores (Intel Xeon E5-2620"}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_77dbf220","line":388,"range":{"start_line":382,"start_character":0,"end_line":388,"end_character":55},"in_reply_to":"ff82abbf_6e537818","updated":"2017-11-30 14:46:32.000000000","message":"even with synchronous design, depending on the backend implementation, you can get \"resource still in-use\" type of issues with multiple neutron api workers and/or multiple neutron api servers. in midonet, we\u0027ve seen such issues in real production environments.\n(i haven\u0027t looked your AddressSet example closely yet but i guess it\u0027s something similar.)\nanyway it\u0027s fine for me to move forward as far as you folks know it can be a real problem.","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":6773,"name":"Lucas Alvares Gomes","email":"lucasagomes@gmail.com","username":"lucasagomes"},"change_message_id":"0c0574ffe7fa67144d11c5570fe717c18b42e518","unresolved":false,"context_lines":[{"line_number":379,"context_line":""},{"line_number":380,"context_line":"Some things to keep in mind about this approach:"},{"line_number":381,"context_line":""},{"line_number":382,"context_line":"* The code can get quite complex as this approach is not only about"},{"line_number":383,"context_line":"  applying the changes to the SDN backend asynchronously. The dependencies"},{"line_number":384,"context_line":"  between each resource as well as their operations also needs to be"},{"line_number":385,"context_line":"  computed. For example, before attempting to create a router port the"},{"line_number":386,"context_line":"  router that this port belongs to needs to be created. Or, before"},{"line_number":387,"context_line":"  attempting to delete a network all the dependent resources on it"},{"line_number":388,"context_line":"  (subnets, ports, etc...) needs to be processed first."},{"line_number":389,"context_line":""},{"line_number":390,"context_line":"* The number of journal threads running can cause problems. In my tests"},{"line_number":391,"context_line":"  I had three controllers, each one with 24 CPU cores (Intel Xeon E5-2620"}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_6e537818","line":388,"range":{"start_line":382,"start_character":0,"end_line":388,"end_character":55},"in_reply_to":"ff82abbf_7baa6a71","updated":"2017-11-27 16:23:13.000000000","message":"The paragraph above is due to the fact that the journaling approach is completely asynchronous, so, when a request is handled from the API it won\u0027t create, update or delete the resource from the SDN backend straight away. Instead, all the code handling the requests from the API will just log these actions into a separated table which will then be consumed by X numbers of journal threads running in the background.\n\nFor example, when a journal entry to create a port is appended to the journal log it will also include the IDs of the resources that the port depends on [0] (e.g network_id, subnet_id(s), etc...). So, when a journal thread picks that port entry from the journal log it should look at those IDs and see if they are all marked as \"completed\" in the log before it goes ahead and create the port. Otherwise it will put that entry state back to \"pending\" state and skip its creation for awhile.\n\nIn our case we are keeping the synchronous behavior just like we have it now. So, when the user make a request to the API to create the network, it will be created by the time the request returns. Then he/she will create the port that belongs in that network and so on...\n\nThat all said, we do have a case of indirect dependency to solve. As pointed out by Han Zhou in the previous patch-sets [1] (see line 134) the AddressSet resource in OVN is shared between different ports and they could race. I believe this can solved by modifying the DB design a little, there\u0027s a proposed solution inline at [1] (also line 134). In any case, we agreed that it would be better solved as a separated work from this spec.\n\n[0] https://github.com/openstack/networking-odl/blob/7a3c5fee7deb01d9237f5d1cc43a17931999af02/networking_odl/journal/dependency_validations.py#L63-L69\n\n[1] https://review.openstack.org/#/c/490834/8/doc/source/contributor/design/database_consistency.rst@134","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"},{"author":{"_account_id":8788,"name":"Miguel Angel Ajo","email":"mangelajo@redhat.com","username":"mangelajo"},"change_message_id":"ac4eecf2435cebc00fb2546b23021c0b79098183","unresolved":false,"context_lines":[{"line_number":438,"context_line":"  \u003chttps://github.com/openstack/browbeat\u003e`_ which is basically orchestrate"},{"line_number":439,"context_line":"  `Openstack Rally \u003chttps://github.com/openstack/rally\u003e`_ and monitor the"},{"line_number":440,"context_line":"  machine\u0027s usage of resources."},{"line_number":441,"context_line":""}],"source_content_type":"text/x-rst","patch_set":10,"id":"ff82abbf_c843b86b","line":441,"updated":"2017-11-24 16:34:58.000000000","message":"if you need to respin we can  kill this line","commit_id":"0b7b8a8aa6576f97dff94fa2bd7fe2c89bba5d71"}]}
