)]}' {"/PATCHSET_LEVEL":[{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"134e763802ba0f1dc42a682bab96db76c81d76cc","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":1,"id":"1ce901bb_b134cbf7","updated":"2023-11-16 03:12:03.000000000","message":"Still need to write tests","commit_id":"1882301c67b6b89224a29d251c0f343c1e566355"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"2f8a0cb7bd05f898e386b7485d1daf533d8fae86","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"a41fc4b9_f0c5061d","updated":"2024-12-18 03:58:15.000000000","message":"Going back to the good ole 700 million big container that Sam created me back in 2016 (yes I still have it laying around for shard testing)\n\nHitting the index for deleted only (because there is usually much less of those) is much quicker. First hitting the index but still counting all is slow:\n```\n(venv) [matt@workski ~/.../2/swift (review/matthew_oliver/first_sync_and_big_check *%)]$ time sqlite3 bigdb-700000000/big.db -line \u0027select count(name) from object\u0027\ncount(name) \u003d 700000000\n\nreal\t0m41.216s\nuser\t0m20.587s\nsys\t0m20.508s\n```\n\nIf we just look for deleted (in this case there is none, so I probably should go delete some, but I\u0027ll do that after this comment:\n```\n(venv) [matt@workski ~/.../2/swift (review/matthew_oliver/first_sync_and_big_check *%)]$ time sqlite3 bigdb-700000000/big.db -line \u0027select count(name) from object where deleted \u003d 1\u0027\ncount(name) \u003d 0\n\nreal\t0m0.003s\nuser\t0m0.002s\nsys\t0m0.001s\n```\n\nCan also combine the SQL to get the object_count and add the deleted count to see how expensive that is.. 
and it isn\u0027t bad at all:\n```\n(venv) [matt@workski ~/.../2/swift (review/matthew_oliver/first_sync_and_big_check *%)]$ time sqlite3 bigdb-700000000/big.db -line \u0027select (select object_count from container_stat) + count(name) from object where deleted \u003d 1;\u0027\n(select object_count from container_stat) + count(name) \u003d 700000000\n\nreal\t0m0.003s\nuser\t0m0.000s\nsys\t0m0.003s\n```","commit_id":"0c61936c497bd790ac65e61cb64cbecc00182b01"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"3ed2b2cefc2585cb22bd422a768f4a1c572c0499","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"d5fa114f_53cdf356","updated":"2024-12-17 16:33:52.000000000","message":"found this draft comment laying around","commit_id":"0c61936c497bd790ac65e61cb64cbecc00182b01"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"50b10ffe78b993c65bbb8873146545bd46671f15","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"7d9c8abf_d5c38ac9","updated":"2024-12-09 05:56:59.000000000","message":"gerrit seems to think this is in workflow -1.. 
at least for me.","commit_id":"0c61936c497bd790ac65e61cb64cbecc00182b01"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"ee5ab43dbca77453f073f07adcf20f90cb09994f","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"66862324_d36ae147","in_reply_to":"7d9c8abf_d5c38ac9","updated":"2024-12-09 05:57:44.000000000","message":"Ahh, I was confused by the patchset and assume some cache in the browser; fine now","commit_id":"0c61936c497bd790ac65e61cb64cbecc00182b01"}],"swift/common/db.py":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"3ed2b2cefc2585cb22bd422a768f4a1c572c0499","unresolved":true,"context_lines":[{"line_number":771,"context_line":" query \u003d \u0027\u0027\u0027"},{"line_number":772,"context_line":" SELECT COUNT(name)"},{"line_number":773,"context_line":" FROM %s"},{"line_number":774,"context_line":" \u0027\u0027\u0027 % (table, )"},{"line_number":775,"context_line":" with self.get() as conn:"},{"line_number":776,"context_line":" row \u003d conn.execute(query).fetchone()"},{"line_number":777,"context_line":" return row[0]"}],"source_content_type":"text/x-python","patch_set":2,"id":"ce1fa085_ff574258","line":774,"updated":"2024-12-17 16:33:52.000000000","message":"can we approximate this from object_count in the stats table?\n\n select count(name) from \u003chuge_table\u003e\n \nis slow in sqlite - it has to page through every row from disk; no index - lots of reads from disk, parsing of bytes in C-code - and it\u0027s not in an os-thread so it blocks the reactor (for ~250+ms?)","commit_id":"0c61936c497bd790ac65e61cb64cbecc00182b01"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"9f89c81ad8db822ffc9ced31e857e736a40d733b","unresolved":true,"context_lines":[{"line_number":771,"context_line":" 
query \u003d \u0027\u0027\u0027"},{"line_number":772,"context_line":" SELECT COUNT(name)"},{"line_number":773,"context_line":" FROM %s"},{"line_number":774,"context_line":" \u0027\u0027\u0027 % (table, )"},{"line_number":775,"context_line":" with self.get() as conn:"},{"line_number":776,"context_line":" row \u003d conn.execute(query).fetchone()"},{"line_number":777,"context_line":" return row[0]"}],"source_content_type":"text/x-python","patch_set":2,"id":"e744afab_d7d2edbe","line":774,"in_reply_to":"ce1fa085_ff574258","updated":"2024-12-17 22:24:11.000000000","message":"yeah, good point. I guess I should go test it. This does only hit the index, but I guess we\u0027re still counting every object, even if we\u0027re hitting an index:\n\n```\nsqlite\u003e EXPLAIN QUERY PLAN select count(name) from object;\nQUERY PLAN\n`--SCAN object USING COVERING INDEX ix_object_deleted_name\n```\n\nMaybe I\u0027ll do as Tim suggested and get the object_count from info and then add the tombstones via a similar SQL as above, which will also only hit the index. And there should usually be far fewer of those. 
And that way we still get a valid size.\n\n```\nsqlite\u003e EXPLAIN QUERY PLAN select count(name) from object where deleted \u003d 1;\nQUERY PLAN\n`--SEARCH object USING COVERING INDEX ix_object_deleted_name (deleted\u003d?)\n```\n\nNOTE: this function is only called if we\u0027ve never synced to a handoff before, so less often.","commit_id":"0c61936c497bd790ac65e61cb64cbecc00182b01"}],"swift/common/db_replicator.py":[{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"c5e4881b4eab2aa22b88a8f1d92fe5495617cfb0","unresolved":true,"context_lines":[{"line_number":514,"context_line":" # fall to usync in this case"},{"line_number":515,"context_line":" first_sync_and_big \u003d ("},{"line_number":516,"context_line":" rinfo[\u0027point\u0027] \u003c\u003d 0 and"},{"line_number":517,"context_line":" info[\u0027max_row\u0027] \u003e self.per_diff * self.max_diffs)"},{"line_number":518,"context_line":" # if the difference in rowids between the two differs by"},{"line_number":519,"context_line":" # more than 50% and the difference is greater than per_diff,"},{"line_number":520,"context_line":" # rsync then do a remote merge."}],"source_content_type":"text/x-python","patch_set":1,"id":"1c3b4a8a_955fe36f","line":517,"updated":"2023-11-16 23:00:44.000000000","message":"hmm, thinking about this... this is all good in the shard handoff case but if we\u0027re just talking about a simple small handoff case this would force it back into rsync when it\u0027s small enough to usync.. 
basically reverting the note on line 521.\n\nBecause the max_row has nothing to do with how many rows are currently in a db.\n\nBack to the drawing board.","commit_id":"1882301c67b6b89224a29d251c0f343c1e566355"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"6fc7aa14547c7bf5997c59ec4ed268d3e1c1e01f","unresolved":true,"context_lines":[{"line_number":514,"context_line":" # fall to usync in this case"},{"line_number":515,"context_line":" first_sync_and_big \u003d ("},{"line_number":516,"context_line":" rinfo[\u0027point\u0027] \u003c\u003d 0 and"},{"line_number":517,"context_line":" info[\u0027max_row\u0027] \u003e self.per_diff * self.max_diffs)"},{"line_number":518,"context_line":" # if the difference in rowids between the two differs by"},{"line_number":519,"context_line":" # more than 50% and the difference is greater than per_diff,"},{"line_number":520,"context_line":" # rsync then do a remote merge."}],"source_content_type":"text/x-python","patch_set":1,"id":"633177a2_0fd00902","line":517,"in_reply_to":"1c3b4a8a_955fe36f","updated":"2023-11-17 03:13:29.000000000","message":"We can get the obj count, so we could use that, but that isn\u0027t really reflective of the true row count, as there could be things marked as deleted.\nWe could do a count(rowid) that should be somewhat indexed.. but still might not be the fastest. 
As far as I can tell sqlite doesn\u0027t keep a counter of rows.","commit_id":"1882301c67b6b89224a29d251c0f343c1e566355"},{"author":{"_account_id":7233,"name":"Matthew Oliver","email":"matt@oliver.net.au","username":"mattoliverau"},"change_message_id":"16a097fc1bd5a78869ffe1b16653420e1539c1d5","unresolved":false,"context_lines":[{"line_number":514,"context_line":" # fall to usync in this case"},{"line_number":515,"context_line":" first_sync_and_big \u003d ("},{"line_number":516,"context_line":" rinfo[\u0027point\u0027] \u003c\u003d 0 and"},{"line_number":517,"context_line":" info[\u0027max_row\u0027] \u003e self.per_diff * self.max_diffs)"},{"line_number":518,"context_line":" # if the difference in rowids between the two differs by"},{"line_number":519,"context_line":" # more than 50% and the difference is greater than per_diff,"},{"line_number":520,"context_line":" # rsync then do a remote merge."}],"source_content_type":"text/x-python","patch_set":1,"id":"ea24cc7e_4d606185","line":517,"in_reply_to":"633177a2_0fd00902","updated":"2024-12-09 05:55:23.000000000","message":"`select count(name) from object;` only hits the index. And the new version of the code only does this if it has never synced before. Once it has, it\u0027ll only rsync_then_merge when the row_ids get too out of sync.","commit_id":"1882301c67b6b89224a29d251c0f343c1e566355"}]}