)]}'
{"/PATCHSET_LEVEL":[{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"d5fd7e18451e8b1d3e20bd17c89c46aa5cdc7a80","unresolved":false,"context_lines":[],"source_content_type":"","patch_set":2,"id":"bb7ef9cc_fdb80d7e","updated":"2025-08-01 01:25:14.000000000","message":"I would like to see some of these ideas make their way over to ringv2 after it lands - i\u0027m not sure how many of them are significant or obvious improvements, but *I* still understand why I decided these change were not just \"different\" but actually *needed* to \"fix\" what\u0027s \"wrong\" with the current implementation over in\n\nRelated-Change-Id: Ia0ac4ea2006d8965d7fdb6659d355c77386adb70\n\n... but I knew when I started this that non-functional requirements are tricky!\n\n\u003e Unfortunately \"readability\" is often subjective\n\nhttps://github.com/NVIDIA/swift/blob/master/REVIEW_GUIDELINES.rst#maintainable-code-is-obvious\n\nIn an effort to try and remove my bias and keep up with new hawtness I tried to outsource an evaluation of \"maintainability\" to a robot.  Here\u0027s some info and an excerpt from \"overall assessment\"\n\nhttps://gist.github.com/clayg/7c3c3cc5f4819e885a1ea965fe8d0005\n\n\u003e The abstractions are functionally complete but architecturally complex. The main issues are:\n\nhttps://gist.github.com/clayg/203ba1b6e441543d0ebea84c4349c602\n\n\u003e The abstractions are doing a reasonable job of separating concerns, but there are several areas where they could be simplified:\n\n... I guess beauty is in the eye of the beholder, and maybe code that humans love to read isn\u0027t even a axis that will provide value in the future?  I\u0027m perfectly happy to let this sit for now - and I\u0027ll pull apart what\u0027s useful after we land ringv2.","commit_id":"51c5cf5669371e20960d97f6eb9dad73dbed4ccf"}],"swift/common/ring/io.py":[{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"63588974216a95d827e30294550ad38268022a43","unresolved":true,"context_lines":[{"line_number":211,"context_line":"    ring_magic_version \u003d None"},{"line_number":212,"context_line":""},{"line_number":213,"context_line":"    def __init__(self, fp):"},{"line_number":214,"context_line":"        self.fp \u003d fp"},{"line_number":215,"context_line":""},{"line_number":216,"context_line":"    @abc.abstractmethod"},{"line_number":217,"context_line":"    def serialize(self, ring_dict):"}],"source_content_type":"text/x-python","patch_set":1,"id":"af0448c2_bc567b3d","line":214,"updated":"2025-06-18 18:17:20.000000000","message":"This feels weird to me -- I don\u0027t feel like I\u0027ve ever seen another codec or marshaler/unmarshaller that works like this. Probably because you would never expect more than one call to `serialize`/`deserialize` with the same `fp`... so why carry that state?\n\nI would\u0027ve expected `fp` to be an arg to `serialize`/`deserialize`, like `json.dump`/`json.load` or `pickle.dump`/`pickle.load`","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"d5fd7e18451e8b1d3e20bd17c89c46aa5cdc7a80","unresolved":false,"context_lines":[{"line_number":211,"context_line":"    ring_magic_version \u003d None"},{"line_number":212,"context_line":""},{"line_number":213,"context_line":"    def __init__(self, fp):"},{"line_number":214,"context_line":"        self.fp \u003d fp"},{"line_number":215,"context_line":""},{"line_number":216,"context_line":"    @abc.abstractmethod"},{"line_number":217,"context_line":"    def serialize(self, ring_dict):"}],"source_content_type":"text/x-python","patch_set":1,"id":"fd52a49a_6bdfa20c","line":214,"in_reply_to":"af0448c2_bc567b3d","updated":"2025-08-01 01:25:14.000000000","message":"\u003e you would never expect more than one call to serialize/deserialize with the same fp.\n\nI get that.  But the most obvious alternative seems like `Decoder.parse(fp.read())`\n\nhttps://github.com/python/cpython/blob/main/Lib/json/__init__.py#L293\n\n\u003e so why carry that state?\n\nTwo reasons\n\n1) Since I don\u0027t want to require loading the whole stream into memory to parse it AND I don\u0027t want to adopt the \"parser.feed(chunk)\" pattern - it seems easiest to let the code have the file object and call seek to find the index and then read the sections.\n\n2) It enables `IndexedSectionCodec` to do this trick where you can *either*\n\n```\nwith writer.open_section():\n    writer.write_stuff()\n```\n\nOR\n\n```\nwith writer.open_section() as section:\n    section.write_stuff()\n```\n\nand either way writer or section are are both the same object - but depending on the context `self.fp` is a checksumming wrapper for *just that section*\n\nIt\u0027s sneaky to be sure, but I couldn\u0027t convince myself that I don\u0027t want the impmentation of `write_stuff` to be *on the codec class* as opposed to hanging off some other SectionWriter that you have to grok.","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"63588974216a95d827e30294550ad38268022a43","unresolved":true,"context_lines":[{"line_number":218,"context_line":"        ..."},{"line_number":219,"context_line":""},{"line_number":220,"context_line":"    @abc.abstractmethod"},{"line_number":221,"context_line":"    def deserialize(self, **kwargs):"},{"line_number":222,"context_line":"        ..."},{"line_number":223,"context_line":""},{"line_number":224,"context_line":""}],"source_content_type":"text/x-python","patch_set":1,"id":"bee4859a_9f762d5d","line":221,"range":{"start_line":221,"start_character":26,"end_line":221,"end_character":34},"updated":"2025-06-18 18:17:20.000000000","message":"It\u0027s off-putting to me that this ABC can\u0027t define its API well enough to say what all kwargs to expect.\n\nWe *know* what the kwargs are -- but then we bury the fact that *some* implementations ignore them by making that a side-effect of their function signatures?","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"d5fd7e18451e8b1d3e20bd17c89c46aa5cdc7a80","unresolved":true,"context_lines":[{"line_number":218,"context_line":"        ..."},{"line_number":219,"context_line":""},{"line_number":220,"context_line":"    @abc.abstractmethod"},{"line_number":221,"context_line":"    def deserialize(self, **kwargs):"},{"line_number":222,"context_line":"        ..."},{"line_number":223,"context_line":""},{"line_number":224,"context_line":""}],"source_content_type":"text/x-python","patch_set":1,"id":"43e0a382_3c8467c6","line":221,"range":{"start_line":221,"start_character":26,"end_line":221,"end_character":34},"in_reply_to":"bee4859a_9f762d5d","updated":"2025-08-01 01:25:14.000000000","message":"that\u0027s fair - it is interesting that ringv2\u0027s code supports include_devices while ringv1 does not!?  I\u0027m not sure what do with that...\n\nI think it\u0027d be fine if the feedback was \"this isn\u0027t a *real* `abc.ABC`\" - I was just trying to use what it as hook to demonstrate that RingData needs a class that has these methods.","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"63588974216a95d827e30294550ad38268022a43","unresolved":true,"context_lines":[{"line_number":599,"context_line":"        else:"},{"line_number":600,"context_line":"            size \u003d entry.uncompressed_end - entry.uncompressed_start"},{"line_number":601,"context_line":"            reader \u003d LengthWrapper(orig_fp, size)"},{"line_number":602,"context_line":"        if entry.checksum_method !\u003d \u0027sha256\u0027 and \\"},{"line_number":603,"context_line":"                not self.unsafe_allow_invalid_index_for_testing:"},{"line_number":604,"context_line":"            raise ValueError(\u0027Maybe better to just log a warning?\u0027)"},{"line_number":605,"context_line":"        reader \u003d SectionReader(reader)"}],"source_content_type":"text/x-python","patch_set":1,"id":"4bfe80f1_6a7fca8d","line":602,"range":{"start_line":602,"start_character":8,"end_line":602,"end_character":44},"updated":"2025-06-18 18:17:20.000000000","message":"So you can use `open_section` to open all the sections *except* `swift/index`? Seems like an arbitrary restriction...\n\nOr I guess you\u0027re arguing that the index is not a section, should not be listed in itself, and so it should be unsurprising that `open_section` would prevent it. Makes me wonder why we wait until *now* to enforce it though -- if we want to be very rigid about what constitutes a valid index, why aren\u0027t we checking that all fields are present in `_read_index` or the `IndexEntry` constructor?","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"d5fd7e18451e8b1d3e20bd17c89c46aa5cdc7a80","unresolved":true,"context_lines":[{"line_number":599,"context_line":"        else:"},{"line_number":600,"context_line":"            size \u003d entry.uncompressed_end - entry.uncompressed_start"},{"line_number":601,"context_line":"            reader \u003d LengthWrapper(orig_fp, size)"},{"line_number":602,"context_line":"        if entry.checksum_method !\u003d \u0027sha256\u0027 and \\"},{"line_number":603,"context_line":"                not self.unsafe_allow_invalid_index_for_testing:"},{"line_number":604,"context_line":"            raise ValueError(\u0027Maybe better to just log a warning?\u0027)"},{"line_number":605,"context_line":"        reader \u003d SectionReader(reader)"}],"source_content_type":"text/x-python","patch_set":1,"id":"f286d61b_b94299a1","line":602,"range":{"start_line":602,"start_character":8,"end_line":602,"end_character":44},"in_reply_to":"4bfe80f1_6a7fca8d","updated":"2025-08-01 01:25:14.000000000","message":"\u003e Seems like an arbitrary restriction...\nI don\u0027t consider the index a section\n\n\u003e Or I guess you\u0027re arguing that the index is not a section\n\nright\n\n\u003e why aren\u0027t we checking that all fields are present in _read_index or the IndexEntry constructor?\n\nI think some of this kludge was to `unsafe_allow_invalid_index_for_testing` - I think it would be an improvement to not allow that and if you agree we should fix ringv2 to require the checksum of every entry in the index be valid.\n\nHell, we SHOULD calculate a checksum FOR the index and write it as a fixed-width uncompressed zlib buffer just before the un-compressed-offset!","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"63588974216a95d827e30294550ad38268022a43","unresolved":true,"context_lines":[{"line_number":611,"context_line":"            self.current_section \u003d None"},{"line_number":612,"context_line":"            self.fp \u003d orig_fp"},{"line_number":613,"context_line":"        found_checksum \u003d reader.checksum.hexdigest()"},{"line_number":614,"context_line":"        if found_checksum !\u003d entry.checksum_value and \\"},{"line_number":615,"context_line":"                not self.unsafe_allow_invalid_index_for_testing:"},{"line_number":616,"context_line":"            raise ValueError("},{"line_number":617,"context_line":"                \u0027Hash mismatch in section %s: %r found; %r expected\u0027 % ("}],"source_content_type":"text/x-python","patch_set":1,"id":"5503b61e_7ff65d53","line":614,"updated":"2025-06-18 18:17:20.000000000","message":"`SectionReader`s lost the ability to catch incomplete reads -- how do we know that the caller actually read all of the section?","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"d5fd7e18451e8b1d3e20bd17c89c46aa5cdc7a80","unresolved":true,"context_lines":[{"line_number":611,"context_line":"            self.current_section \u003d None"},{"line_number":612,"context_line":"            self.fp \u003d orig_fp"},{"line_number":613,"context_line":"        found_checksum \u003d reader.checksum.hexdigest()"},{"line_number":614,"context_line":"        if found_checksum !\u003d entry.checksum_value and \\"},{"line_number":615,"context_line":"                not self.unsafe_allow_invalid_index_for_testing:"},{"line_number":616,"context_line":"            raise ValueError("},{"line_number":617,"context_line":"                \u0027Hash mismatch in section %s: %r found; %r expected\u0027 % ("}],"source_content_type":"text/x-python","patch_set":1,"id":"5490d858_67b3c83d","line":614,"in_reply_to":"5503b61e_7ff65d53","updated":"2025-08-01 01:25:14.000000000","message":"IIRC reading a section will prevent reading PAST a section boundary by enforcement via LengthWrapper.  If you don\u0027t read all the bytes the you should get this ValueError!\n\nWhich could possibly be made more clear with a different error that\u0027s based off the LengthWrapper.\n\nOr perhaps ... maybe you\u0027re saying\n1) we should allow incomplete reads\n2) when we have an incomplete read with should either a) not enforce the checksum or b) read the rest anyway and validate the checksum\n\nIf so, I\u0027d like to push back and say AFAIK we only ever need to read whole sections.  Is that correct?","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"63588974216a95d827e30294550ad38268022a43","unresolved":true,"context_lines":[{"line_number":742,"context_line":"        \"\"\""},{"line_number":743,"context_line":"        if self.ring_magic_version !\u003d self.fp.version:"},{"line_number":744,"context_line":"            raise ValueError(\u0027%s can not deserialize ring version %r\u0027 % ("},{"line_number":745,"context_line":"                self.__class__.__name__, self.fp.version))"},{"line_number":746,"context_line":"        self.read_index()"},{"line_number":747,"context_line":"        self.fp.seek(0)"},{"line_number":748,"context_line":""}],"source_content_type":"text/x-python","patch_set":1,"id":"98295e1b_d89df57e","line":745,"updated":"2025-06-18 18:17:20.000000000","message":"We keep saying this. Should we split to a `deserialize`/`_deserialize`, where the ABC handles the version check then calls the per-subclass `_deserialize` that does the bulk of the work?","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"d5fd7e18451e8b1d3e20bd17c89c46aa5cdc7a80","unresolved":true,"context_lines":[{"line_number":742,"context_line":"        \"\"\""},{"line_number":743,"context_line":"        if self.ring_magic_version !\u003d self.fp.version:"},{"line_number":744,"context_line":"            raise ValueError(\u0027%s can not deserialize ring version %r\u0027 % ("},{"line_number":745,"context_line":"                self.__class__.__name__, self.fp.version))"},{"line_number":746,"context_line":"        self.read_index()"},{"line_number":747,"context_line":"        self.fp.seek(0)"},{"line_number":748,"context_line":""}],"source_content_type":"text/x-python","patch_set":1,"id":"e07247a6_233d7882","line":745,"in_reply_to":"98295e1b_d89df57e","updated":"2025-08-01 01:25:14.000000000","message":"yeah I had that written at one point 😊\n\nBut I couldn\u0027t come up with a good reason to have a `_serialize` and liked the symmetry in the base class.\n\nHonestly, at some-point I decided this is purposefully DAMP because I want each Codec to have some say about what it will and won\u0027t do with a given RingGzReader.\n\nBut that might be cope, if we decided eventually that deserialize doesn\u0027t have to enforce this or wants to raise a different error we could always we could always override it in a subclass or refactor - I\u0027m not married to it; but it seemed obvious enough and not really that important to maintain consistency of the shape of the ValueError across Codecs.","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":15343,"name":"Tim Burke","email":"tburke@nvidia.com","username":"tburke"},"change_message_id":"63588974216a95d827e30294550ad38268022a43","unresolved":true,"context_lines":[{"line_number":797,"context_line":""},{"line_number":798,"context_line":"    Can be created directly from in memory objects via __init__ or dict."},{"line_number":799,"context_line":""},{"line_number":800,"context_line":"    Abstract interface over serialization."},{"line_number":801,"context_line":"    \"\"\""},{"line_number":802,"context_line":""},{"line_number":803,"context_line":"    def __init__(self, replica2part2dev_id, devs, part_shift,"}],"source_content_type":"text/x-python","patch_set":1,"id":"3d225dd3_f20fb186","line":800,"updated":"2025-06-18 18:17:20.000000000","message":"So we\u0027re leaning into the fact that `RingData` is only really used as part of (de)serialization by moving it to `io` -- remind me again: why\u0027d you want to get it out of the business of deciding how wide dev ids will be? Seems to me that regardless of what width arrays the `RingBuilder` is using, the \"Abstract interface over serialization\" ought to be the one ultimately responsible for deciding how wide they ought to be on disk...","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"d5fd7e18451e8b1d3e20bd17c89c46aa5cdc7a80","unresolved":true,"context_lines":[{"line_number":797,"context_line":""},{"line_number":798,"context_line":"    Can be created directly from in memory objects via __init__ or dict."},{"line_number":799,"context_line":""},{"line_number":800,"context_line":"    Abstract interface over serialization."},{"line_number":801,"context_line":"    \"\"\""},{"line_number":802,"context_line":""},{"line_number":803,"context_line":"    def __init__(self, replica2part2dev_id, devs, part_shift,"}],"source_content_type":"text/x-python","patch_set":1,"id":"91b62c59_2f98bc26","line":800,"in_reply_to":"3d225dd3_f20fb186","updated":"2025-08-01 01:25:14.000000000","message":"\u003e why\u0027d you want to get it out of the business of deciding how wide dev ids will be?\n\nbecause ring_io doesn\u0027t modify devs or r2p2d - only the builder does that.\n\n\u003e the \"Abstract interface over serialization\" ought to be the one ultimately responsible for deciding how wide they ought to be on disk...\n\nbut deciding how wide the arrays should be is not just \"on disk\" that matters - as you\u0027ve pointed out: the *builder* cares a LOT about the size and shape of devs and r2p2d because of the significance of none_dev_id in the r2p2d table.\n\nOnce you get to RingData the list is just json and the arrays are just bytes - we put them on disk as they are and if they\u0027re inconsistent you\u0027re already screwed.\n\n... at least that was thinking.  And in some version of the test suite it seemed to mostly work!  But then a test snuck in that did this (which I thought was amazing)\n\n```\nrd \u003d RingData(wide_arrays, small_devs)\nrd.save(format\u003d2) # works\nrd.save(format\u003d1) # blows-up\n```\n\nAnd I wasn\u0027t sure at first WHY - the devs list was 2 and the max_id was 1 - but those wide_arrays!!!  Nonthing in RingData is going to resize them - so Codec[1].serialize() *not* validating the arrays were narrow would have caused Codec[1].deserialize() to get back garbage (I think?)\n\nSo I added some additional validation to Codec[1].serialize() - which may or may not have been the right thing to do.","commit_id":"c31514a5a821d4871f338f4e9ec0d7b9d4ee30c8"},{"author":{"_account_id":1179,"name":"Clay Gerrard","email":"clay.gerrard@gmail.com","username":"clay-gerrard"},"change_message_id":"d5fd7e18451e8b1d3e20bd17c89c46aa5cdc7a80","unresolved":true,"context_lines":[{"line_number":465,"context_line":"    def read(self, amt\u003dNone):"},{"line_number":466,"context_line":"        data \u003d self.fp.read(amt)"},{"line_number":467,"context_line":"        self.checksum.update(data)"},{"line_number":468,"context_line":"        return data"},{"line_number":469,"context_line":""},{"line_number":470,"context_line":""},{"line_number":471,"context_line":"class LengthWrapper:"}],"source_content_type":"text/x-python","patch_set":2,"id":"7f363a3c_61795110","line":468,"updated":"2025-08-01 01:25:14.000000000","message":"it does sort of look like SectionReader and SectionWriter could just be a ChecksumWrapper that supports both read/write (even tho any individual caller might only use one or the other - it might be easier to grok as a single primitive)","commit_id":"51c5cf5669371e20960d97f6eb9dad73dbed4ccf"}]}
