2025, Oct 28 11:00
Byte-for-Byte Alignment of Binary .SET Files in Python: Update a Bytearray in Memory, Then Write and Truncate
Learn to sync binary .SET files byte-for-byte in Python: update in memory, then write once and truncate. Avoid in-loop writes and offset issues for stable sync
When you need to align the contents of two binary .SET files byte-for-byte, the instinct to patch differences in place is understandable. In this case, the goal was to copy differing bytes from a working GPS receiver’s SET file into the one from a receiver stuck in a boot loop. The initial Python approach ran without errors on Windows 11, but the target file didn’t change. The fix turned out to be about when and how the data is written.
Problem setup and failing example
The intent was to compare two files and write differences directly into the target file while iterating:
from itertools import zip_longest
counter = 0
with open(r"C:\Users\jimal\Desktop\APP2\mgnShell.set", "r+b") as bad_fp, open(r"E:\APP\mgnShell.set", "rb") as good_fp:
    bad_blob = bad_fp.read()
    bad_bytes = bytearray(bad_blob)
    good_blob = good_fp.read()
    good_bytes = bytearray(good_blob)
    for b_bad, b_good in zip_longest(bad_bytes, good_bytes):
        counter += 1
        if b_bad != b_good:
            bad_fp.seek(bad_bytes.index(b_bad))
            if b_good == None:
                bad_fp.write(bytes(0x00))
            else:
                bad_fp.write(bytes(b_good))
            
print(f"Done! Count was {counter}")What actually went wrong
The core issue wasn’t about OS permissions or folder attributes. The problem lived in the write strategy. Writing inside the loop fights against the flow of the operation. The whole set of changes needs to be assembled first, and only then should the write happen. In other words, update the in-memory bytearray, and after the loop completes, write that finalized buffer back to disk.
There’s a second part to the fix: directly modifying the file while you iterate the data complicates control over offsets and final size. Updating the bytearray first, then writing once, ensures the file receives a coherent, complete result.
Working approach
The corrected flow reads both files, computes the differences, updates the target buffer in memory, and only after the loop writes the final buffer to the file and truncates it:
counter = 0
with open(r"C:\Users\jimal\Desktop\APP2\mgnShell.set", "rb+") as dst_fp, open(r"E:\APP\mgnShell.set", "rb") as src_fp:
    dst_raw = dst_fp.read()
    dst_arr = bytearray(dst_raw)
    src_raw = src_fp.read()
    src_arr = bytearray(src_raw)
    for b_dst, b_src, idx in zip(dst_arr, src_arr, range(len(dst_arr))):
        counter += 1
        if b_dst != b_src:
            dst_arr[idx] = b_src
    data_out = dst_arr[0:len(src_arr)]
    dst_fp.seek(0)
    dst_fp.write(data_out)
    dst_fp.truncate()
                
print(f"Done! Count was {counter}")Why this matters
Binary file edits are unforgiving. A small misstep in write timing or offset management can lead to no-ops or partial writes that silently miss the intended effect. Consolidating all modifications in memory and committing once gives you a clean boundary: differences are computed first; persistence happens second. It also lines up with a straightforward mental model—read, transform, write—which is easier to validate and reason about.
If the goal is to make two files identical, sometimes the simplest path is to replace one with the other. When you do need to transform data, it can be simpler and safer to open files for reading, prepare the result in memory, and then open for writing to persist the final buffer. If something appears off, basic print debugging—printing types, lengths, and progress through the loop—helps confirm what the code is really doing at each step.
Takeaways
For byte-level synchronization between two files, compute all changes in memory, then write once at the end. Seek to the start, write the prepared buffer, and truncate to ensure the on-disk file matches the source length. If you only need identical content, consider a direct replacement. And when diagnosing, make the invisible visible: instrument the code with prints to verify assumptions about data and control flow.
The article is based on a question from StackOverflow by jacob malu and an answer by jacob malu.