2025, Oct 05 07:00
Unpacking little-endian 12-bit packed data into float32: fast NumPy strides and a one-to-one MATLAB port
Learn how to decode little-endian 12-bit packed samples into float32 with vectorized NumPy and a matching MATLAB implementation, including the exact bit shifts, scaling, and byte order.
Unpacking 12-bit packed samples to float in Python and MATLAB
A common acquisition format packs two 12-bit samples into 3 bytes. That saves space but complicates downstream processing, especially when you need vectorized performance in environments like NumPy and MATLAB. Below is a concise walkthrough that starts from a working Python implementation and shows how to reproduce the exact behavior in MATLAB without changing the math or scaling.
Packed layout and the real-world twist
Each 3-byte block encodes two samples. While it’s easy to sketch bit diagrams, what matters here is how the bytes are actually ordered in memory. The working implementation treats the data as little-endian: the first byte in each 3-byte group contains the least significant bits of the 24-bit chunk. Two 12-bit values are carved out of those 24 bits.
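As a concrete illustration with made-up byte values, the following sketch assembles one 3-byte group (call the bytes b0, b1, b2) into a 24-bit chunk and splits it the way the decoder below does:

# One hypothetical 3-byte group, least significant byte first.
b0, b1, b2 = 0xAB, 0xCD, 0xEF            # 24-bit chunk = 0xEFCDAB
chunk24 = b0 | (b1 << 8) | (b2 << 16)    # little-endian assembly
sample1 = chunk24 & 0xFFF                # low 12 bits  -> 0xDAB
sample2 = (chunk24 >> 12) & 0xFFF        # high 12 bits -> 0xEFC
print(hex(sample1), hex(sample2))        # 0xdab 0xefc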
Python reference implementation
Here is a Python function that converts a byte buffer of packed 12-bit values into float32. It uses stride tricks to read one 32-bit word per 3-byte group, masks and shifts to extract the two 12-bit integers, then left-shifts by 20 to place the sign bit correctly before scaling.
def decode_s12_packed_to_f32(raw):
    import numpy as np
    import numpy.lib.stride_tricks as st
    # Reinterpret the byte buffer as 32-bit integers (native little-endian assumed).
    i32 = np.frombuffer(raw, dtype=np.int32)
    # Strided view that reads one int32 at every 3-byte offset; the zero row
    # stride duplicates it, so after transposing and copying we get a writable
    # (N, 2) array with one row per 3-byte group. Each int32 read spans one
    # byte beyond its own group; those extra high bits are masked away below.
    view2 = np.copy(np.transpose(
        st.as_strided(i32,
            shape=(2, int((i32.size*4)/3)),
            strides=(0, 3), writeable=False)))
    mask12 = (1 << 12) - 1
    # Column 0: low 12 bits of the 24-bit chunk -> first sample.
    view2[:, 0] &= mask12
    view2[:, 0] <<= 20   # move the 12-bit sign bit up to bit 31
    # Column 1: bits 12..23 of the chunk -> second sample.
    view2[:, 1] >>= 12
    view2[:, 1] &= mask12
    view2[:, 1] <<= 20   # same sign placement for the second sample
    # Interleave the two columns and scale the sign-aligned int32 values.
    return view2.reshape(-1).astype(np.float32) * (2.**-31)
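To try the function, here is a small, hypothetical round trip: it packs eight known 12-bit samples into 3-byte groups (the buffer length must be a multiple of 12 so np.frombuffer accepts it as int32) and decodes them back. The packing loop and sample values are only for illustration.

# Eight 12-bit two's-complement samples: -2048, +2047, +1, -1, 0, +1024, -1024, +291
samples = [0x800, 0x7FF, 0x001, 0xFFF, 0x000, 0x400, 0xC00, 0x123]
raw = bytearray()
for lo, hi in zip(samples[0::2], samples[1::2]):
    chunk24 = lo | (hi << 12)            # two 12-bit samples per 24-bit chunk
    raw += bytes([chunk24 & 0xFF, (chunk24 >> 8) & 0xFF, (chunk24 >> 16) & 0xFF])

print(decode_s12_packed_to_f32(bytes(raw)))
# roughly: [-1.0, 0.9995, 0.00049, -0.00049, 0.0, 0.5, -0.5, 0.1421]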
What’s actually happening with the bits
The core is simple once the byte order is clear. Every 3 bytes produce two 12-bit integers. The first sample takes the low 8 bits from the first byte and the low 4 bits from the second byte. The second sample takes the high 4 bits from the second byte and all 8 bits from the third byte. After extracting the 12-bit integers, shifting left by 20 bits places the 12-bit sign bit at bit 31, the sign bit of a 32-bit signed integer, so negative samples come out negative without an explicit sign extension. Multiplying by 2^-31 then scales the result into the range [-1, 1).
In practice, this is why the Python implementation uses int32 for bit operations, masks to isolate 12 bits, shifts to split and align the fields, then a final left shift followed by a float32 cast and scaling.
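To see the sign placement in isolation, here is a minimal scalar check (not part of the original code) that applies the same int32 left shift and 2^-31 scaling to a few hand-picked 12-bit values; it relies on the same int32 wraparound that the vectorized code uses:

import numpy as np

# 12-bit two's-complement values: -2048, -1, 0, +2047
bits12 = np.array([0x800, 0xFFF, 0x000, 0x7FF], dtype=np.int32)
shifted = bits12 << 20                      # sign bit lands on bit 31 (int32 wraps)
print(shifted.astype(np.float32) * (2.**-31))
# roughly: [-1.0, -0.00049, 0.0, 0.9995]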
MATLAB translation, one-to-one
The following MATLAB code reproduces the same bit manipulations and scaling. It assumes the input is a flat byte array whose length is a multiple of 3. The array is reshaped to 3-by-N for clarity, then two 12-bit values are assembled from each 3-byte column. Casting to int32 before bitshift matches the behavior of the Python code.
% raw8 is a vector of byte values (0..255) with length divisible by 3
blk = int32(reshape(raw8, 3, []));                                  % one 3-byte group per column
out_i32 = zeros(2, size(blk, 2), 'int32');
out_i32(2, :) = bitshift(blk(3, :), 4) + bitshift(blk(2, :), -4);   % second sample: byte 3 and high nibble of byte 2
out_i32(1, :) = bitshift(bitand(blk(2, :), 15), 8) + blk(1, :);     % first sample: low nibble of byte 2 and byte 1
out_i32 = bitshift(out_i32, 20);                                    % move the 12-bit sign bit to bit 31
out = single(out_i32) * 2^-31;                                      % scale to [-1, 1)
out = reshape(out, 1, []);                                          % interleave the two samples of each group
This mirrors the Python logic: assemble the two 12-bit values in little-endian order, shift left by 20 to position the sign, convert to single, and scale by 2^-31. The optional bitand with 15 just makes the intent explicit; it does not change the outcome because the later left shift drops those upper bits anyway.
Why it’s important to get this right
Bit packing formats amplify small mistakes. A swapped byte order or an off-by-four shift silently corrupts every sample. Keeping the operations identical across Python and MATLAB ensures consistency when validating datasets or cross-checking analysis pipelines. In scenarios where measurements are stored together with parameters and metadata, for example as entries inside a ZIP container, it’s typical to first obtain the raw byte payload and then apply deterministic unpacking like above.
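As a sketch of that workflow, assuming a hypothetical archive measurement.zip that contains a packed-data entry channel0.bin, the raw payload can be pulled out with Python's standard zipfile module and handed straight to the decoder:

import zipfile

# Hypothetical archive and entry names, for illustration only.
with zipfile.ZipFile("measurement.zip") as zf:
    raw = zf.read("channel0.bin")        # raw packed bytes (length a multiple of 12)

samples = decode_s12_packed_to_f32(raw)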
In some workflows, when the data sits in a plain binary file, reading 12-bit fields directly can be convenient. For example, MATLAB's fread can read 12-bit values using the 'ubit12' precision, which may be useful depending on how the bytes are laid out in the file.
Takeaways
The data are little-endian, two 12-bit samples per 3 bytes. The Python function demonstrates a vectorized approach using stride tricks, masking, and shifting, followed by sign placement via a 20-bit left shift and a 2^-31 scaling to float32. The MATLAB code above implements the same sequence of operations, producing the same numerical result. When porting between environments, cast to int32 before shifting, reconstruct the two 12-bit values exactly as defined by the byte order, and preserve the final normalization step.
The article is based on a question on Stack Overflow by datenwolf and an answer by Cris Luengo.