2025, Nov 17 11:00

How to reproduce legacy Pillow 5.4.1 grayscale exactly in Python, and match 11.2.x rounding when needed

Learn why Pillow's grayscale conversion changed from truncation to rounding, and how to replicate either output in Python: the integer ITU-R 601 path for 5.4.1, or the same path with a +0x8000 rounding bias for 11.2.x.

Reproducible grayscale conversion across library versions sounds trivial until a tiny change in rounding flips your pixel values. If you migrated from Pillow 5.4.1 on Python 2.7.18 to Pillow 11.2.1 on Python 3.12.0 and noticed different grayscale outputs, you are not imagining it. The difference stems from how the library rounds intermediate results, and you can fully replicate the legacy behavior in modern Python once you know the exact arithmetic used.

The setup and a minimal failing case

Both environments load the same PNG data, but the grayscale output diverges. Using the floating-point equation L = floor(R * 0.299 + G * 0.587 + B * 0.114) in Python 3.12.0, a pixel (4, 4, 4) can produce 3 due to floating-point representation error, while Pillow 5.4.1 produced 4 for the same input. When R == G == B, the luminance should equal the shared channel value, so any deviation is a rounding artifact.

from math import floor

def fp_luma_approx(r, g, b):
    return floor(r * 0.299 + g * 0.587 + b * 0.114)

print(fp_luma_approx(4, 4, 4))  # observed 3 on Python 3.12.0
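
Printing the raw float sum makes the problem visible before floor is applied. On a typical IEEE-754 double build, the intermediate result for (4, 4, 4) lands just below 4:

# The sum falls just short of the exact value 4 because 0.299, 0.587
# and 0.114 are not exactly representable as binary doubles.
print(4 * 0.299 + 4 * 0.587 + 4 * 0.114)  # e.g. 3.9999999999999996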

What Pillow actually computes

The user-facing documentation mentions the 0.299/0.587/0.114 coefficients, but the implementation in Pillow 5.4.x uses integer arithmetic with a power-of-two scale and a bit shift. This avoids slow divides and, more importantly here, defines a specific rounding behavior. The core is an integer accumulator and a right shift by 16 bits.
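
The three integer coefficients are simply the ITU-R 601 weights scaled by 2**16 and rounded, which is easy to verify (a quick sanity check, not Pillow code):

# 16-bit fixed-point versions of 0.299, 0.587 and 0.114
print(round(0.299 * 65536), round(0.587 * 65536), round(0.114 * 65536))  # 19595 38470 7471
print(19595 + 38470 + 7471)  # 65536, i.e. the weights sum to exactly one full scale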

In Pillow 5.4.x the internal logic is equivalent to the following C code (names altered, behavior identical):

#include <stdint.h>

#define Y24_ACC(px) ((px)[0] * 19595 + (px)[1] * 38470 + (px)[2] * 7471)

static void rgb_to_l_legacy(uint8_t* outbuf, const uint8_t* inbuf, int width)
{
    int i;
    /* Pillow stores RGB pixels 4 bytes apart internally, hence the += 4 stride */
    for (i = 0; i < width; i++, inbuf += 4)
        /* ITU-R 601-2 coefficients, nonlinear RGB assumed */
        *outbuf++ = Y24_ACC(inbuf) >> 16;  /* truncation: fractional part discarded */
}

In Pillow 11.2.x the accumulator changes subtly. An extra + 0x8000 is added before the shift, which converts truncation into rounding-to-nearest:

#define Y24_ACC_ROUND(px) ((px)[0] * 19595 + (px)[1] * 38470 + (px)[2] * 7471 + 0x8000)

That tiny addition is why the outputs differ. With rounding enabled, an RGB like (0, 1, 0) yields a gray value of 1 rather than 0. Across all 16,777,216 RGB triplets, 8,388,586 outputs differ, and the difference is never larger than one count.
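
Both figures can be checked exhaustively. The sketch below enumerates every 24-bit triplet with NumPy and compares the two integer paths; it is a verification script, not Pillow code:

import numpy as np

# Broadcast all 256**3 RGB combinations without building full meshgrids.
r = np.arange(256, dtype=np.uint32).reshape(-1, 1, 1)
g = np.arange(256, dtype=np.uint32).reshape(1, -1, 1)
b = np.arange(256, dtype=np.uint32).reshape(1, 1, -1)

acc = r * 19595 + g * 38470 + b * 7471       # shape (256, 256, 256)
legacy = acc >> 16                           # truncation (5.4.x behavior)
rounded = (acc + 0x8000) >> 16               # round-to-nearest (11.2.x behavior)

diff = rounded - legacy                      # 0 or 1 for every triplet
print(int((diff != 0).sum()))                # how many triplets change: 8,388,586 per the count above
print(int(diff.max()))                       # largest change: 1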

Root cause of the discrepancy

There are two compounding effects. First, the legacy code used integer math with truncation via a right shift; the modern code adds a rounding bias before shifting. Second, the literal floating-point equation with 0.299/0.587/0.114 can be affected by binary floating-point representation, which may evaluate to values like 3.999999… for inputs where the mathematically exact sum is 4. The combination explains why the simple float-based approximation in Python 3.12.0 does not match Pillow 5.4.1.

The fix: replicate Pillow 5.4.1 output in Python 3.12

To reproduce the legacy grayscale exactly, do the same integer arithmetic as Pillow 5.4.x: use the 16-bit scaled ITU-R 601 coefficients, accumulate in a wide integer, and truncate by shifting right by 16. The snippet below does this with NumPy on an RGB image and returns a single-channel 8-bit image.

from PIL import Image
import numpy as np

def to_gray_legacy(pil_image):
    rgb = np.array(pil_image.convert('RGB'))
    r = rgb[:, :, 0].astype(np.uint32)
    g = rgb[:, :, 1].astype(np.uint32)
    b = rgb[:, :, 2].astype(np.uint32)
    acc = r * 19595 + g * 38470 + b * 7471
    out8 = (acc >> 16).astype(np.uint8)  # truncation, matches Pillow 5.4.x
    return Image.fromarray(out8)

If you instead need the current Pillow behavior, add the rounding bias before shifting:

def to_gray_current(pil_image):
    rgb = np.array(pil_image.convert('RGB'))
    r = rgb[:, :, 0].astype(np.uint32)
    g = rgb[:, :, 1].astype(np.uint32)
    b = rgb[:, :, 2].astype(np.uint32)
    acc = r * 19595 + g * 38470 + b * 7471 + 0x8000
    out8 = (acc >> 16).astype(np.uint8)  # rounding-to-nearest, matches 11.2.x
    return Image.fromarray(out8)

Both functions preserve the exact program logic of the respective library versions. They also avoid floating-point altogether, which eliminates the 3.999999… style surprises.
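
As a quick sanity check against the installed library, to_gray_current should agree byte for byte with Pillow's own convert('L') on an RGB input, assuming the installed version uses the rounded ITU-R 601-2 path described above (the file name below is just a placeholder):

img = Image.open('photo.png')  # any RGB test image
ours = np.array(to_gray_current(img))
theirs = np.array(img.convert('RGB').convert('L'))
print(np.array_equal(ours, theirs))  # True if the installed Pillow rounds like 11.2.x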

Why this detail matters

The difference is only ever a single count, but that can cascade in downstream processing. For edge maps, binarization, or precise line comparisons, even a one-level shift in luminance may change what crosses a threshold and alter detected line segments. The scope of change is not trivial either: nearly half of all possible RGB triplets flip to a neighboring value when switching from truncation to rounding.
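
To make the threshold point concrete, here is one triplet whose gray value straddles a binarization cutoff depending on the rounding rule (a constructed example, not taken from the original report):

acc = 127 * 19595 + 128 * 38470 + 127 * 7471   # RGB (127, 128, 127)
legacy = acc >> 16                 # 127 with truncation
rounded = (acc + 0x8000) >> 16     # 128 with the rounding bias
threshold = 128
print(legacy >= threshold, rounded >= threshold)  # False True: the pixel flips sides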

If absolute reproducibility with historical results is required, mirroring the legacy integer path is the most direct solution in modern Python. As an operational alternative, it is also possible to invoke an older interpreter running Pillow 5.4.1 via a subprocess and pass pixels through IPC, but reproducing the math natively is usually cleaner and faster to maintain.

Takeaways

The perceived mismatch wasn’t a random bug; it was a deliberate change from truncation to rounding. Floating-point approximations of 0.299/0.587/0.114 are not bit-for-bit compatible with the integer path. If you need exact parity with Pillow 5.4.1, rely on the integer coefficients 19595, 38470, 7471, accumulate in a wide integer, and right-shift by 16 with no added bias. If you want modern Pillow parity, add 0x8000 before the shift. Knowing which rounding rule your pipeline depends on will save you time, keep your results stable, and prevent those off-by-one surprises.