2025, Nov 14 15:00

np.histogram last bin spike explained: how bin edges, half-open intervals, and range vs bins skew uint8 histograms

Learn why NumPy's np.histogram can show a spike in the last bin with uint8 data: bin edge semantics, half-open intervals, and range vs bins. Fix it by choosing the right edges.

Why does a seemingly uniform random image produce a skewed last bin in np.histogram? If you have ever binned uint8 data and seen an inexplicable spike near the upper end, the cause is not randomness but how np.histogram interprets its arguments and handles bin edges.

Reproducing the behavior

The following snippet generates a 500x500 array of random uint8 values and builds two histograms. The first uses 25-wide edges constructed via Python’s range, which stop at 250. The second uses 16-wide edges that run from 0 to 256 and therefore tile the full 0–255 uint8 domain exactly.

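import numpy as np

# Uniformly random uint8 image: every value 0..255 is equally likely.
img = np.random.randint(0, 256, (500, 500)).astype(np.uint8)

# The second positional argument is bins, so these are explicit edges 0, 25, ..., 250.
freq_25, edges_25 = np.histogram(img, range(0, 255, 25))
print(np.column_stack((freq_25, edges_25[:-1], edges_25[1:])))

# Explicit edges 0, 16, ..., 256.
freq_16, edges_16 = np.histogram(img, range(0, 257, 16))
print(np.column_stack((freq_16, edges_16[:-1], edges_16[1:])))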

In practice, the first printout shows the highest count in the final reported bin, the one with edges 225 and 250, whereas the second printout looks roughly uniform across all bins, as intuition would suggest for uniformly random uint8 data.

What actually happens

The key is that range(0, 255, 25) is passed as the bins parameter, not as the range parameter to np.histogram. That means those numbers are taken as explicit bin edges. The second piece is the edge semantics. The function uses half-open intervals for all bins except the last one. As the documentation states:

All but the last (righthand-most) bin is half-open. In other words, if bins is:

[1, 2, 3, 4]

then the first bin is [1,2) (including 1, but excluding 2) and the second [2,3). The last bin, however, is [3,4], which includes 4.
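The documented example is easy to check on a tiny input. The sketch below is only an illustration (the four sample values are chosen here, not taken from the documentation): it places one data point on each edge and shows where each one lands.

```python
import numpy as np

# Four data points that sit exactly on the documented example edges.
data = np.array([1, 2, 3, 4])
counts, edges = np.histogram(data, bins=[1, 2, 3, 4])

# Bins are [1, 2), [2, 3) and [3, 4]; the closed last bin takes both 3 and 4.
print(counts)  # [1 1 2]
```

The value 4 sits on the rightmost edge and is still counted, which is exactly the behavior that inflates the last bin in the uint8 case.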

Applied to range(0, 255, 25), the edges are [0, 25, 50, …, 250]. Every bin except the last is half-open, but the last bin, [225, 250], is closed on the right, so it also collects every value equal to 250. For integer data that means the last bin covers 26 distinct values (225 through 250) while each of the other bins covers 25. With 250,000 uniformly distributed pixels, the expected count is about 24,414 per ordinary bin and about 25,391 for the last one, roughly 4% more. Note also that values 251 through 255 lie beyond the final edge and are not counted at all.
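One way to make the effect visible without any randomness is to histogram an array that contains each uint8 value exactly once, so every count equals the number of integers a bin covers. This is a minimal sketch; the name every_value is introduced here for illustration.

```python
import numpy as np

# One sample of every possible uint8 value: counts equal the number of integers per bin.
every_value = np.arange(256, dtype=np.uint8)
counts, edges = np.histogram(every_value, bins=list(range(0, 255, 25)))

print(counts)  # [25 25 25 25 25 25 25 25 25 26]

# Values 251..255 lie beyond the last edge (250) and are not counted.
dropped = every_value.size - counts.sum()
print(dropped)  # 5
```

The last bin holds 26 values instead of 25, and five values never reach any bin.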

A straightforward way to see the uniform distribution

When the binning aligns well with the data domain, the outcome is intuitive. Because range(0, 257, 16) stops before 257, its last edge is 256, a value that cannot occur in uint8 data. The final bin is still closed on the right, but nothing ever lands on that edge, so all bins effectively behave as half-open intervals of 16 values each, producing the expected uniform shape.

import numpy as np

img = np.random.randint(0, 256, (500, 500)).astype(np.uint8)

freq_16, edges_16 = np.histogram(img, range(0, 257, 16))
print(np.column_stack((freq_16, edges_16[:-1], edges_16[1:])))

This mirrors the second part of the earlier output where the counts come out balanced.
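If the intent was to give np.histogram a value range rather than edges, the same 16-wide binning can be requested with the bins and range keyword arguments. The sketch below compares both forms on an array holding each uint8 value once; the variable names are illustrative.

```python
import numpy as np

every_value = np.arange(256, dtype=np.uint8)

# Explicit edges 0, 16, ..., 256 passed as bins.
explicit_counts, explicit_edges = np.histogram(
    every_value, bins=list(range(0, 257, 16))
)

# Same binning via a bin count plus the range keyword: 16 equal bins over (0, 256).
keyword_counts, keyword_edges = np.histogram(every_value, bins=16, range=(0, 256))

print(np.array_equal(explicit_counts, keyword_counts))  # True
print(explicit_counts)  # sixteen bins of 16 values each
```

Both calls describe the same edges, so either spelling avoids the last-bin spike for uint8 data.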

Why it matters

The behavior is not necessarily obvious at first glance, yet the largest edge has to belong to some bin, and np.histogram chooses to include it in the last one. If that right edge is a value that actually appears in your data, the last bin will gather those boundary values and look inflated. When you supply explicit edges that stop short of the data domain, two things happen at once: values beyond the last edge are silently left out, and values equal to the last edge pile into the final bin.

Takeaways

Always be mindful of which parameter you are providing to np.histogram. Passing a Python range as the second positional argument supplies explicit bin edges, not a target range. Remember that all bins are half-open except the last, which includes the rightmost edge. If you expect a uniform distribution across bins, choose edges so the inclusive right boundary does not coincide with a value in your data domain, as in the example with 16-wide bins ending at 256. That small detail eliminates surprises and makes the resulting histogram match your intuition.
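These takeaways could be wrapped in a small guard. The helper below, uint8_histogram, is hypothetical and not part of NumPy; it is a sketch that assumes you want equal-width bins that tile 0..255 exactly, and it rejects widths that cannot do so.

```python
import numpy as np

def uint8_histogram(values, bin_width):
    """Hypothetical helper: equal-width bins that tile 0..255 exactly."""
    if 256 % bin_width != 0:
        raise ValueError("bin_width must divide 256 so every bin covers the same number of values")
    # The last edge is 256, which never occurs in uint8 data,
    # so the closed right edge of the final bin collects nothing extra.
    edges = np.arange(0, 257, bin_width)
    return np.histogram(values, bins=edges)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(500, 500), dtype=np.uint8)

counts, edges = uint8_histogram(img, 16)
print(counts.sum() == img.size)  # True: no pixel is dropped
```

A width of 25 would raise ValueError here, flagging up front that the bins cannot cover the uint8 domain evenly.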