2025, Oct 30 11:00

Fast 2D Pearson Correlation per Overlap Using correlate2d: Correct, Zero-Padding-Free Method

Learn how to compute a fast 2D Pearson correlation surface per offset using overlap-aware sums with correlate2d - no zero padding, correct per-lag normalization.

Computing a 2D cross-correlation that behaves like Pearson correlation at every offset is a common need in image analysis and gridded sensor data. The tricky part is to avoid zero padding and instead use only the true overlap between the arrays for each lag. A naive implementation with nested loops and per-lag pearsonr works, but it is painfully slow. Below is a practical way to do it fast, while keeping the math faithful to Pearson R over the overlapping region at each shift.

Problem

You want a 2D correlation surface with shape (2*rows − 1, 2*cols − 1), just like scipy.signal.correlate2d in full mode. But instead of padding with zeros near the edges, each output cell should be the Pearson correlation coefficient computed from the data that actually overlaps at that offset. For an input of rows by cols, the overlap at row_lag and col_lag contains (rows − |row_lag|) × (cols − |col_lag|) elements. A straightforward solution slices per lag and calls pearsonr on the flattened overlaps, as sketched below, but it does not scale.
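
For reference, here is a minimal sketch of that slow approach, looping over every lag and calling scipy.stats.pearsonr on the flattened overlap. The function name and the guards against degenerate overlaps are illustrative, not part of the original question.

import numpy as np
from scipy.stats import pearsonr

def pearson_xcorr2d_slow(x, y):
    # Reference implementation (illustrative): one pearsonr call per lag,
    # using only the overlapping region of the two arrays at that lag.
    h, w = x.shape
    out = np.zeros((2 * h - 1, 2 * w - 1))
    for lag_r in range(-h + 1, h):
        for lag_c in range(-w + 1, w):
            # Overlapping slices of x (shifted by the lag) and y
            xs = x[max(lag_r, 0):h + min(lag_r, 0), max(lag_c, 0):w + min(lag_c, 0)]
            ys = y[max(-lag_r, 0):h + min(-lag_r, 0), max(-lag_c, 0):w + min(-lag_c, 0)]
            if xs.size > 1 and np.std(xs) > 0 and np.std(ys) > 0:
                r, _ = pearsonr(xs.ravel(), ys.ravel())
                out[h - 1 + lag_r, w - 1 + lag_c] = r
    return out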

Code that looks right but is not Pearson per lag

A tempting approach is to z-score both arrays globally, run correlate2d once, and divide by the overlap size. This is fast, but it normalizes globally rather than per overlap window, so the result is not the Pearson correlation coefficient for nonzero lags.

import numpy as np
from scipy.signal import correlate2d

def norm_std(arr):
    # Global z-score: one mean and one std for the whole array
    return (arr - np.mean(arr)) / np.std(arr)

def xcorr2d_overlap_scaled(u, v):
    assert u.shape == v.shape
    assert u.ndim == 2

    h, w = u.shape
    dr = np.arange(-h + 1, h)
    dc = np.arange(-w + 1, w)

    # Number of overlapping elements at each (row lag, col lag)
    ov_h = h - np.abs(dr)
    ov_w = w - np.abs(dc)
    overlap = ov_h.reshape((-1, 1)) * ov_w.reshape((1, -1))

    u0 = norm_std(u)
    v0 = norm_std(v)
    raw = correlate2d(u0, v0)  # full-mode correlation of the globally z-scored arrays
    return raw / overlap       # dividing by overlap size does not give per-window Pearson R

This produces a surface that coincides with Pearson R only in the center, where the overlap covers the entire arrays. Away from the center, Pearson requires local mean and local standard deviation computed from the overlapping window at each lag. Global z-scoring cannot capture that.

Why the naive normalization fails

Pearson correlation at a given lag is defined as the covariance of the overlapping values divided by the product of their standard deviations, all computed from the same overlap region. The numerator is the sum of products minus the product of the two sums divided by the number of overlapping elements. The denominator depends on the local variances, which in turn depend on the local sums and local sums of squares. Dividing a globally normalized correlation by the overlap size does not recover these per-window statistics and therefore does not match Pearson R except at full overlap.
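
In symbols, writing N for the number of overlapping elements and S_x, S_y, S_xy, S_xx, S_yy for the sums, sum of products, and sums of squares over the overlap at a given lag (notation introduced here for brevity), the per-lag coefficient is

    r = (S_xy − S_x·S_y / N) / sqrt( (S_xx − S_x² / N) · (S_yy − S_y² / N) )

which is exactly the quantity the fast version below assembles for every lag at once.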

Fast solution with convolution-style sums

The key is to compute, for every offset, the number of overlapping elements, the sum of each array over that overlap, their sums of squares, and the sum of their products. All of these can be obtained from correlate2d by correlating the inputs against an array of ones of the same shape as the inputs. From those sums you can form the Pearson numerator and denominator per lag in one vectorized pass.

import numpy as np
from scipy.signal import correlate2d

def pearson_xcorr2d_fast(x, y):
    assert x.shape == y.shape
    x = x.astype(np.float64)
    y = y.astype(np.float64)

    h, w = x.shape
    box = np.ones_like(x)

    # Overlap-aware sums for every lag, all in full mode (shape (2*h - 1, 2*w - 1))
    overlap_count = correlate2d(box, box)   # number of overlapping elements
    sum_x = correlate2d(x, box)             # sum of x over the overlap
    sum_y = correlate2d(box, y)             # sum of y over the overlap
    sum_xy = correlate2d(x, y)              # sum of elementwise products
    sum_x2 = correlate2d(x * x, box)        # sum of squares of x
    sum_y2 = correlate2d(box, y * y)        # sum of squares of y

    # Pearson numerator and denominator per lag
    num = sum_xy - (sum_x * sum_y) / overlap_count
    var_x = sum_x2 - (sum_x ** 2) / overlap_count
    var_y = sum_y2 - (sum_y ** 2) / overlap_count
    den = np.sqrt(var_x * var_y)

    with np.errstate(invalid='ignore', divide='ignore'):
        rmap = num / den
    rmap[np.isnan(rmap)] = 0    # zero-variance overlaps: Pearson undefined, mapped to 0
    return rmap

The output has shape (2*h − 1, 2*w − 1), the same as correlate2d in full mode. Each cell is the Pearson correlation coefficient of the two inputs over exactly the region that overlaps at that lag. Where the local variance is zero in either overlap, the value is undefined; those entries map to zero in the code above.
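
As a quick sanity check, the center of the surface, where the overlap is the full array, should match scipy.stats.pearsonr on the flattened inputs. The snippet below assumes pearson_xcorr2d_fast from above is already defined; the test data are arbitrary.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
x = rng.random((8, 9))
y = rng.random((8, 9))

rmap = pearson_xcorr2d_fast(x, y)
h, w = x.shape
r_full, _ = pearsonr(x.ravel(), y.ravel())
assert np.isclose(rmap[h - 1, w - 1], r_full)   # zero lag: overlap is the whole array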

Checking the intermediate sums

The construction of overlap-aware sums can be verified at any chosen lag. The sum over x at a particular offset equals the sum of the appropriately sliced overlap region.

# Example check for a positive offset
import numpy as np
from scipy.signal import correlate2d

x = np.random.default_rng(0).random((10, 12))
sum_x = correlate2d(x, np.ones_like(x))   # per-lag sums of x over the overlap

h, w = x.shape
lag_r, lag_c = 3, 7
r_idx = h - 1 + lag_r                     # row index of this lag in the full-mode output
c_idx = w - 1 + lag_c
x_slice = x[lag_r:, lag_c:]               # overlap region of x at this positive offset
assert np.allclose(sum_x[r_idx, c_idx], np.sum(x_slice))

This aligns with the mental model that correlate2d with a ones kernel counts and aggregates exactly the overlapping bins for each offset.

Why this matters

For real-world 2D data, looping over every lag and recomputing Pearson statistics over flattened windows is prohibitively slow. The convolution-based approach leverages correlate2d to compute all necessary sums for every offset in one shot, yielding a surface that is consistent with the slow reference. It also avoids the pitfall of global normalization that distorts values near the edges. Note that for sparse inputs where most overlap regions have virtually no variance, Pearson R will be undefined or zero; in contrast, raw cross-correlation from correlate2d can still show alignment peaks because it does not normalize by variance.
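
To see that consistency concretely, one can compare the two implementations on a small random input. This assumes pearson_xcorr2d_fast and the pearson_xcorr2d_slow sketch from the Problem section are both defined in the same session.

import numpy as np

rng = np.random.default_rng(2)
a = rng.random((6, 7))
b = rng.random((6, 7))

fast = pearson_xcorr2d_fast(a, b)   # convolution-style sums, all lags at once
slow = pearson_xcorr2d_slow(a, b)   # explicit loop with pearsonr per lag
assert np.allclose(fast, slow)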

Takeaways

Use overlap-aware Pearson correlation when similarity must be measured in the statistical sense at every spatial offset. Global z-scoring plus correlation cannot replace per-lag normalization. The convolution trick with a ones array lets you compute counts, sums, sums of squares, and cross-sums across all lags efficiently, and from these you obtain the correct Pearson R surface. When validating, compare pointwise with a simple for loop on a small input. If your data are mostly constant over the overlap, expect zero or undefined correlations at those positions, which is appropriate for Pearson.

The article is based on a question from StackOverflow by pas-calc and an answer by pas-calc.