2025, Nov 21 23:00

Extract paired rows from two differently shaped NumPy arrays with a boolean mask, no full broadcasting, use np.where indexing

Learn a memory-safe NumPy pattern to select paired Mx3 rows from two arrays using a boolean mask and np.where, avoiding costly Cartesian-product broadcasting.

Extracting paired rows with a boolean mask across two differently shaped arrays can turn into a memory sink if you materialize the full broadcasted grid. Below is a compact way to get the same result without allocating the giant intermediate tensor.

Problem

You have two arrays whose last axis is size 3, and a boolean mask defined over the Cartesian product of their spatial axes. You want two outputs of shape Mx3 (same M), containing those rows from each source where the mask is true.

import numpy as np

np.random.seed(0)

srcA = np.random.rand(4, 5, 3)
srcB = np.random.rand(6, 3)

mask = np.random.rand(4, 5, 6) >= 0.7

A straightforward approach repeats both inputs to the full combined shape and indexes with the mask. It works, but it forces the creation of a large intermediate of size M1x...xMPxN1x...xNQ, which is exactly what we want to avoid.

shapeA, shapeB = srcA.shape[:-1], srcB.shape[:-1]
tgt_shape = shapeA + shapeB + (3,)

expA = srcA[..., None, :].repeat(np.prod(shapeB), axis=-2).reshape(tgt_shape)
expB = srcB[None, ..., :].repeat(np.prod(shapeA), axis=0).reshape(tgt_shape)

outA, outB = expA[mask, :], expB[mask, :]

Why it breaks down

Repeating to the Cartesian product allocates memory proportional to the product of the spatial dimensions of both inputs. Even if the final selection is sparse, the intermediate becomes the bottleneck. The mask already encodes the matching positions; the key is to reuse those positional indices directly on the original inputs.

Solution: index once, gather twice

Use np.where(mask) to capture the coordinates of all true positions across the combined grid, then slice that coordinate tuple into the parts that address srcA and srcB respectively. Since both inputs are at least 2D, there is no need for special casing.

idx_tup = np.where(mask)

p_dims = len(srcA.shape) - 1  # spatial dims of srcA (excluding last axis=3)
q_dims = len(srcB.shape) - 1  # spatial dims of srcB (excluding last axis=3)

selA = idx_tup[:p_dims]
selB = idx_tup[p_dims:p_dims + q_dims]

resA = srcA[selA]
resB = srcB[selB]

The tuple idx_tup contains one index array per spatial axis of the combined grid. The first p_dims arrays point into the spatial axes of srcA, and the next q_dims arrays point into srcB. Fancy indexing with these slices yields two arrays of shape Kx3, where K is the number of true entries in the mask.

Why this matters

This approach removes the need to allocate the full broadcasted tensor. The mask is used once to derive coordinates, and those coordinates directly pull rows from the original arrays. Memory usage stays close to the size of the inputs plus the index arrays, which is usually orders of magnitude smaller than the Cartesian product.

Takeaways

When a boolean mask spans the product of two input grids, avoid materializing the broadcasted arrays. Compute the index coordinates with np.where once, split them according to how many spatial axes each input has, and gather from the originals. You keep the output shape Mx3 for both results and bypass the heavy intermediate entirely.