2025, Nov 30 23:00
Fast NumPy overlay: map a 1D controller to a target array with run-length mapping, np.repeat, and boolean masks
Use a vectorized NumPy approach to overlay a 1D array via run-length mapping and np.repeat. Avoid object arrays, and handle boolean masks and partial updates efficiently.
Overlaying one 1D array onto another by mapping groups of target indices looks simple until you need to do it at scale, with boolean masks and partial updates. The common pitfall is to keep the mapping as nested Python lists and then juggle index translation and value expansion in pure Python. With arrays on the order of 100k elements this quickly becomes a bottleneck. The fix is to abandon object arrays and move the whole pipeline into flat, vectorized NumPy operations.
Problem setup
We have a small array whose elements control multiple positions in a larger one. The relationship is encoded as groups of target indices. The challenge is to apply updates selected by a boolean mask, translate those selections to the target index space, and expand the new values to match the number of affected positions.
import numpy as np
# controlling array
cover = np.array([0, 1, 1, 4, 3])
# mapping of cover[i] -> positions in the target array
spans = [[0, 1, 2, 3], [4, 5], [6], [7], [8]]
# target array (current state)
base = np.array([0, 0, 0, 0, 1, 1, 1, 4, 3])
# update: select 1s and 3s, assign new values
pick = (cover == 1) | (cover == 3)
updates = np.array([44, 48, 47])
cover[pick] = updates
# naive index translation and value expansion (inefficient)
spans_obj = np.asarray(spans, dtype=object)
sel_spans = spans_obj[pick]
base_pos = sum(sel_spans, [])
expanded = np.repeat(updates, np.asarray([len(s) for s in sel_spans]))
base[base_pos] = expanded
print(base) # expected: [0, 0, 0, 0, 44, 44, 48, 4, 47]
This works, but it relies on lists and object dtype, which defeats vectorization and creates overhead proportional to the number of Python objects you shuffle around.
Why it’s slow
NumPy is optimized for contiguous, homogeneous memory. The moment you start aggregating jagged lists or object arrays, operations fall back to Python-level loops. That’s exactly what happens when you sum nested lists of indices and manually repeat values per sublist length. The core of the problem is not the logic itself, but the data layout: you are representing a flat mapping through nested Python structures instead of a simple numeric encoding.
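To see the layout cost in isolation, a rough micro-benchmark like the sketch below makes the difference visible (the group count and sizes are arbitrary assumptions chosen only to expose the effect, and absolute timings will vary by machine): flattening nested lists with sum() rebuilds the intermediate list on every step, while np.concatenate makes a single pass over numeric buffers.
import timeit
import numpy as np
# many small index groups, similar in shape to the spans mapping
groups = [list(range(i, i + 3)) for i in range(0, 30_000, 3)]
# sum() builds a new list at every step -> quadratic in the number of groups
t_sum = timeit.timeit(lambda: sum(groups, []), number=3)
# np.concatenate flattens everything in one pass
t_cat = timeit.timeit(lambda: np.concatenate(groups), number=3)
print(f"sum(nested, []): {t_sum:.3f} s")
print(f"np.concatenate:  {t_cat:.3f} s")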
Vectorized approach
The mapping from the controlling array to the target array can be represented with run lengths. Precompute how many target indices each element controls, and use np.repeat to both build the target array and to translate boolean masks and values in one shot. This avoids all object arrays and keeps everything flat and vectorized.
import numpy as np
# controlling array
cover = np.array([0, 1, 1, 4, 3])
# mapping as groups of indices (covers every index of base exactly once)
spans = [[0, 1, 2, 3], [4, 5], [6], [7], [8]]
# precompute run lengths: how many positions each element controls
runs = np.array([len(g) for g in spans])
# build the target array once from cover using the mapping
base = np.repeat(cover, runs)
print(base) # [0 0 0 0 1 1 1 4 3]
# select using a boolean mask on cover
pick = (cover == 1) | (cover == 3)
# apply updates by translating the mask and expanding values via the same runs
base[np.repeat(pick, runs)] = np.repeat(np.array([44, 48, 47]), runs[pick])
print(base) # [ 0 0 0 0 44 44 48 4 47]
All expensive work happens once when computing the run lengths. Every subsequent operation stays in the vectorized world: no Python loops, no object arrays, no ad hoc concatenations.
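To make the "precompute once, reuse everywhere" point concrete, here is a small sketch that applies two independent masked updates against the same runs array; the second mask and its value 9 are invented purely for illustration.
import numpy as np
cover = np.array([0, 1, 1, 4, 3])
spans = [[0, 1, 2, 3], [4, 5], [6], [7], [8]]
runs = np.array([len(g) for g in spans])  # computed once
base = np.repeat(cover, runs)
# first update: same selection as before
pick = (cover == 1) | (cover == 3)
base[np.repeat(pick, runs)] = np.repeat(np.array([44, 48, 47]), runs[pick])
# second, unrelated update reusing the same runs (mask and value are illustrative)
pick2 = cover == 0
base[np.repeat(pick2, runs)] = np.repeat(np.array([9]), runs[pick2])
print(base)  # [ 9  9  9  9 44 44 48  4 47]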
If you prefer to mutate the controlling array and only materialize the target array at the end, that’s also straightforward. This is useful when you have several transforms chained on the controlling array and want a single projection into the target space.
import numpy as np
cover = np.array([0, 1, 1, 4, 3])
spans = [[0, 1, 2, 3], [4, 5], [6], [7], [8]]
runs = np.array([len(g) for g in spans])
pick = (cover == 1) | (cover == 3)
cover[pick] = [44, 48, 47]
# materialize once at the end
base = np.repeat(cover, runs)
print(base) # [ 0 0 0 0 44 44 48 4 47]
Do you need flat indices at all?
When the spans of the mapping cover every index of the target exactly once, the flattened index array is just the sequence from zero to the last index. In that case, integer positions derived from a flattened mapping are redundant: you can address the translated positions directly with np.repeat on the boolean mask, because the repeated mask has the same length and order as the target array.
If you still want an explicit flat index array, it can be built once with np.concatenate(spans), but when the spans cover the entire target, that flattened array equals np.arange(len(base)). The examples above show how to omit it safely.
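If you do want the explicit index array, for example because a future version of the mapping might not cover the target contiguously, a sketch of that route on the same toy data looks like this; the assert simply checks the equivalence described above.
import numpy as np
cover = np.array([0, 1, 1, 4, 3])
spans = [[0, 1, 2, 3], [4, 5], [6], [7], [8]]
runs = np.array([len(g) for g in spans])
base = np.repeat(cover, runs)
# explicit flat index array: equals np.arange(len(base)) when spans cover everything
flat_idx = np.concatenate(spans)
assert np.array_equal(flat_idx, np.arange(len(base)))
# integer-index route, equivalent to the boolean-mask route above
pick = (cover == 1) | (cover == 3)
base[flat_idx[np.repeat(pick, runs)]] = np.repeat(np.array([44, 48, 47]), runs[pick])
print(base)  # [ 0  0  0  0 44 44 48  4 47]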
Why this matters
At 100k elements, object arrays and list manipulations introduce significant overhead and memory churn. Representing the mapping as run lengths compresses the indirection into a numeric form that NumPy can optimize. It reduces the problem to a couple of repeat operations and a single masked assignment, keeping everything cache-friendly and vectorized.
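As a rough sanity check at that scale, the sketch below generates synthetic data (the sizes and value ranges are assumptions, chosen only to land near ~100k target elements) and runs the same two-repeat update; on typical hardware this should complete in a few milliseconds.
import numpy as np
rng = np.random.default_rng(0)
# synthetic data roughly at the scale discussed above (sizes are assumptions)
n_src = 20_000
runs = rng.integers(1, 10, size=n_src)   # each source element controls 1..9 positions
cover = rng.integers(0, 100, size=n_src)
base = np.repeat(cover, runs)            # ~100k target elements on average
# masked update, translated and expanded with the same runs
pick = cover % 7 == 0
new_vals = rng.integers(1000, 2000, size=pick.sum())
base[np.repeat(pick, runs)] = np.repeat(new_vals, runs[pick])
print(base.shape, pick.sum(), "source elements updated")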
Takeaways
Keep the mapping flat. Precompute how many target positions each source element controls and reuse those lengths. Translate boolean masks with np.repeat, and expand update values with the same lengths slice. Whether you update the target in place or rebuild it from a transformed controlling array, this approach removes Python from the hot path and scales cleanly.