2025, Nov 23 11:00
Fast NumPy method to mark indices with any satisfying partner: use extrema and a single threshold
Speed up pairwise existence checks in NumPy by replacing the cross-product with a threshold inequality: vectorized comparisons against the opposite array’s extrema mark the valid indices.
When you need to scan two large NumPy arrays and mark every index that has at least one partner satisfying a condition, a naive cross-product is a non-starter. Consider the task “for each element in one array, does there exist any element in the other array such that f(x1, x2) > 0?” Looping over one array while vectorizing over the other works, but it still evaluates all 10 000 × 10 000 pairs. The obvious alternative, broadcasting both arrays into a 10 000 × 10 000 matrix, spends memory you don’t actually need.
Problem setup
The following snippet shows a straightforward approach that iterates over one array and keeps the logic vectorized over the other. It collects indices from both arrays that take part in at least one valid pair:
import numpy as np

arr_left = np.random.rand(10000)
arr_right = np.random.rand(10000)

def score(u, v):
    # Positive exactly when the pair (u, v) satisfies the condition
    return np.exp(v - u) / 1.2 - 1

ok_left = set()
ok_right = set()
for idx_l in range(len(arr_left)):
    # Vectorized over arr_right: indices of every partner of this left element
    idx_r_hits = np.nonzero(score(arr_left[idx_l], arr_right) > 0)[0].tolist()
    ok_right.update(idx_r_hits)
    if len(idx_r_hits) > 0:
        ok_left.add(idx_l)

print(sorted(ok_left))
print(sorted(ok_right))
Why this is slow and what really matters
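The loop is slow because it evaluates exp for every one of the 10 000 × 10 000 pairs and pays Python-level overhead on each iteration. The key is the condition itself. For the function f defined as exp(x2 − x1)/1.2 − 1, the inequality f(x1, x2) > 0 means exp(x2 − x1) > 1.2, and because log is strictly increasing this is equivalent to x2 > x1 + log(1.2), where log(1.2) ≈ 0.182. That means we don’t need to check every pair. For any value in the first array to be “valid,” it simply needs to be small enough that at least one value in the second array can exceed it by more than log(1.2), and the largest value in the second array is the best candidate. Conversely, a value in the second array is “valid” if it is large enough to exceed at least one value in the first array by more than the same offset, and the smallest value in the first array is the easiest one to exceed.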
This observation lets us avoid building a full grid of comparisons or running a long Python loop. We only need extrema of the opposite array and a single threshold.
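Before relying on the rewritten inequality, it can be reassuring to confirm numerically that it agrees with the original predicate. The following is a minimal sketch, not part of the original solution: the seed, the sample size, and the 1e-9 tolerance are arbitrary choices, and pairs that sit within rounding distance of the boundary are excluded because the two floating-point evaluations may legitimately disagree there.

```python
import numpy as np

def score(u, v):
    # Same predicate as the loop-based version: positive when the pair is valid.
    return np.exp(v - u) / 1.2 - 1

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
u = rng.random(100_000)
v = rng.random(100_000)
delta = np.log(1.2)  # about 0.1823

via_score = score(u, v) > 0      # original form of the condition
via_threshold = v > u + delta    # rewritten form

# Skip pairs within rounding distance of the boundary, where the two
# floating-point evaluations are allowed to differ.
clear = np.abs(v - u - delta) > 1e-9
print(np.array_equal(via_score[clear], via_threshold[clear]))
```

This only compares the two forms of the predicate element by element; it says nothing yet about the existential question, which is what the extrema handle.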
Vectorized solution
Here is the compact approach that computes the index sets directly, without a for-loop and without a cross-product matrix:
import numpy as np
arr_left = np.random.rand(10000)
arr_right = np.random.rand(10000)
delta = np.log(1.2)
idx_left = np.flatnonzero(arr_left < arr_right.max() - delta)
idx_right = np.flatnonzero(arr_right > arr_left.min() + delta)
The transformation f(x1, x2) > 0 ⇔ x2 > x1 + log(1.2) implies two simple thresholds. Any element in the left array qualifies if it is less than max(arr_right) − log(1.2). Any element in the right array qualifies if it is greater than min(arr_left) + log(1.2). The use of np.flatnonzero extracts indices in one pass per array.
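One practical wrinkle is that max() and min() raise a ValueError on empty arrays. The sketch below wraps the two thresholds in a small helper that returns empty index arrays in that case, then runs it on a tiny hand-checkable input. The helper name partnered_indices and its signature are hypothetical, introduced only for illustration.

```python
import numpy as np

def partnered_indices(left, right, offset):
    """Return (idx_left, idx_right) for the predicate right > left + offset.

    Hypothetical helper; the name and signature are illustrative only.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # With no candidates on one side, nothing can have a partner, and
    # max()/min() would raise on the empty array anyway.
    if left.size == 0 or right.size == 0:
        empty = np.empty(0, dtype=np.intp)
        return empty, empty.copy()
    idx_left = np.flatnonzero(left < right.max() - offset)
    idx_right = np.flatnonzero(right > left.min() + offset)
    return idx_left, idx_right

delta = np.log(1.2)  # about 0.1823
small_left = np.array([0.1, 0.5, 0.9])
small_right = np.array([0.2, 0.6])

idx_l, idx_r = partnered_indices(small_left, small_right, delta)
print(idx_l.tolist(), idx_r.tolist())  # [0] [1]
```

Here the left threshold is 0.6 − 0.1823 ≈ 0.418, so only 0.1 qualifies, and the right threshold is 0.1 + 0.1823 ≈ 0.282, so only 0.6 qualifies.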
How to validate correctness
Do not try to validate by pairing filtered indices one-to-one and rechecking f on those pairs. The original logic is existential: “for an element in one array, does there exist any element in the other array that satisfies the condition?” A one-to-one check discards the existential nature and can mislead you.
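A two-element example makes the pitfall concrete. In this sketch (the arrays demo_left and demo_right are made up for illustration) every element has a partner somewhere in the other array, yet a position-by-position check reports a failure:

```python
import numpy as np

def score(u, v):
    # Same predicate as before: positive when the pair is valid.
    return np.exp(v - u) / 1.2 - 1

demo_left = np.array([0.0, 0.7])
demo_right = np.array([0.95, 0.3])

# Existential check on a tiny 2 x 2 grid: rows are left, columns are right.
grid = score(demo_left[:, None], demo_right[None, :]) > 0
left_has_partner = grid.any(axis=1)
right_has_partner = grid.any(axis=0)

# Misleading check: pair position i on the left with position i on the right.
one_to_one = score(demo_left, demo_right) > 0

print(left_has_partner.tolist(), right_has_partner.tolist())  # [True, True] [True, True]
print(one_to_one.tolist())                                    # [True, False]
```

The pair (0.7, 0.3) fails, but 0.7 is still valid because of 0.95, and 0.3 is still valid because of 0.0. The one-to-one check answers a different question.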
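The reliable way is to compare the final indices produced by the fast method to those from the original loop-based method, computed on the same input arrays. If both produce identical sets of indices, you have strong evidence that the optimization is correct; the only place the two can disagree is on elements within floating-point rounding of the threshold. You can assert this with NumPy’s equality checks: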
# Compute ok_left / ok_right using the loop-based version above
# Compute idx_left / idx_right using the vectorized thresholds above,
# on the SAME arr_left / arr_right (do not regenerate the random arrays in between)
left_match = np.array_equal(np.sort(np.fromiter(ok_left, dtype=int)), np.sort(idx_left))
right_match = np.array_equal(np.sort(np.fromiter(ok_right, dtype=int)), np.sort(idx_right))
print(left_match, right_match)
Why this matters
This approach avoids constructing a 10 000 × 10 000 matrix and skips a Python-level loop over 10 000 elements. It leverages the algebraic form of the condition to reduce the problem to simple comparisons against precomputed extrema, which is exactly what NumPy is good at. There are broader takeaways as well. For monotonic functions like exp, sorting and a binary search can often reduce work when you need to find minimal satisfying elements, while a tiling strategy can help reduce memory traffic in some matrix-style scans. Also note that there is no general-purpose hardware acceleration that magically speeds up an all-pairs cross-product check; structural insights like the inequality above are what move the needle.
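As an illustration of the sorting-plus-binary-search idea, here is a sketch that goes one step beyond the yes/no question and counts how many partners each left element has, using np.searchsorted on a sorted copy of the right array. The seed and array sizes are arbitrary, and the check at the end writes the extrema bound with the offset on the left-hand side so that it matches the searchsorted comparison exactly in floating point.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility
arr_left = rng.random(1_000)
arr_right = rng.random(1_000)
delta = np.log(1.2)

right_sorted = np.sort(arr_right)

# For each left value, the position of the first sorted right value that is
# strictly greater than left + delta (side="right" skips values equal to it).
first_hit = np.searchsorted(right_sorted, arr_left + delta, side="right")

# Everything from first_hit to the end of right_sorted is a valid partner.
partner_count = right_sorted.size - first_hit

# A nonzero count is the same statement as "the maximum exceeds the bound".
has_partner = partner_count > 0
print(np.array_equal(has_partner, arr_left + delta < arr_right.max()))
```

This costs one sort plus one binary search per element instead of a comparison per pair. When only the yes/no answer is needed, the extrema comparison remains the cheaper choice.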
Conclusion
When a pairwise condition simplifies to a threshold inequality, exploit it. Convert the predicate into a relation based on extrema and a single offset, then use vectorized comparisons and np.flatnonzero to fetch indices in one pass. Validate by comparing against the baseline loop’s outputs, not by forcing a one-to-one pairing. Keep an eye out for monotonicity and ordering opportunities, and avoid building unnecessary cross-product data whenever a simple bound tells the full story.