2025, Dec 07 07:00

Vectorized Batch Solution for Millions of 2×2 Linear Systems in Python Using NumPy

Learn how to solve millions of independent 2×2 linear systems fast in Python with vectorized NumPy batching, avoiding slow symbolic loops. Code examples.

Solving millions of tiny linear systems is a classic case where the algorithm is fine but the implementation kills performance. If each system is 2×2 and independent, a sequential symbolic solve per row will not scale. The right approach is to batch the work and let a numeric linear algebra backend do what it’s designed for.

Problem setup

The workload consists of independent 2-variable linear systems defined by coefficient vectors A, B, C, A', B', C'. At each index i the system is:

A[i] * x + B[i] * y = -C[i]
A'[i] * x + B'[i] * y = -C'[i]

Sequential approach that doesn’t scale

A straightforward implementation uses a Python loop and a symbolic solver per system. That is simple, but the overhead becomes prohibitive at 10M+ rows.

from sympy import symbols, Eq, solve
u_sym, v_sym = symbols('u_sym v_sym')
res_x = []
res_y = []
for idx in range(len(A)):
    sol_pair = solve([
        Eq(A[idx] * u_sym + B[idx] * v_sym, -C[idx]),
        Eq(A_prime[idx] * u_sym + B_prime[idx] * v_sym, -C_prime[idx])
    ], [u_sym, v_sym])
    res_x.append(sol_pair[u_sym])
    res_y.append(sol_pair[v_sym])

This pattern does exactly what it says: it solves one system at a time inside the Python interpreter, driving a symbolic engine for each row. With millions of rows, that becomes impractical.

What actually causes the slowdown

The cost comes from two places. First, repeatedly invoking a symbolic solver is orders of magnitude heavier than a numeric routine when the system is tiny and dense. Second, the per-iteration Python loop itself introduces overhead that accumulates at scale. When the systems are independent and identically structured, batching them for a vectorized numeric solve is the natural fit.

Vectorized solution with batched numeric solve

Switching to a batched numeric solver avoids the per-row Python overhead and the symbolic stack altogether. The idea is to assemble all 2×2 left-hand sides into a single array and all right-hand sides into another, and then pass both to a batched solver. A compact illustrative example:

import numpy as np
# Two independent 2x2 systems in a batch
# System 1: [[1, 0], [0, 2]] * [x, y]^T = [2, 4]^T
# System 2: [[2, 0], [0, 1]] * [x, y]^T = [6, 1]^T
batch_lhs = [
    [[1, 0], [0, 2]],
    [[2, 0], [0, 1]],
]
batch_rhs = [
    [[2], [4]],
    [[6], [1]],
]
sol = np.linalg.solve(batch_lhs, batch_rhs)
print(sol)

This yields:

array([[[2.],
        [2.]],
       [[3.],
        [1.]]])

In a more compact form:

import numpy as np
lhs_pack = [[[1, 0], [0, 2]], [[2, 0], [0, 1]]]
rhs_pack = [[[2], [4]], [[6], [1]]]
print(np.linalg.solve(lhs_pack, rhs_pack).tolist())

which prints:

[[[2.0], [2.0]], [[3.0], [1.0]]]

One reported run of the batched np.linalg.solve completed in 0.23 seconds.

From coefficient vectors to batched arrays

Given the original vectors A, B, C, A', B', C', you can directly assemble the left-hand side and right-hand side blocks for all indices. Each 2×2 block is built from the corresponding coefficients, and each right-hand side is the column vector of the constants on the right:

import numpy as np
# Assume A, B, C, A_prime, B_prime, C_prime are same-length 1D arrays/lists
n = len(A)
# Build [ [A[i], B[i]], [A_prime[i], B_prime[i]] ] for each i
lhs_batched = np.array([
    [
        [A[i],       B[i]      ],
        [A_prime[i], B_prime[i]]
    ]
    for i in range(n)
], dtype=float)
# Build [ [-C[i]], [-C_prime[i]] ] for each i
rhs_batched = np.array([
    [
        [-C[i]      ],
        [-C_prime[i]]
    ]
    for i in range(n)
], dtype=float)
# Solve all systems at once
solutions = np.linalg.solve(lhs_batched, rhs_batched)
# Extract x and y as 1D arrays
x_vals = solutions[:, 0, 0]
y_vals = solutions[:, 1, 0]

This keeps the exact algebra of the original formulation and applies it across the entire dataset in a single call.

An even simpler option for 2×2 systems

For a 2×2 system you can simply write the analytical solution down. You don’t need any linear solver. That observation is useful when further reducing dependencies or when you want to maximize control over the computation.

Why it matters

When the dataset grows to millions or tens of millions of rows, the difference between a per-row symbolic solve and a batched numeric solve is the difference between hours and seconds. By structuring the coefficients into batches and offloading the linear algebra to an optimized backend, the entire workload becomes tractable at scale.

Takeaways

If you’re solving vast numbers of small, independent linear systems, avoid per-row symbolic computation. Represent the systems in a batched numeric form and use a vectorized solver. For 2×2 problems, a closed-form expression is also a viable path. The result is simpler code, predictable behavior, and performance that aligns with the size of your data.