2025, Sep 17 17:00
Matrix–Vector Stacks in NumPy: Using np.matvec Axes, moveaxis/transpose, and an einsum Equivalent
Learn to apply a 3×2 matrix to a 2×4×5 vector stack in NumPy with np.matvec. Use axis mapping, moveaxis/transpose, or einsum for efficient, reliable results.
Matrix–vector products with stacks of vectors are a daily routine in NumPy-heavy code. The moment you switch to generalized ufuncs such as np.matvec, axis placement becomes the only real hurdle: your matrix has two core axes, your vectors have one, and the function’s defaults assume that vectors live on the last axis. If your data does not follow that convention, you need either a harmless layout shuffle or an explicit axes mapping.
Example
The task: multiply a single 3×2 matrix by a 4×5 grid of 2D vectors stored as a 2×4×5 array, and get a 3×4×5 result.
import numpy as np
A = np.random.rand(3, 2)            # 3x2 matrix
W = np.random.rand(2, 4, 5)         # stack of 4x5 vectors, each of length 2
# matvec with lightweight layout shuffles
Y1 = np.matvec(A, W.transpose(1, 2, 0)).transpose(2, 0, 1)
# einsum equivalent
Y2 = np.einsum('ij,j...->i...', A, W)
# matvec with explicit axes mapping (no transposes)
Y3 = np.matvec(A, W, axes=[(0, 1), 0, 0])
# shape-agnostic variant using moveaxis
Y4 = np.moveaxis(np.matvec(A, np.moveaxis(W, 0, -1)), -1, 0)
# sanity checks
print(Y1.shape)  # (3, 4, 5)
print(Y2.shape)  # (3, 4, 5)
print(Y3.shape)  # (3, 4, 5)
print(Y4.shape)  # (3, 4, 5)
print(np.allclose(Y1, Y2))  # True
print(np.allclose(Y3, Y2))  # True
print(np.allclose(Y4, Y2))  # True
# non-vectorized reference for the pre-transpose intermediate
check = np.matvec(A, W.transpose(1, 2, 0))
ref = [[A @ W[:, r, c] for c in range(5)] for r in range(4)]
print(np.allclose(check, ref))  # True
What’s really going on
np.matvec is a generalized ufunc that consumes a matrix and a vector and produces a vector. By default, it expects the vector dimension on the last axis of its second input and writes the resulting vector onto the last axis of the output. That default is convenient if your data is already in “...×vector” form. But if your vectors sit on the first axis, as with a 2×4×5 layout, you either move that axis to the end, call the ufunc, and move the result back, or you describe the mapping explicitly via the axes argument.
The light-bulb moment is to separate the core axes from the batch axes. For a 3×2 matrix and 2×4×5 vectors, the core axes are (0,1) in the matrix and 0 in the vectors. The rest, here 4×5, are batch dimensions that simply iterate the operation. Without instruction, the ufunc uses the last axis for vectors; with instruction, you choose where they are.
The fix with layout shuffles
A minimal approach is to relocate the vector axis to the end, apply np.matvec, and relocate the result’s vector axis to the front. transpose and moveaxis only change stride metadata and don’t move the underlying data, so this is basically free in practice. That is exactly what happens in Y1 and Y4. The moveaxis formulation is more shape-agnostic if you don’t want to hardcode permutations.
The fix with axes mapping
If you prefer to be explicit, the axes keyword lets you state the three roles directly: where the matrix axes are in the first input, where the vector axis is in the second input, and where the output vector axis should go. For the shapes above, axes=[(0, 1), 0, 0] means the first input’s core matrix axes are (0, 1), the second input’s vector axis is 0, and the output vector axis is placed at position 0. The result is a 3×4×5 array, exactly aligned with the desired layout.
Equivalent einsum
The einsum expression 'ij,j...->i...' encodes the same idea: multiply along the j dimension, keep batch dimensions as ..., and emit the output vector along i. It is concise and easy to generalize, and in this scenario it produces the same shape and values as np.matvec. On measured shapes 3×2 and 2×4×5, timings are almost the same, with less than 0.1% difference, and behavior across sizes can shift: einsum can come out slightly ahead as sizes increase, then np.matvec can win for large data. Both approaches are viable.
Why it matters
Getting axis alignment right avoids silent shape mismatches and hard-to-read transpositions scattered through the code. When you rely on broadcasting a single matrix over many vectors, as in a 3×2 matrix applied to a 2×4×5 stack, your intent stays clear if you either codify it via axes or wrap the tiny set of moves in one place. It also helps you maintain predictable output layouts for downstream pipelines and tests.
Takeaways
If your vectors are not on the last axis, either move that axis to the end and then move the result back, or tell np.matvec where everything is using axes. When you want a 3×4×5 output from a 3×2 matrix and 2×4×5 vectors, use either of these compact patterns. For explicit mapping, np.matvec(A, W, axes=[(0, 1), 0, 0]) reads cleanly and avoids extra shuffles. For a concise universal form, np.einsum('ij,j...->i...', A, W) is equivalent. Both validate with np.allclose and, under typical shapes here, perform similarly. Choose the one that keeps your codebase more readable and your data layouts consistent.