2025, Oct 25 03:00

Block GMRES Explained: Correct QR-Based Least-Squares for Multiple Right-Hand Sides in Python

Fix block GMRES for multiple right-hand sides: use QR of the residual to build the least-squares, seed block Arnoldi, and avoid stagnation and divergence.

Block GMRES looks deceptively close to the single‑RHS GMRES: same Krylov subspace idea, same least‑squares step, similar restart strategy. The catch is in how the initial residual is embedded in the small least‑squares problem. If that piece is off, the method can stagnate or even diverge for multiple right‑hand sides. Below is a minimal, working walkthrough that shows the trap and the exact correction.

Problem setup and a minimal GMRES baseline

We’ll start with a compact restarted GMRES for a single right‑hand side as a reference. It uses the Arnoldi process and a small least‑squares solve, as described in standard references.

import numpy as np
def gmres_cycle(M, rhs, x_init=None, eps=1e-5, max_steps=100, krylov_dim=30):
    n = len(rhs)
    sol = np.zeros(n) if x_init is None else x_init.copy()
    res = rhs - M @ sol
    beta0 = np.linalg.norm(res)
    if beta0 < eps:
        return sol
    for outer_k in range(0, max_steps, krylov_dim):
        # Arnoldi: orthonormal Krylov basis plus the (m+1) x m Hessenberg matrix.
        basis = np.zeros((n, krylov_dim + 1))
        hess = np.zeros((krylov_dim + 1, krylov_dim))
        basis[:, 0] = res / beta0
        for j in range(krylov_dim):
            wv = M @ basis[:, j]
            for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
                hess[i, j] = np.dot(basis[:, i], wv)
                wv -= hess[i, j] * basis[:, i]
            hess[j + 1, j] = np.linalg.norm(wv)
            if hess[j + 1, j] < 1e-14:  # happy breakdown: subspace is invariant
                break
            basis[:, j + 1] = wv / hess[j + 1, j]
        # Small least-squares problem: the right-hand side is beta0 * e1 because
        # the initial residual is exactly beta0 times the first basis vector.
        g = np.zeros(j + 2)
        g[0] = beta0
        Hsub = hess[:j + 2, :j + 1]
        coef, *_ = np.linalg.lstsq(Hsub, g, rcond=None)
        sol += basis[:, :j + 1] @ coef
        res = rhs - M @ sol
        beta0 = np.linalg.norm(res)
        if beta0 < eps:
            return sol
    return sol
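
As a quick sanity check, the baseline can be compared against a dense direct solve. The snippet below is illustrative; the test matrix and the names used here are made up for the purpose.

np.random.seed(0)
A_chk = np.diag(np.linspace(1, 50, 20)) + 0.01 * np.random.randn(20, 20)  # well conditioned
b_chk = np.random.randn(20)
x_chk = gmres_cycle(A_chk, b_chk, krylov_dim=10)
print(np.linalg.norm(x_chk - np.linalg.solve(A_chk, b_chk)))  # should be tiny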

Naive block GMRES and where it goes wrong

A straightforward “port” to multiple right‑hand sides replaces the residual vector by a residual block and the Arnoldi process by block Arnoldi; the small least‑squares problem is then solved in the Frobenius norm. The following two pieces define a block Arnoldi routine and a restarted block GMRES that mirrors the single‑RHS structure.

def arnoldi_block(M, W0, mstep):
    # Block Arnoldi: builds an orthonormal basis W of the block Krylov subspace
    # and the band-Hessenberg matrix T such that M @ W_m = W_{m+1} @ T.
    n, blk = W0.shape
    W = np.zeros((n, blk * (mstep + 1)))
    T = np.zeros((blk * (mstep + 1), blk * mstep))
    Q, R = np.linalg.qr(W0, mode='reduced')
    # Fix the QR sign convention (diag(R) >= 0) so that a W0 that is already
    # orthonormal passes through unchanged instead of picking up column sign flips.
    d = np.sign(np.diag(R))
    d[d == 0] = 1.0
    W[:, :blk] = Q * d
    for j in range(mstep):
        Bj = W[:, j * blk : (j + 1) * blk]
        Zj = M @ Bj
        for i in range(j + 1):  # block modified Gram-Schmidt
            Bi = W[:, i * blk : (i + 1) * blk]
            Tij = Bi.T @ Zj
            T[i * blk : (i + 1) * blk, j * blk : (j + 1) * blk] = Tij
            Zj -= Bi @ Tij
        # Orthonormalize the new block; Rj is the sub-diagonal coupling block of T.
        Qj, Rj = np.linalg.qr(Zj, mode='reduced')
        W[:, (j + 1) * blk : (j + 2) * blk] = Qj
        T[(j + 1) * blk : (j + 2) * blk, j * blk : (j + 1) * blk] = Rj
    return W, T
def gmres_block_cycle(M, RHS, X_init=None, eps=1e-5, max_steps=100, krylov_dim=30):
    n, blk = RHS.shape
    X = np.zeros((n, blk)) if X_init is None else X_init.copy()
    RES = RHS - M @ X
    if np.linalg.norm(RES, ord='fro') < eps:
        return X
    for outer_k in range(0, max_steps, krylov_dim):
        W, T = arnoldi_block(M, RES, krylov_dim)
        G = np.zeros(((krylov_dim + 1) * blk, blk))
        # Flawed step: seeding the small problem with a diagonal of column norms
        # silently assumes the residual columns are mutually orthogonal.
        G[:blk, :] = np.linalg.norm(RES, axis=0).reshape(-1, 1) * np.eye(blk)
        Tsub = T[:(krylov_dim + 1) * blk, :krylov_dim * blk]
        Ycoef, *_ = np.linalg.lstsq(Tsub, G, rcond=None)
        X += W[:, :krylov_dim * blk] @ Ycoef
        RES = RHS - M @ X
        if np.linalg.norm(RES, ord='fro') < eps:
            return X
    return X
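
Before dissecting the failure, it is worth confirming that the block Arnoldi routine itself is sound. Here is a quick, illustrative check of the block Arnoldi relation M @ W_m = W_{m+1} @ T; the test matrix and names below are made up:

np.random.seed(1)
M_t = np.random.randn(30, 30)
V0 = np.random.randn(30, 2)  # block of two starting vectors
W_t, T_t = arnoldi_block(M_t, V0, mstep=5)
print(np.allclose(M_t @ W_t[:, :10], W_t @ T_t))  # True up to round-off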

The issue is subtle and has nothing to do with flipping the update sign. In the single‑RHS case, the least‑squares right‑hand side is the scalar β times the first canonical basis vector, because the initial residual is exactly β times the first Arnoldi vector. Generalizing that to a diagonal built from the columnwise norms of the residual block is incorrect: the residual block factors as Q0 R0 with R0 upper triangular, so a diagonal right‑hand side matches only when the residual columns happen to be mutually orthogonal. The optimizer then minimizes the wrong quantity, which explains why the residual can increase or stagnate as soon as there is more than one right‑hand side.
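
A few lines make the inconsistency concrete. The stand‑in residual block below is made up for illustration: its R factor has nonzero off‑diagonal entries, so the diagonal of column norms cannot reproduce the block.

np.random.seed(2)
R_blk = np.random.randn(20, 3)  # stand-in residual block
Q0, R0 = np.linalg.qr(R_blk, mode='reduced')
D = np.diag(np.linalg.norm(R_blk, axis=0))
print(np.allclose(Q0 @ R0, R_blk))  # True: the QR factors reproduce the block
print(np.allclose(Q0 @ D, R_blk))   # False: the naive seeding does not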

The crux: QR of the residual block drives the small problem

Block GMRES requires the QR factorization of the current residual block. Compute Q0, R0 = qr(R), use Q0 as the initial block for the Arnoldi process, and inject R0 directly into the top-left block of the least‑squares right‑hand side. That is the key difference from the single‑RHS case, where the scalar β plays the same role.

Fixed block GMRES with the correct least‑squares setup

The following version does exactly that. Because arnoldi_block uses a sign‑fixed QR, the orthonormal seed Q0 passes through it unchanged, so injecting R0 is consistent with the basis that is actually built. The rest of the algorithm remains unchanged.

def gmres_block_cycle_fixed(M, RHS, X_init=None, eps=1e-5, max_steps=100, krylov_dim=30):
    n, blk = RHS.shape
    X = np.zeros((n, blk)) if X_init is None else X_init.copy()
    RES = RHS - M @ X
    if np.linalg.norm(RES, ord='fro') < eps:
        return X
    for outer_k in range(0, max_steps, krylov_dim):
        # QR of the residual block: Q0 seeds block Arnoldi, R0 carries the
        # residual geometry into the small least-squares problem.
        Q0, R0 = np.linalg.qr(RES, mode='reduced')
        W, T = arnoldi_block(M, Q0, krylov_dim)
        G = np.zeros(((krylov_dim + 1) * blk, blk))
        G[:blk, :blk] = R0  # RES = Q0 @ R0, so its coordinates in W are E1 @ R0
        Tsub = T[:(krylov_dim + 1) * blk, :krylov_dim * blk]
        Ycoef, *_ = np.linalg.lstsq(Tsub, G, rcond=None)
        X += W[:, :krylov_dim * blk] @ Ycoef
        RES = RHS - M @ X
        if np.linalg.norm(RES, ord='fro') < eps:
            return X
    return X

Reproducible test scaffold

This tiny setup mirrors the scenario discussed above. It’s useful for sanity checks and for comparing the single‑RHS and block variants on the same matrix.

np.random.seed(42)
n = 20
# Nearly diagonal, well-conditioned symmetric test matrix.
A = np.diag(np.linspace(1, 100, n)) + 0.001 * np.random.randn(n, n)
A = (A + A.T) / 2
b = np.random.randn(n, 1)
x_ref = gmres_cycle(A, b[:, 0], krylov_dim=10)
X_blk = gmres_block_cycle_fixed(A, b, krylov_dim=10)
print(np.linalg.norm(x_ref - X_blk[:, 0]))  # the two variants should agree
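
A single column does not exercise the block path, so it is worth running both block variants on several right‑hand sides at once. The following extension of the scaffold is illustrative; krylov_dim is kept small here so the block basis stays within the problem dimension.

B = np.random.randn(n, 3)
X_naive = gmres_block_cycle(A, B, krylov_dim=5)
X_fixed = gmres_block_cycle_fixed(A, B, krylov_dim=5)
print(np.linalg.norm(B - A @ X_naive, ord='fro'))  # typically stalls well above eps
print(np.linalg.norm(B - A @ X_fixed, ord='fro'))  # should drop below eps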

Why this detail matters

The block variant minimizes a Frobenius norm over a block Krylov subspace, and the small least‑squares problem must encode the correct residual geometry at restart. The QR of the residual block captures that geometry: since RES = Q0 R0 and the basis starts with Q0, the coordinates of the residual in the block Arnoldi basis are E1 R0, and minimizing ||T Y - E1 R0|| in the Frobenius norm is equivalent to minimizing the true residual ||RES - M W Y|| over the subspace. Feeding R0 into the right‑hand side of the small system aligns the minimization with the true objective; substituting diagonal norms looks innocuous but breaks that alignment.

Takeaways

If block GMRES fails for multiple right‑hand sides while the single‑RHS version behaves, check the construction of the small least‑squares problem at restart. Compute Q0, R0 = qr(R), seed Arnoldi with Q0, and set the leading block of the right‑hand side to R0. With that in place, the update direction is already correct, and there is no need to flip signs. When comparing Python and C++ implementations, keep the matrix operations as in the code above; the method is iterative by design, and explicit loops for the Arnoldi steps are entirely appropriate.

The article is based on a question from StackOverflow by Pierre Beaujean and an answer by Pierre Beaujean.