2026, Jan 12 19:00

Efficient Nth-Term Algorithms for Higher-Order Fibonacci (Tribonacci, Tetranacci, k-order): Linear vs Matrix vs Diagonalization

Learn how to compute the Nth term of higher-order Fibonacci (k-nacci) fast: compare O(n) windowed update, O(k^3 log N) matrix power, and faster diagonalization.

Computing the Nth term of a higher-order Fibonacci sequence efficiently is tricky once you go beyond classical Fibonacci. Tribonacci, Tetranacci, Pentanacci, Hexanacci and their general k-order variants grow fast, and the familiar fast-doubling tricks for Fibonacci-2 do not directly carry over. Below is a practical guide that shows what actually works, why the usual matrix exponentiation quickly hits its k³ wall, and how to push further with matrix diagonalisation.

Problem setup

Consider the k-order sequence defined by A0 = 1, Ai = 2i−1 for i in [1, k], Ak+1 = 2k − 1, and for i ≥ k+1 the recurrence Ai+1 = 2·Ai − Ai−k−1. This generalises Fibonacci, Tribonacci, Tetranacci, and so on. The goal is to compute the Nth term efficiently for large N and moderate k (think k between 25 and 100, and N up to 1,048,576).

Baseline code that exposes the bottlenecks

A straight linear generator uses the windowed recurrence. It runs in O(n) time and is surprisingly competitive because each step is very cheap.

from typing import List
Vec = List[int]
def kseq_linear(total: int, k: int) -> Vec:
    if total <= k + 1:
        return [1] + [1 << i for i in range(total - 1)]
    last_idx = total - 1
    out = [1] + [1 << i for i in range(k + 1)] + [0] * (last_idx - k - 1)
    out[start := k + 1] -= 1
    for i_cur, i_left in zip(range(start, last_idx), range(1, last_idx - k)):
        out[i_cur + 1] = (out[i_cur] << 1) - out[i_left]
    return out

Matrix exponentiation by squaring also applies. The k×k transition matrix raises in O(log2 N) squarings, but each multiplication is O(k³). For moderate k and large N, this O(k³ log N) cost grows fast enough that it can be beaten by the linear method.

from typing import List, Dict
Flat = List[int]
_TRANS_CACHE: Dict[int, Flat] = {}
_EYE_CACHE: Dict[int, Flat] = {}
def build_kstep_matrix(dim: int) -> Flat:
    cached = _TRANS_CACHE.get(dim)
    if cached is not None:
        return cached
    m = [1] * dim + [0] * (dim * (dim - 1))
    for r in range(1, dim):
        m[r * dim + r - 1] = 1
    _TRANS_CACHE[dim] = m
    return m
def eye_flat(dim: int) -> Flat:
    cached = _EYE_CACHE.get(dim)
    if cached is not None:
        return cached
    vals = [0] * (dim * dim)
    for i in range(dim):
        vals[i * dim + i] = 1
    _EYE_CACHE[dim] = vals
    return vals
def matmul_flat(a: Flat, b: Flat, n: int) -> Flat:
    res = [0] * (n * n)
    for row in range(0, n * n, n):
        for kcol in range(n):
            acc = a[row + kcol]
            for col in range(n):
                res[row + col] += b[kcol * n + col] * acc
    return res
def matpow_flat(base: Flat, exp: int, n: int) -> Flat:
    acc = eye_flat(n)
    cur = base
    p = exp
    while p:
        if p & 1:
            acc = matmul_flat(acc, cur, n)
        cur = matmul_flat(cur, cur, n)
        p >>= 1
    return acc
def kseq_nth_power(n: int, k: int) -> int:
    return matpow_flat(build_kstep_matrix(k), n, k)[0]

There is a more compact way to update the state when you already have the k×k state matrix for Hexanacci-like systems: shift rows down and build the top row from the first element. This keeps the per-step state to k numbers.

import numpy as np
def step_state(block: np.ndarray) -> np.ndarray:
    first = block[0, 0]
    return np.concatenate([[[first + x for x in block[0, 1:]] + [first]], block[:-1]])

The practical snag remains: exponentiation by squaring touches full k×k matrices; each multiply is k³ multiplications and k³−k² additions per iteration, while the linear generator does only a couple of scalar ops per term. As k grows, the matrix method’s constant factors dominate despite its logarithmic iteration count.

Why the classical shortcuts don’t generalise cleanly

Fast doubling and compact Binet-like formulas work elegantly for Fibonacci-2. For higher orders, only the cubic x³ − x² − x − 1 = 0 (Tribonacci) and quartic x⁴ − x³ − x² − x − 1 = 0 (Tetranacci) admit algebraic solutions in a way suited to such shortcuts. The quintic x⁵ − x⁴ − x³ − x² − x − 1 = 0 does not yield to sympy, so those tricks do not scale to k ≥ 5.

Solution: use matrix diagonalisation

The transition matrix for the k-order sequence can be diagonalised. The matrix is square and constructed so that rows and columns are linearly independent, i.e., it has full rank. Moreover, one can show M @ M.T equals M.T @ M, so M is a normal matrix; by the Spectral theorem it is diagonalisable.

Let M = Q @ L @ inv(Q), where columns of Q are eigenvectors and L is diagonal with eigenvalues. Then Mⁿ = Q @ Lⁿ @ inv(Q). This avoids repeated dense matrix multiplies in the power and collapses the cost to exponentiating diagonal entries and two dense multiplies with Q and inv(Q).

import numpy as np
def build_kstep_numpy(dim: int) -> np.ndarray:
    M = np.zeros((dim, dim), dtype=np.float64)
    M[0] = 1
    for i in range(1, dim):
        M[i, i - 1] = 1
    return M
def nth_via_diagonalisation(n: int, k: int):
    M = build_kstep_numpy(k)
    eigvals, eigvecs = np.linalg.eig(M)
    inv_q = np.linalg.inv(eigvecs)
    Dn = np.diag(eigvals ** n)
    Mn = eigvecs @ Dn @ inv_q
    return Mn[0, 0]

This computes Mⁿ in O(k log n + k³) for fixed-size floating-point arithmetic because Lⁿ is an element-wise power on a diagonal and dominates the logarithmic part. In practice, eigenvalues and eigenvectors may be complex, and fixed-size floating-point precision is not reliable for very large n. Still, for k < 100 this is a very good approximation in the intended regime, with the caveat that L**n breaks down for sufficiently large n, especially with complex eigenvalues.

Computing only the needed entry

We do not need the full Mⁿ, only Mⁿ[0, 0]. Using the diagonal form, this entry can be obtained without forming the whole product. If L holds eigenvalues and Q the eigenvectors, and P = inv(Q), then for any indices i, j:

Mⁿ[i, j] = np.dot(Q[i] * L**n, P[:, j]) = np.sum(Q[i] * P[:, j] * L**n)

Thus, for the Nth term we can compute just the top-left entry as a weighted sum. This reduces the last multiplication to a dot product.

def nth_entry_via_diag(n: int, k: int):
    M = build_kstep_numpy(k)
    eigvals, eigvecs = np.linalg.eig(M)
    inv_q = np.linalg.inv(eigvecs)
    weights = eigvecs[0] * inv_q[:, 0]
    return np.sum(weights * (eigvals ** n))

For fixed-size floating point, this keeps the work to O(k log n + k³), where computing eigendecomposition and inverting Q are the cubic parts, and powering the k eigenvalues is the logarithmic part. With arbitrary-precision arithmetic the cost model changes, as discussed next.

Notes on arbitrary-large numbers

F(n) in these sequences has O(n) bits. With arbitrary-precision arithmetic, basic operations are not constant-time. If addition is O(n) and multiplication benefits from fast algorithms, multiplication can be as fast as O(n log n). CPython ints use Karatsuba for large operands, and decimal.Decimal uses the Number Theoretic Transform. Under such models, the diagonalisation approach with high precision runs in about O(k n log² n + n k³), where the eigen computations and powers of large values dominate. The same observation applies to the linear and matrix methods above: numbers get huge, and arithmetic cost matters.

Why this matters

For moderate k and large N, the usual matrix exponentiation by squaring hits an O(k³) per-step wall, and the simple windowed recurrence can win despite its O(n) steps. Diagonalisation turns the k×k power into k independent scalar powers, then two dense transforms. It is asymptotically faster than repeated dense multiplies for fixed-size numbers and, even with big integers or high-precision floats, gives a strong approach when you only need a single entry of Mⁿ. It also clarifies why fast-doubling-like tricks do not extend cleanly beyond k = 4 in this family.

Takeaways

If you need the Nth term of a k-order Fibonacci-like sequence, you can generate all terms in O(n) time with a cheap windowed update when k is moderate, and it will be hard to beat in practice for large N. If you need the Nth term without building the whole sequence and can afford floating-point approximations, diagonalising the transition matrix and computing only the top-left entry provides a faster route. For arbitrary-precision arithmetic, plan for the true cost of big-number operations and keep the computation to the minimal objects you need, namely a single dot product Q[0] ∘ P[:,0] scaled by L**n.