2025, Dec 15 23:00
Why a NumPy Quintic Hermite Interpolator Beats SciPy BPoly.from_derivatives — and When to Use CubicHermiteSpline
A benchmark of a pure NumPy quintic Hermite against SciPy's BPoly.from_derivatives shows why the SciPy routine is slower and when to prefer CubicHermiteSpline. Pick the right API for speed.
When a pure NumPy implementation of quintic Hermite interpolation outpaces SciPy’s BPoly.from_derivatives by a wide margin, it’s natural to question where the gap comes from and how to pick the right API. Below is a compact guide that reproduces the comparison, explains the root cause based on maintainers’ input, and points to a pragmatic path forward.
Reproducing the comparison
The following script constructs a 10 000-point sinusoid with analytical first and second derivatives, builds an interpolant with SciPy’s BPoly.from_derivatives, then does the same with a hand-rolled NumPy quintic Hermite implemented as per-interval 6×6 solves. It validates that both interpolants agree at the sample locations and times coefficient construction and evaluation. Variable and function names are chosen for clarity, but the computational flow mirrors the original benchmark.
"""
Sinusoidal interpolation on 10 000 points: SciPy BPoly.from_derivatives vs pure NumPy quintic Hermite.
Measures coefficient-build time and evaluation time, and checks numerical agreement.
"""
import sys
import time
import numpy as np
import scipy
from scipy.interpolate import BPoly
# Environment report
print(f"Python version: {sys.version.split()[0]}")
print(f"NumPy version: {np.__version__}")
print(f"SciPy version: {scipy.__version__}")
# Grid and exact values/derivatives over [0, 2π]
n_pts = 10_000
grid_x = np.linspace(0.0, 2 * np.pi, n_pts)
f_val = np.sin(grid_x) # f(x)
d1_val = np.cos(grid_x) # f'(x)
d2_val = -np.sin(grid_x) # f''(x)
# === SciPy: BPoly.from_derivatives ===
stacked_y = np.column_stack((f_val, d1_val, d2_val))
t_start = time.perf_counter()
poly_scipy = BPoly.from_derivatives(grid_x, stacked_y)
t_end = time.perf_counter()
scipy_build = t_end - t_start
# Evaluate at the original nodes
t_start = time.perf_counter()
y_bp = poly_scipy(grid_x)
t_end = time.perf_counter()
scipy_eval = t_end - t_start
# === Pure NumPy: quintic Hermite (per-interval linear solves) ===
def assemble_quintic(xk, f, fp, fpp):
    """
    Compute quintic Hermite coefficients on intervals [xk[i], xk[i+1]].
    Returns an array of shape (6, len(xk)-1) with polynomial coefficients per interval.
    """
    m = len(xk) - 1
    params = np.zeros((6, m))
    for i in range(m):
        h = xk[i + 1] - xk[i]
        # Rows encode p(0), p'(0), p''(0), p(h), p'(h), p''(h) for
        # p(t) = c0 + c1*t + ... + c5*t**5 with t = x - xk[i].
        M = np.array([
            [1, 0, 0, 0, 0, 0],
            [0, 1, 0, 0, 0, 0],
            [0, 0, 2, 0, 0, 0],
            [1, h, h**2, h**3, h**4, h**5],
            [0, 1, 2*h, 3*h**2, 4*h**3, 5*h**4],
            [0, 0, 2, 6*h, 12*h**2, 20*h**3],
        ])
        rhs = np.array([f[i], fp[i], fpp[i], f[i + 1], fp[i + 1], fpp[i + 1]])
        params[:, i] = np.linalg.solve(M, rhs)
    return params
def eval_quintic(xk, coeff, xq):
    """
    Evaluate the piecewise quintic at query points xq using precomputed coeff.
    xk: breakpoints (n,), coeff: (6, n-1), xq: array-like points.
    """
    xq = np.asarray(xq, dtype=float)    # accept any array-like, ensure float math
    idx = np.searchsorted(xk, xq) - 1   # interval index for each query point
    idx = np.clip(idx, 0, len(xk) - 2)  # clamp queries at the domain boundaries
    dx = xq - xk[idx]                   # local coordinate within each interval
    out = np.zeros_like(xq)
    for j in range(6):
        out += coeff[j, idx] * dx**j
    return out
# Build custom coefficients
t_start = time.perf_counter()
quintic_coef = assemble_quintic(grid_x, f_val, d1_val, d2_val)
t_end = time.perf_counter()
np_build = t_end - t_start
# Evaluate at the nodes
t_start = time.perf_counter()
y_np = eval_quintic(grid_x, quintic_coef, grid_x)
t_end = time.perf_counter()
np_eval = t_end - t_start
# Agreement check at the grid
assert np.allclose(y_bp, y_np, atol=1e-12), "NumPy and SciPy interpolants differ"
print("Custom NumPy interpolation matches SciPy BPoly for 10 000 points.")
# Timing report
print(f"SciPy coeff time: {scipy_build:.6f} s")
print(f"SciPy eval time : {scipy_eval:.6f} s")
print(f"Custom coeff time: {np_build:.6f} s")
print(f"Custom eval time : {np_eval:.6f} s")
# Optional: visualization (not required for timing)
import matplotlib.pyplot as plt
plt.plot(grid_x, y_bp, label='SciPy BPoly')
plt.plot(grid_x, y_np, '--', label='Custom NumPy')
plt.legend()
plt.title('Sinusoidal interpolation: SciPy vs NumPy Quintic')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.show()
On one recorded run, the output showed that the custom NumPy path produced the same values at the nodes and was faster in coefficient construction. The test setup used Python 3.11.7, NumPy 2.1.3, and SciPy 1.15.2.
What actually causes the performance gap
According to SciPy maintainers, BPoly.from_derivatives has not seen much work since its initial implementation (which, at the time, replaced an interpolator that was an order of magnitude slower). One reason is that it is mostly superseded by CubicHermiteSpline; the function is rarely seen in the wild, so there has been little incentive to optimize it. The maintainers' invitation stands: if you use it, or would use it if it were faster, don't hesitate to send a pull request; it will be gladly reviewed.
The essential point is that BPoly.from_derivatives is a generic path that has not been actively optimized, in part because it is largely superseded by CubicHermiteSpline and is rarely exercised. The observed gap is therefore a product of engineering priorities rather than a limit of SciPy’s interpolation stack in general.
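Part of that generality is visible in the API itself: from_derivatives accepts a different number of derivatives at each breakpoint and adapts the local polynomial degree accordingly, something a fixed-degree constructor never has to handle. A small self-contained illustration (the breakpoints and values here are arbitrary):
import numpy as np
from scipy.interpolate import BPoly
xi = [0.0, 1.0, 2.0]
yi = [
    [0.0, 1.0],        # at x=0: value and first derivative
    [1.0],             # at x=1: value only
    [0.0, -1.0, 0.5],  # at x=2: value, first and second derivative
]
mixed = BPoly.from_derivatives(xi, yi)  # local degree adapts per interval
print(mixed(np.linspace(0.0, 2.0, 5)))
Supporting ragged inputs like this forces a more general code path than the fixed values-plus-first-derivatives layout that CubicHermiteSpline handles.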
An additional observation from the benchmark author is that vectorizing the per-interval coefficient builder can remove the explicit Python loop (a sketch follows), yet the improvement was negligible on the tested systems. This does not change the headline conclusion about relative performance for this workload.
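For concreteness, a batched variant might look like the following sketch. It is illustrative rather than the benchmark author's code: assemble_quintic_vectorized is a hypothetical name, and the approach simply stacks all per-interval 6×6 systems and hands them to a single np.linalg.solve call.
def assemble_quintic_vectorized(xk, f, fp, fpp):
    """Loop-free builder: solve all 6x6 interval systems in one batched call."""
    h = np.diff(xk)                           # interval widths, shape (m,)
    m = h.shape[0]
    M = np.zeros((m, 6, 6))
    M[:, 0, 0] = 1.0                          # p(0)   = f[i]
    M[:, 1, 1] = 1.0                          # p'(0)  = fp[i]
    M[:, 2, 2] = 2.0                          # p''(0) = fpp[i]
    M[:, 3, :] = h[:, None] ** np.arange(6)   # p(h)   row: 1, h, ..., h**5
    M[:, 4, 1:] = np.arange(1, 6) * h[:, None] ** np.arange(5)                   # p'(h)  row
    M[:, 5, 2:] = np.array([2.0, 6.0, 12.0, 20.0]) * h[:, None] ** np.arange(4)  # p''(h) row
    rhs = np.stack([f[:-1], fp[:-1], fpp[:-1], f[1:], fp[1:], fpp[1:]], axis=1)
    return np.linalg.solve(M, rhs[..., None])[..., 0].T  # shape (6, m)
Because it returns the same (6, m) layout, it is a drop-in replacement for assemble_quintic in the script above, which makes it easy to verify that both builders produce identical coefficients before timing them.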
Practical direction: use the optimized path
When your problem fits, the more specialized and maintained CubicHermiteSpline is the modern choice over BPoly.from_derivatives. It is specifically called out as the successor and has received more attention. The following snippet shows how to construct and evaluate it on the same sinusoidal dataset using the function values and first derivatives.
from scipy.interpolate import CubicHermiteSpline
# Build cubic Hermite spline from values and first derivatives
start = time.perf_counter()
chs = CubicHermiteSpline(grid_x, f_val, d1_val)
stop = time.perf_counter()
chs_build = stop - start
# Evaluate at the original nodes
start = time.perf_counter()
chs_y = chs(grid_x)
stop = time.perf_counter()
chs_eval = stop - start
print(f"CubicHermiteSpline coeff time: {chs_build:.6f} s")
print(f"CubicHermiteSpline eval time : {chs_eval:.6f} s")
The experience reported alongside the comparison is consistent with this guidance. In the benchmark author's words: "This is the correct answer. I further optimized the numpy-numba implementation, beyond what @JeromeRichard did in his answer. With significant effort, I was in the same ballpark as CubicHermiteSpline, yet CubicHermiteSpline won on my machine. I would not recommend updating the more generic from_derivatives function. It might be nice to update the scipy docs, in particular a pointer towards the more optimized flavors when viewing more generic interpolation options. Well done scipy team!"
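For readers who want to reproduce that line of experimentation, here is a minimal sketch of a Numba-compiled per-interval builder, assuming the numba package is installed. The name assemble_quintic_numba and its structure are illustrative, not the author's actual code, and the first call includes JIT compilation, so warm it up before timing.
import numba
import numpy as np
@numba.njit(cache=True)
def assemble_quintic_numba(xk, f, fp, fpp):
    """Illustrative Numba port of the per-interval 6x6 solve."""
    m = xk.shape[0] - 1
    params = np.empty((6, m))
    M = np.zeros((6, 6))
    rhs = np.empty(6)
    for i in range(m):
        h = xk[i + 1] - xk[i]
        M[0, 0] = 1.0                              # p(0)   = f[i]
        M[1, 1] = 1.0                              # p'(0)  = fp[i]
        M[2, 2] = 2.0                              # p''(0) = fpp[i]
        for j in range(6):
            M[3, j] = h ** j                       # p(h)   row
        for j in range(1, 6):
            M[4, j] = j * h ** (j - 1)             # p'(h)  row
        for j in range(2, 6):
            M[5, j] = j * (j - 1) * h ** (j - 2)   # p''(h) row
        rhs[0] = f[i]
        rhs[1] = fp[i]
        rhs[2] = fpp[i]
        rhs[3] = f[i + 1]
        rhs[4] = fp[i + 1]
        rhs[5] = fpp[i + 1]
        params[:, i] = np.linalg.solve(M, rhs)
    return params
A call such as assemble_quintic_numba(grid_x, f_val, d1_val, d2_val) should match the coefficients from assemble_quintic, which is worth asserting before comparing timings.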
Why this matters for engineers
Performance conclusions depend on the exact API surface you exercise. A generic routine that aims for broad capability may not be tuned like a focused spline constructor, and a lean NumPy implementation can easily beat unoptimized code when the math is simple and predictable. Knowing which functions in a library are actively maintained and which ones are legacy or superseded directly impacts the outcome of your benchmarks and, ultimately, your production choices.
Takeaways and advice
If you see a large disparity between a hand-rolled NumPy interpolator and SciPy’s BPoly.from_derivatives, you are likely observing the lack of recent optimization work in that specific function. Prefer CubicHermiteSpline when it matches your constraints. If your workflow truly requires BPoly.from_derivatives and you need more speed, consider contributing improvements upstream; maintainers are open to reviewing pull requests. For local tuning, straight NumPy can be effective, and deeper micro-optimizations beyond a clear implementation may not move the needle much, depending on the system.