2025, Oct 07 21:00

Control smoothing in SciPy's make_smoothing_spline: scale the GCV lambda with a simple bias factor

Learn how to bias SciPy's make_smoothing_spline by scaling the GCV-chosen lambda. Keep the automatic GCV selection and adjust smoothness versus fidelity with a single scaling factor, with working code.

Controlling how aggressively a smoothing spline fits the data is often essential. SciPy’s make_smoothing_spline() selects the regularization parameter lam automatically (via the GCV criterion), but the function does not expose that value for post-hoc adjustment. If you want to bias the result toward more smoothing or toward higher fidelity, you need a way to nudge lam without abandoning the automatic selection.

Problem overview

With make_smoothing_spline(), the GCV-based lam is computed internally and not returned. This means you cannot multiply it by a factor after the fact to shift the balance between smoothness and data adherence.

import numpy as np
from scipy.interpolate import make_smoothing_spline
# Example data: a noisy sine sampled on a strictly increasing grid
x_values = np.linspace(0.0, 10.0, 200)
y_values = np.sin(x_values) + np.random.default_rng(0).normal(0.0, 0.3, 200)
# GCV chooses lam internally
spline_obj = make_smoothing_spline(x_values, y_values)
# No access to the automatically determined `lam` here to scale it
# (it is not saved or returned by make_smoothing_spline)

Looking at the source confirms that lam is not stored anywhere you can retrieve it after the call, so there is no public way to scale it afterwards. If you want to keep the GCV criterion yet steer the outcome, the scaling has to happen before the linear system is solved.
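
A quick inspection of the object returned by the snippet above makes this concrete: it is a plain BSpline that carries only knots, coefficients, and the degree.

# The returned object is an ordinary BSpline; it has no attribute holding
# the lam value that the GCV search selected.
print(type(spline_obj).__name__)   # BSpline
print(hasattr(spline_obj, "lam"))  # False
print(spline_obj.t.size, spline_obj.c.size, spline_obj.k)  # knots, coefficients, degree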

Why this happens

The function computes a GCV-optimal lam internally and proceeds directly to solve for the spline coefficients. Since the value is neither stored on the returned object nor exposed as a return value, you cannot intercept it. In other words, the behavior is by design: lam is treated as an internal detail.
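
Note that make_smoothing_spline does accept an explicit lam argument, so the missing piece is only the GCV-chosen value itself. If it were exposed, biasing would be a one-line rescale; the sketch below reuses the example data from above, passes an arbitrary lam, and uses a hypothetical lam_gcv name to mark what cannot actually be retrieved.

# Passing an explicit lam works; what is missing is the GCV-chosen value,
# so the rescaling below cannot be written against the public API:
# lam_biased = 2.0 * lam_gcv   # hypothetical: lam_gcv is never exposed
spline_manual = make_smoothing_spline(x_values, y_values, lam=1e-3)  # arbitrary lam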

Solution: reimplement with a bias factor

The practical workaround is to reimplement make_smoothing_spline and introduce a multiplicative factor applied to the automatically estimated lam. A factor above 1 skews the result toward stronger smoothing, while a factor between 0 and 1 pushes the fit to hew closer to the data. A factor of 1 leaves the behavior unchanged. The implementation below preserves the original logic, including the GCV-based estimation, and multiplies the resulting lam by the chosen factor.

import numpy as np
from scipy.interpolate._bsplines import _coeff_of_divided_diff as _dd_coef, _compute_optimal_gcv_parameter as _gcv_param
from scipy._lib._array_api import array_namespace as _xp_ns
from scipy.interpolate import BSpline as _BS
from scipy.linalg import solve_banded as _solve_banded
from numpy.core.multiarray import normalize_axis_index as _norm_axis
def fit_spline_gcv_tilt(x_coords, y_data, smooth_bias=1.0, weights=None, alpha=None, *, axis=0):
    """
    Build a cubic smoothing spline using the same logic as SciPy's implementation,
    but multiply the automatically estimated GCV parameter by `smooth_bias`.
    Inputs:
    - smooth_bias >= 0. Values > 1 favor smoother output; values < 1 favor fidelity.
    - alpha: if provided, acts as the regularization parameter directly (non-negative).
    """
    xp_mod = _xp_ns(x_coords, y_data)
    x_arr = np.ascontiguousarray(x_coords, dtype=float)
    y_arr = np.ascontiguousarray(y_data, dtype=float)
    if smooth_bias < 0:
        raise ValueError("`smooth_bias` must be non-negative")
    if any(x_arr[1:] - x_arr[:-1] <= 0):
        raise ValueError("`x` must be strictly increasing")
    if x_arr.ndim != 1 or x_arr.shape[0] != y_arr.shape[axis]:
        raise ValueError(f"`x` must be 1D and {x_arr.shape = } == {y_arr.shape = }")
    if weights is None:
        w_vec = np.ones(len(x_arr))
    else:
        w_vec = np.ascontiguousarray(weights)
        if any(w_vec <= 0):
            raise ValueError("Invalid vector of weights")
    knts = np.r_[[x_arr[0]] * 3, x_arr, [x_arr[-1]] * 3]
    n_pts = x_arr.shape[0]
    if n_pts <= 4:
        raise ValueError("`x` and `y` length must be at least 5")
    axis_idx = _norm_axis(axis, y_arr.ndim)
    y_arr = np.moveaxis(y_arr, axis_idx, 0)
    y_tail_shape = y_arr.shape[1:]
    if y_tail_shape != ():
        y_arr = y_arr.reshape((n_pts, -1))
    # Design matrix in the B-spline basis, re-expressed in the natural-spline
    # basis as a banded matrix
    dm = _BS.design_matrix(x_arr, knts, 3)
    band_A = np.zeros((5, n_pts))
    for ii in range(1, 4):
        band_A[ii, 2:-2] = dm[ii: ii - 4, 3: -3][np.diag_indices(n_pts - 4)]
    band_A[1, 1] = dm[0, 0]
    band_A[2, :2] = ((x_arr[2] + x_arr[1] - 2 * x_arr[0]) * dm[0, 0],
                     dm[1, 1] + dm[1, 2])
    band_A[3, :2] = ((x_arr[2] - x_arr[0]) * dm[1, 1], dm[2, 2])
    band_A[1, -2:] = (dm[-3, -3], (x_arr[-1] - x_arr[-3]) * dm[-2, -2])
    band_A[2, -2:] = (dm[-2, -3] + dm[-2, -2],
                      (2 * x_arr[-1] - x_arr[-2] - x_arr[-3]) * dm[-1, -1])
    band_A[3, -2] = dm[-1, -1]
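    # Penalty matrix divided by the weights (W^{-1} E), stored in banded form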
    band_B = np.zeros((5, n_pts))
    band_B[2:, 0] = _dd_coef(x_arr[:3]) / w_vec[:3]
    band_B[1:, 1] = _dd_coef(x_arr[:4]) / w_vec[:4]
    for jj in range(2, n_pts - 2):
        band_B[:, jj] = (x_arr[jj + 2] - x_arr[jj - 2]) * _dd_coef(x_arr[jj - 2: jj + 3]) / w_vec[jj - 2: jj + 3]
    band_B[:-1, -2] = -_dd_coef(x_arr[-4:]) / w_vec[-4:]
    band_B[:-2, -1] = _dd_coef(x_arr[-3:]) / w_vec[-3:]
    band_B *= 6
    # Determine regularization and apply bias
    if alpha is None:
        lam_val = _gcv_param(band_A, band_B, y_arr, w_vec)
        lam_val *= smooth_bias
    elif alpha < 0.0:
        raise ValueError("Regularization parameter should be non-negative")
    else:
        lam_val = alpha
    # Solve in the basis of natural splines
    if np.ndim(lam_val) == 0:
        coef = _solve_banded((2, 2), band_A + lam_val * band_B, y_arr)
    elif np.ndim(lam_val) == 1:
        coef = np.empty((n_pts, lam_val.shape[0]))
        for ii in range(lam_val.shape[0]):
            coef[:, ii] = _solve_banded((2, 2), band_A + lam_val[ii] * band_B, y_arr[:, ii])
    else:
        raise RuntimeError("Internal error, please report it to SciPy developers.")
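    # Restore trailing dimensions and convert back to the B-spline basis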
    coef = coef.reshape((coef.shape[0], *y_tail_shape))
    p0, p1 = coef[0:1, ...], coef[1:2, ...]
    pm0, pm1 = coef[-1:-2:-1, ...], coef[-2:-3:-1, ...]
    coeff_full = np.r_[p0 * (knts[5] + knts[4] - 2 * knts[3]) + p1,
                       p0 * (knts[5] - knts[3]) + p1,
                       coef[1: -1, ...],
                       pm0 * (knts[-4] - knts[-6]) + pm1,
                       pm0 * (2 * knts[-4] - knts[-5] - knts[-6]) + pm1]
    knts_xp, coeff_full_xp = xp_mod.asarray(knts), xp_mod.asarray(coeff_full)
    return _BS.construct_fast(knts_xp, coeff_full_xp, 3, axis=axis_idx)

This implementation relies on functions from SciPy’s private API, so it may break in future releases; it was tested successfully on SciPy v1.16.1. If you need to retrieve the effective lam alongside the spline, a direct pattern is to modify the return statement to include the value and unpack it at the call site.

# Example pattern if you decide to also return `lam_val` from the function
# return _BS.construct_fast(knts_xp, coeff_full_xp, 3, axis=axis_idx), lam_val
# Then call: spline_obj, lam_used = fit_spline_gcv_tilt(...)
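
If you adopt that pattern, later fits can skip the GCV search entirely by passing the retrieved value (or a rescaled version of it) back through alpha, which keeps the effective smoothing reproducible across runs. The snippet below assumes the return statement has been modified as sketched above.

# Assumes fit_spline_gcv_tilt now returns (spline, lam_val) as sketched above
spline_obj, lam_used = fit_spline_gcv_tilt(x_values, y_values)
# Reuse or rescale the retrieved value via `alpha`; this bypasses the GCV search
s_refit = fit_spline_gcv_tilt(x_values, y_values, alpha=2.0 * lam_used)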

Using the bias in practice

If smooth_bias equals 1, the behavior mirrors the default. Increasing the factor pushes toward smoother curves; decreasing it steers the fit toward the observations. You keep the GCV-driven choice, merely scaling it before solving.

# More smoothing than default
s_smooth = fit_spline_gcv_tilt(x_values, y_values, smooth_bias=2.0)
# More fidelity to data than default
s_faithful = fit_spline_gcv_tilt(x_values, y_values, smooth_bias=0.5)
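
As a quick sanity check on the noisy sine data from the first snippet, the residual sum of squares grows as smooth_bias increases, confirming that the factor trades fidelity for smoothness in the expected direction.

# Larger smooth_bias -> larger effective lam -> smoother curve -> larger residuals
for bias in (0.5, 1.0, 2.0, 10.0):
    spl = fit_spline_gcv_tilt(x_values, y_values, smooth_bias=bias)
    rss = np.sum((spl(x_values) - y_values) ** 2)
    print(f"smooth_bias={bias:>4}: residual sum of squares = {rss:.3f}")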

Why this detail matters

Even when automatic selection works well, production workloads often require explicit control over smoothness to meet downstream constraints, stabilize derivatives, or enforce consistency across datasets. A small, transparent knob lets you retain the convenience of GCV while making the trade-off explicit and repeatable.
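
For instance, when downstream code consumes derivatives, the fitted BSpline can be differentiated directly, and a larger smooth_bias typically yields a calmer first derivative. This is an illustrative sketch reusing the data and function defined above.

# BSpline objects support .derivative(); compare derivative magnitudes for
# the default fit and a deliberately smoother one.
x_dense = np.linspace(x_values[0], x_values[-1], 500)
d_default = fit_spline_gcv_tilt(x_values, y_values).derivative()
d_biased = fit_spline_gcv_tilt(x_values, y_values, smooth_bias=5.0).derivative()
print("max |f'(x)|, default fit:", float(np.max(np.abs(d_default(x_dense)))))
print("max |f'(x)|, biased fit :", float(np.max(np.abs(d_biased(x_dense)))))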

Takeaways

make_smoothing_spline does not expose the automatically chosen lam, so you cannot adjust it after calling the function. If you need this capability, replicate the logic and multiply the GCV-optimal parameter by a factor. Values above one prefer smoothness; values below one prefer fidelity. The approach shown relies on SciPy’s private API and was tested on SciPy 1.16.1, so treat it accordingly in your environment.

The article is based on a Stack Overflow question and self-answer by David Pagnon.