2025, Oct 22 17:00

Nonlinear Curve Fitting in SciPy: Power-Law vs Box-Cox, Why Models Diverge and How p0 + a Jacobian Fix It

Learn why algebraically equivalent power-law and Box-Cox models diverge in SciPy curve_fit. Get practical p0 and Jacobian guidance for stable nonlinear fits.

Nonlinear curve fitting can feel treacherous when algebraically equivalent models behave differently under the optimizer. A common case is power-law-style regression of y = a + b * x ** c versus a Box-Cox-like reparameterization, and the question is which formulation gives the better fit. The short answer, from a practical optimization perspective: without a good initial estimate and a Jacobian, you are testing the optimizer’s luck, not the model. With both provided, equivalent formulations converge equivalently.

Problem setup

Two mathematically equivalent parameterizations are compared. The first is y = a + b * x ** c. The second is y = d + e * (x ** f - 1) / f, related to the first by the identity a = d - e / f, b = e / f, c = f. In the original report, the first variant exceeded the default maxfev limit and was sensitive to feature scaling, while the second converged with default settings, produced a better fit, and was insensitive to scaling. After scaling x to unit standard deviation, both produced the same solution.
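
A quick numeric check makes the equivalence concrete. The parameter values below are illustrative, chosen only to exercise the identity; any d, e, f with f nonzero would do.

import numpy as np

# Illustrative Box-Cox-style parameters (f must be nonzero)
d, e, f = 100.0, -50.0, -0.5

# Mapped power-law parameters: a = d - e / f, b = e / f, c = f
a, b, c = d - e / f, e / f, f

x = np.linspace(10, 1000, 50)
# The two curves agree to floating-point precision
print(np.allclose(a + b * x ** c, d + e * (x ** f - 1) / f))  # True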

Reproducible example that exposes the discrepancy

The following code demonstrates the behavior by fitting both forms, once on raw x and once on scaled x, and comparing the mean absolute error (MAE). The logic mirrors the setup above.

from io import StringIO

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler


def base_power(x, p, q, r):
    return p + q * x ** r


def shifted_boxcox(x, u, v, w):
    return u + v * (x ** w - 1) / w


DATA_SRC = StringIO(
    """x,y
410.0,73.06578085929756
25.0,205.29417389522575
72.0,110.48653325137172
51.0,168.52111516008628
15.0,119.75684720989004
72.0,164.46280991735537
73.0,145.53126751391522
36.0,161.41429319490925
40.0,219.91735537190084
190.0,89.18717804003897
21.0,203.3350571291969
47.0,170.12964176670877
21.0,197.1020714822368
18.0,96.43526170798899
53.0,117.55060034305319
50.0,189.89358650365415
43.0,179.03132807995385
863.0,69.63888057656149
71.0,131.42730764753813
40.0,205.2892561983471
65.0,131.3857292219426
401.0,133.81511047189076
50.0,115.65603442387814
58.0,151.99074870050802
50.0,165.8640803223824
21.0,210.87942861045792
236.0,124.21734182739671
53.0,180.11451429366744
12.0,320.77043917765184
36.0,244.3526170798898
25.0,202.41568198893515
21.0,184.03895128162597
29.0,165.64724945771087
25.0,218.1818181818182
72.0,161.8457300275482
130.0,107.38232466256511
84.0,177.52397865095088
38.0,57.524112172378224
50.0,168.132777815723
25.0,202.41568198893515
21.0,244.3978260657449
48.0,168.3167133392528
200.0,122.8403797554812
37.0,167.84185838731295
83.0,173.75445583988846
13.0,315.835929660122
11.0,314.47181327653976
32.0,203.68741889215215
200.0,123.96694214876034
39.0,110.59353869271226
39.0,190.81504521023686
40.0,235.53719008264466
37.0,181.71111484409758
25.0,215.55576804863057
40.0,235.53719008264466"""
)
frame = pd.read_csv(DATA_SRC)
# Scale x to unit standard deviation without centering, so x stays positive
scaler = StandardScaler(with_mean=False)
frame["x_std"] = scaler.fit_transform(frame[["x"]])

fig, axes = plt.subplots(1, 2, figsize=(10, 6), sharey=True)
for idx, use_scaled in enumerate([False, True]):
    feat_col = "x"
    if use_scaled:
        feat_col = "x_std"

    axes[idx].scatter(frame[feat_col], frame["y"], label="Data", alpha=0.5)

    for variant_idx, (fn, style) in enumerate(
        zip([base_power, shifted_boxcox], ["--", ":"])
    ):

        # No p0 supplied: curve_fit falls back to an all-ones starting point
        theta, cov = curve_fit(
            fn,
            frame[feat_col],
            frame["y"],
            maxfev=10_000,
        )
        frame["pred"] = fn(frame[feat_col], *theta)
        mae_val = mean_absolute_error(frame["y"], frame["pred"])

        grid = np.linspace(frame[feat_col].min(), frame[feat_col].max(), 100)
        yhat_line = fn(grid, *theta)

        axes[idx].plot(
            grid,
            yhat_line,
            linestyle=style,
            label=f"Form {variant_idx + 1} (MAE: {mae_val:.2f})",
        )
    axes[idx].legend()
    axes[idx].set_title(f"Scaled: {use_scaled}")
    axes[idx].set_xlabel(feat_col)
    axes[idx].set_ylabel("y")
plt.show()

Why equivalent models diverge here

The key issue is the lack of an initial estimate p0. When p0 is omitted, curve_fit starts every parameter at 1, and that default lands the two equivalent functions in different parts of the shared parameter space: the solver follows different paths, converges at different speeds, and can settle in different local minima. That explains why one form seemed to require a higher maxfev and react to scaling while the other did not. When x is standardized to unit variance, both variants align because the scaling puts the search in comparable regions of the parameter space.
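
One way to make this concrete is to map the default start between the two parameterizations. The snippet below is a minimal sketch assuming the all-ones default that curve_fit uses when p0 is omitted; it shows that the same default corresponds to different points in the shared (a, b, c) space.

# curve_fit defaults to p0 = (1, 1, ..., 1) when no initial estimate is given.
# In the Box-Cox form that start is (u, v, w) = (1, 1, 1); mapped into
# power-law coordinates via a = u - v / w, b = v / w, c = w it becomes
# (0, 1, 1), which is not where the power-law form itself starts.
u0, v0, w0 = 1.0, 1.0, 1.0
print('power-law default start: (a, b, c) =', (1.0, 1.0, 1.0))
print('Box-Cox default start:   (a, b, c) =', (u0 - v0 / w0, v0 / w0, w0))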

Make the optimizer behave: supply an initial estimate

Passing a reasonable p0 aligns the two formulations and avoids chasing poor local minima. This fit uses an explicit estimate and converges quickly.

import io

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit


def base_curve(x: np.ndarray, aa: float, bb: float, cc: float) -> np.ndarray:
    return aa + bb * x ** cc


def run_fit(x: np.ndarray, y: np.ndarray, guess: np.typing.ArrayLike) -> np.ndarray:
    popt, pcov, info, msg, status = curve_fit(
        f=base_curve, xdata=x, ydata=y, p0=guess, full_output=True,
    )
    # For the underlying MINPACK routine, status codes 1 through 4 indicate success
    if status not in {1, 2, 3, 4}:
        raise ValueError(msg)
    print(msg, f'in {info["nfev"]} calls')
    return popt


def draw_fig(
    x: np.ndarray, y: np.ndarray,
    theta_guess: np.ndarray,
    theta_fit: np.ndarray,
) -> plt.Figure:
    xgrid = np.logspace(start=1, stop=3, num=200)
    fig, ax = plt.subplots()
    ax.set_xscale('log')
    ax.scatter(x, y, label='Data')
    ax.plot(xgrid, base_curve(xgrid, *theta_guess), label='Est')
    ax.plot(xgrid, base_curve(xgrid, *theta_fit), label='Fit')
    ax.legend()
    return fig


def run_demo() -> None:
    raw = io.StringIO(
'''410.0,73.06578085929756
25.0,205.29417389522575
72.0,110.48653325137172
51.0,168.52111516008628
15.0,119.75684720989004
72.0,164.46280991735537
73.0,145.53126751391522
36.0,161.41429319490925
40.0,219.91735537190084
190.0,89.18717804003897
21.0,203.3350571291969
47.0,170.12964176670877
21.0,197.1020714822368
18.0,96.43526170798899
53.0,117.55060034305319
50.0,189.89358650365415
43.0,179.03132807995385
863.0,69.63888057656149
71.0,131.42730764753813
40.0,205.2892561983471
65.0,131.3857292219426
401.0,133.81511047189076
50.0,115.65603442387814
58.0,151.99074870050802
50.0,165.8640803223824
21.0,210.87942861045792
236.0,124.21734182739671
53.0,180.11451429366744
12.0,320.77043917765184
36.0,244.3526170798898
25.0,202.41568198893515
21.0,184.03895128162597
29.0,165.64724945771087
25.0,218.1818181818182
72.0,161.8457300275482
130.0,107.38232466256511
84.0,177.52397865095088
38.0,57.524112172378224
50.0,168.132777815723
25.0,202.41568198893515
21.0,244.3978260657449
48.0,168.3167133392528
200.0,122.8403797554812
37.0,167.84185838731295
83.0,173.75445583988846
13.0,315.835929660122
11.0,314.47181327653976
32.0,203.68741889215215
200.0,123.96694214876034
39.0,110.59353869271226
39.0,190.81504521023686
40.0,235.53719008264466
37.0,181.71111484409758
25.0,215.55576804863057
40.0,235.53719008264466''')

    with raw:
        x, y = np.loadtxt(raw, delimiter=',').T

    theta0 = (0, 1000, -0.5)
    theta = run_fit(x, y, theta0)
    print('theta =', theta)
    draw_fig(x, y, theta_guess=theta0, theta_fit=theta)
    plt.show()


if __name__ == '__main__':
    run_demo()

Both actual and predicted relative reductions in the sum of squares are at most 0.000000 in 25 calls

This simple change already aligns the optimization for the base power form and yields a sensible solution; the same starting point can also be mapped into the Box-Cox parameterization, as sketched below.
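
To verify the alignment directly, express the same starting point in the Box-Cox parameterization via d = a + b, e = b * c, f = c and fit that form. The helper below is a sketch; boxcox_curve and fit_boxcox_equivalent are illustrative names, and it expects the x and y arrays loaded in run_demo above.

import numpy as np
from scipy.optimize import curve_fit


def boxcox_curve(x: np.ndarray, d: float, e: float, f: float) -> np.ndarray:
    return d + e * (x ** f - 1) / f


def fit_boxcox_equivalent(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Map the power-law start (a, b, c) = (0, 1000, -0.5) into Box-Cox
    # coordinates via d = a + b, e = b * c, f = c.
    a0, b0, c0 = 0.0, 1000.0, -0.5
    guess = (a0 + b0, b0 * c0, c0)  # (1000.0, -500.0, -0.5)
    popt, _ = curve_fit(boxcox_curve, x, y, p0=guess)
    d, e, f = popt
    # Convert back to (a, b, c) to compare against the power-law fit
    return np.array([d - e / f, e / f, f])

With those mapped starting values, the Box-Cox fit should recover essentially the same (a, b, c) triple as the power-law fit above.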

Go further: supply a Jacobian

For this model, the first-order partial derivatives are available in closed form. Passing a Jacobian lets the solver use better local information, typically improving speed and numerical stability. This version provides the derivatives and reports fewer function evaluations.

import io

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit


def base_curve(x: np.ndarray, aa: float, bb: float, cc: float) -> np.ndarray:
    return aa + bb * x ** cc


def grad_fn(x: np.ndarray, aa: float, bb: float, cc: float) -> np.ndarray:
    # Closed-form partial derivatives, one row per parameter (matches col_deriv=True below)
    x_pow = x ** cc
    j = np.empty(shape=(3, x.size), dtype=x.dtype)
    j[0] = 1                       # d/d(aa)
    j[1] = x_pow                   # d/d(bb)
    j[2] = bb * x_pow * np.log(x)  # d/d(cc)
    return j


def run_fit_with_jac(x: np.ndarray, y: np.ndarray, guess: np.typing.ArrayLike) -> np.ndarray:
    popt, pcov, info, msg, status = curve_fit(
        f=base_curve, jac=grad_fn, xdata=x, ydata=y, p0=guess,
        # col_deriv=True: the Jacobian is supplied transposed, one row per parameter
        full_output=True, col_deriv=True,
    )
    print(f'status={status}, {info["nfev"]} model calls, {info["njev"]} Jacobian calls')
    print(msg)
    return popt


def draw_fig(
    x: np.ndarray, y: np.ndarray,
    theta_guess: np.typing.ArrayLike,
    theta_fit: np.typing.ArrayLike,
) -> plt.Figure:
    xgrid = np.logspace(start=1, stop=3, num=100)
    fig, ax = plt.subplots()
    ax.set_xscale('log')
    ax.scatter(x, y, alpha=0.5, label='Data')
    ax.plot(xgrid, base_curve(xgrid, *theta_guess), label='Est')
    ax.plot(xgrid, base_curve(xgrid, *theta_fit), label='Fit')
    ax.legend()
    return fig


def run_demo_jac() -> None:
    raw = io.StringIO(
'''410.0,73.06578085929756
25.0,205.29417389522575
72.0,110.48653325137172
51.0,168.52111516008628
15.0,119.75684720989004
72.0,164.46280991735537
73.0,145.53126751391522
36.0,161.41429319490925
40.0,219.91735537190084
190.0,89.18717804003897
21.0,203.3350571291969
47.0,170.12964176670877
21.0,197.1020714822368
18.0,96.43526170798899
53.0,117.55060034305319
50.0,189.89358650365415
43.0,179.03132807995385
863.0,69.63888057656149
71.0,131.42730764753813
40.0,205.2892561983471
65.0,131.3857292219426
401.0,133.81511047189076
50.0,115.65603442387814
58.0,151.99074870050802
50.0,165.8640803223824
21.0,210.87942861045792
236.0,124.21734182739671
53.0,180.11451429366744
12.0,320.77043917765184
36.0,244.3526170798898
25.0,202.41568198893515
21.0,184.03895128162597
29.0,165.64724945771087
25.0,218.1818181818182
72.0,161.8457300275482
130.0,107.38232466256511
84.0,177.52397865095088
38.0,57.524112172378224
50.0,168.132777815723
25.0,202.41568198893515
21.0,244.3978260657449
48.0,168.3167133392528
200.0,122.8403797554812
37.0,167.84185838731295
83.0,173.75445583988846
13.0,315.835929660122
11.0,314.47181327653976
32.0,203.68741889215215
200.0,123.96694214876034
39.0,110.59353869271226
39.0,190.81504521023686
40.0,235.53719008264466
37.0,181.71111484409758
25.0,215.55576804863057
40.0,235.53719008264466''')

    with raw:
        x, y = np.loadtxt(raw, delimiter=',').T

    theta0 = 0, 1000, -0.5
    aa, bb, cc = theta = run_fit_with_jac(x, y, theta0)
    print(f'MAE = {np.abs(y - base_curve(x, *theta)).mean():.2f}')
    print(f'a = {aa:+.3f}')
    print(f'b = {bb:+.2f}')
    print(f'c = {cc:+.4f}')
    # Equivalent Box-Cox-form parameters via d = a + b, e = b * c, f = c
    print(f'd = {aa+bb:+.2f}')
    print(f'e = {bb*cc:+.2f}')
    print(f'f = {cc:+.4f}')

    draw_fig(x, y, theta_guess=theta0, theta_fit=theta)
    plt.show()


if __name__ == '__main__':
    run_demo_jac()

status=1, 7 model calls, 6 Jacobian calls. Both actual and predicted relative reductions in the sum of squares are at most 0.000000. MAE = 29.09.

This demonstrates that with an initial estimate and a Jacobian, the fit is fast and stable for the base power form. By the parameter mapping a = d − e / f, b = e / f, c = f, the Box-Cox-like form will trace the same solution under equivalent starting points.
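
If the Box-Cox-like form is the one you want to fit directly, its Jacobian is just as easy to write down. The sketch below mirrors grad_fn above; boxcox_curve and boxcox_grad are illustrative names, and the layout again assumes col_deriv=True.

import numpy as np


def boxcox_curve(x: np.ndarray, u: float, v: float, w: float) -> np.ndarray:
    return u + v * (x ** w - 1) / w


def boxcox_grad(x: np.ndarray, u: float, v: float, w: float) -> np.ndarray:
    # Closed-form partials of u + v * (x**w - 1) / w, one row per parameter
    x_pow = x ** w
    j = np.empty(shape=(3, x.size), dtype=x.dtype)
    j[0] = 1                                                    # d/du
    j[1] = (x_pow - 1) / w                                      # d/dv
    j[2] = v * (w * x_pow * np.log(x) - (x_pow - 1)) / w ** 2   # d/dw
    return j

Passing jac=boxcox_grad with col_deriv=True and the mapped starting point (1000, -500, -0.5) should reproduce the same curve in comparably few evaluations.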

Why this matters

Without a starting point, the optimizer’s behavior is not a property of the model; it’s a side effect of where the search begins. Changing parameterization or scaling may look like it changes model quality, but often it just changes the optimization path. Providing p0 normalizes the comparison across equivalent forms. Supplying a Jacobian improves convergence and reduces function evaluations.

One additional observation: adding a degree of freedom via a + b * (x − c) ** d reduced the MAE from 29.1 to 27.9 on the same data (a sketch of that fit follows below). Whether the extra parameter is appropriate depends on the problem, and the comparison is not apples-to-apples with the two equivalent parameterizations above.
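
One way to try that more flexible model is sketched here. It uses bounds to keep x − c positive for non-integer exponents, which switches curve_fit to the trust-region 'trf' solver; the shifted_power name and the starting point are illustrative, the x and y arrays are assumed to be loaded as above, and the exact MAE will depend on the data and start.

import numpy as np
from scipy.optimize import curve_fit


def shifted_power(x: np.ndarray, a: float, b: float, c: float, d: float) -> np.ndarray:
    return a + b * (x - c) ** d


def fit_shifted_power(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Keep c below min(x) so (x - c) stays positive; bounds imply method='trf'
    c_max = x.min() - 0.1
    popt, _ = curve_fit(
        shifted_power, x, y,
        p0=(0.0, 1000.0, 0.0, -0.5),
        bounds=([-np.inf, -np.inf, -np.inf, -np.inf],
                [np.inf, np.inf, c_max, np.inf]),
    )
    print(f'4-parameter MAE = {np.abs(y - shifted_power(x, *popt)).mean():.2f}')
    return popt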

Takeaways and guidance

When fitting y = a + b * x ** c or its Box-Cox-style reparameterization, prioritize well-chosen initial estimates and pass a Jacobian when available. With those in place, algebraically equivalent formulations do not materially differ in fit quality, convergence, or sensitivity to scaling. Evaluate more flexible models only if they reflect the underlying process and compare them on equal footing.

This article is based on a question asked on Stack Overflow by 3UqU57GnaX and an answer provided by Reinderien.