2025, Sep 30 07:00

Logistic curve fitting on normalized China GDP: why high R² can hide large errors and how to improve forecasts

Learn why logistic fits with SciPy curve_fit show high R² yet miss 1970 China GDP, how scaling skews errors, and practical ways to improve early-year forecasts.

Fitting a logistic curve to macroeconomic time series often looks convincing at first glance: the function is smooth, the R² is high, and the overlaid curve tracks the data. Yet a single-year prediction can still land far off in absolute terms once you unscale it back to the original units. Below is a practical walkthrough of why that happens, what exactly is going on in code, and a couple of safe ways to reason about it without inventing extra assumptions.

Problem setup

The task is to fit a logistic-shaped function to China’s GDP series using scipy.optimize.curve_fit, with features and targets normalized to [0, 1]. The curve visually matches the data and the reported R² is very high. However, the back-transformed prediction for 1970 comes out around 1.92285528141e11, while the actual GDP for that year is about 9.15062113063745e10.

Reproducible code example

The following script normalizes inputs and outputs, fits a sigmoid, reports R² on a holdout slice, and computes the 1970 prediction after reversing the scaling. Names are chosen for clarity, but the logic mirrors the scenario described.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.special import expit
from sklearn.metrics import r2_score

# load the series and draw a random 80/20 mask (used only for the holdout slice below)
frame = pd.read_csv("china_gdp.csv")
split_mask = np.random.rand(len(frame)) < 0.8
t_raw = frame['Year'].values.astype(float)
g_raw = frame['Value'].values.astype(float)

# scale both the years and the GDP values to [0, 1]
t_unit = (t_raw - t_raw.min()) / (t_raw.max() - t_raw.min())
g_unit = (g_raw - g_raw.min()) / (g_raw.max() - g_raw.min())

# constant-plus-sigmoid model: a3 is the offset, a4 the amplitude,
# a1 the steepness and a2 the midpoint on the normalized time axis
def sig_curve(x, a1, a2, a3, a4):
    return a3 + a4 * expit(a1 * (x - a2))

init_guess = (5.0, 0.5, 0.0, 1.0)
param_bounds = ([0.0, 0.0, -np.inf, 0.0], [np.inf, 1.0, np.inf, np.inf])
opt_params, covar = curve_fit(sig_curve, t_unit, g_unit, p0=init_guess, bounds=param_bounds)
print(dict(zip(['a1', 'a2', 'a3', 'a4'], opt_params)))

# overlay the fitted curve on the normalized data
t_grid = np.linspace(0, 1, 300)
g_grid = sig_curve(t_grid, *opt_params)
plt.scatter(t_unit, g_unit, s=15, label='data')
plt.plot(t_grid, g_grid, label='fit')
plt.legend(); plt.show()

# note: the fit above used all points; this slice is only illustrative
thold = t_unit[~split_mask]
ghold = g_unit[~split_mask]
gpred = sig_curve(thold, *opt_params)
print(r2_score(ghold, gpred))

# predict 1970: scale the year, evaluate the model, then undo the target scaling
yr_query = 1970
scaled_pred = sig_curve((yr_query - t_raw.min()) / (t_raw.max() - t_raw.min()), *opt_params)
back_to_units = scaled_pred * (g_raw.max() - g_raw.min()) + g_raw.min()
print(back_to_units)

Representative outputs reported for this setup are an R² around 0.9985891574981185 on the scaled holdout and a 1970 prediction around 192285528141.3661, whereas the actual value is 91506211306.3745.

Why a great-looking fit still misses a specific year

The implementation is fine. The discrepancy is a consequence of the interaction between the chosen function class and the data. The model being fitted is a constant-plus-sigmoid. A sigmoid is monotonic, has gentle slopes at the ends, and a steep middle section. The GDP series in question features a pronounced takeoff phase in the early-to-mid 1990s. When both the input year and the target GDP are normalized into [0, 1], the loss is optimized on the scaled space, not on the original units. That makes relative errors in the low part of the curve look small in normalized terms, even if they translate into large absolute deviations once you unscale back to dollars. With a single sigmoid, there simply are not enough degrees of freedom to faithfully capture every bump and shift in a curve that includes a dramatic acceleration period.

This also explains why the overall error can look modest on the [0, 1] target scale. For example, the sum of absolute errors on the normalized outputs can be around 0.26858. Squared-error metrics in [0, 1] compress large original-unit differences, so they can underplay absolute deviations at the low end after rescaling.
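
To see the gap between the two scales directly, a few lines can be appended to the first script. They look up the actual 1970 value, take the absolute error of the back-transformed prediction, and divide it by the target range used for scaling; the sketch assumes the CSV contains a row for 1970, and `back_to_units`, `t_raw`, and `g_raw` are the variables defined earlier.

# express the 1970 miss on both scales (appended to the first script)
actual_1970 = g_raw[t_raw == 1970][0]              # actual 1970 GDP from the data
abs_err = abs(back_to_units - actual_1970)          # error in original units
unit_err = abs_err / (g_raw.max() - g_raw.min())    # the same error on the [0, 1] scale
print(f"absolute error: {abs_err:.3e}")
print(f"normalized-scale error: {unit_err:.4f}")

The second number is what the squared-error objective "sees" during fitting; the first is what a reader of the original units sees.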

There is another subtlety. The fitting call above uses all points for parameter estimation, and afterwards the score is computed on a subset drawn from the same time series. Even if you were to refit only on the masked subset, a random split on a temporal sequence is not an informative validation protocol for a forecasting use case; a time-ordered split, sketched below, is more representative. In any case, the central issue here is that the function class is too rigid to simultaneously hug both the low-GDP region and the explosive-growth region after normalization.
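
For completeness, here is a minimal sketch of such a time-ordered split, reusing the normalized arrays and the model from the first script. The 80/20 cut point is an arbitrary illustrative choice, and the [0, 1] scaling was still computed on the full series, so treat it as a rough check rather than a leak-free protocol.

# temporal holdout: fit on the earlier years, score on the most recent ones
cut = int(0.8 * len(t_unit))                 # arrays are assumed to be in year order
params_t, _ = curve_fit(sig_curve, t_unit[:cut], g_unit[:cut],
                        p0=init_guess, bounds=param_bounds)
print(r2_score(g_unit[cut:], sig_curve(t_unit[cut:], *params_t)))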

A pragmatic way to tighten early-year predictions

Restricting the fit to a regime that excludes the steep takeoff aligns the function’s shape with the data’s local behavior. Truncating the series to 1960–1991 improves the 1970 estimate substantially in this case. The following snippet shows how to fit on that earlier window.

# fit only the first 32 observations, i.e. the 1960-1991 window
_take = 32
t_slice = frame['Year'].values.astype(float)[:_take]
g_slice = frame['Value'].values.astype(float)[:_take]

# rescale within the slice so the window itself spans [0, 1]
ts = (t_slice - t_slice.min()) / (t_slice.max() - t_slice.min())
gs = (g_slice - g_slice.min()) / (g_slice.max() - g_slice.min())
opt_params2, covar2 = curve_fit(sig_curve, ts, gs, p0=init_guess, bounds=param_bounds)

# predict 1970 using the slice's own scaling, then invert it
yr_query = 1970
scaled_q = (yr_query - t_slice.min()) / (t_slice.max() - t_slice.min())
scaled_ans = sig_curve(scaled_q, *opt_params2)
back_scaled = scaled_ans * (g_slice.max() - g_slice.min()) + g_slice.min()
print(back_scaled)

The 1970 estimate from this truncated-fit approach comes out near 94736151945.78181, much closer to 91506211306.3745. The visual fit over the earlier segment also aligns the low-value area more tightly with the curve.
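
Expressed as relative errors against the actual 1970 value, the two estimates quoted in this article differ sharply: roughly 110% for the full-series fit versus about 3.5% for the truncated fit. The arithmetic is a few lines:

# relative error of each 1970 estimate, using the figures quoted above
actual = 91506211306.3745
estimates = {"full-series fit": 192285528141.3661, "truncated fit": 94736151945.78181}
for label, pred in estimates.items():
    print(f"{label}: {abs(pred - actual) / actual:.1%} relative error")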

Exploring model flexibility without jumping straight into overfitting

If the goal is to systematically see how added flexibility affects fit quality, you can experiment with a polynomial baseline and vary the degree, then watch how the predictions move from underfit to overfit as you add parameters. The snippet below performs a least-squares fit for several polynomial degrees and plots predictions against a fixed “real” vector. This is not a prescription for GDP modeling; it is a compact way to build intuition around degrees of freedom and their impact on fit behavior.

import numpy as np
import matplotlib.pyplot as plt

step_deg = 3
rng_seed = 1
orders = step_deg * (1 + np.arange(6))   # degrees 3, 6, ..., 18
count = 50

# a fixed "real" vector of random values to fit against
x_ax = np.arange(count)
np.random.seed(int(rng_seed))
y_true = 1e2 * np.random.rand(count)

# least-squares polynomial fit for each degree, evaluated on the same grid
y_curves = [np.poly1d(np.polyfit(x_ax, y_true, k))(x_ax) for k in orders]

# one subplot per degree, filled row by row
fig, axs = plt.subplots(3, 2)
for i in range(len(orders)):
    ax = axs[divmod(i, 2)]   # (0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)
    ax.set_title(f'Degree = {orders[i]}')
    ax.scatter(x_ax, y_true, s=15, label='data')
    ax.plot(x_ax, y_curves[i], label='fit')
    ax.legend()
plt.show()

This exercise makes it clear how additional parameters tighten the fit and, beyond a point, start following every wiggle. It also reinforces why a single sigmoid may be too restrictive for data that combine calm periods with explosive growth.
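
To make the underfit-to-overfit transition measurable rather than only visual, the toy experiment can be extended with a simple holdout: fit each degree on half of the points and compare in-sample and held-out RMSE. The sketch below assumes `x_ax`, `y_true`, and `orders` from the previous snippet are still in scope; it is purely illustrative, since the target here is random noise.

# extend the toy experiment: in-sample vs held-out RMSE as the degree grows
train = x_ax % 2 == 0                        # every other point for fitting
test = ~train                                # the remaining points for checking
for k in orders:
    poly = np.poly1d(np.polyfit(x_ax[train], y_true[train], k))
    rmse_train = np.sqrt(np.mean((poly(x_ax[train]) - y_true[train]) ** 2))
    rmse_test = np.sqrt(np.mean((poly(x_ax[test]) - y_true[test]) ** 2))
    print(f"degree {int(k):2d}: train RMSE {rmse_train:7.2f}, holdout RMSE {rmse_test:7.2f}")

On a typical run, the in-sample RMSE keeps shrinking as parameters are added while the held-out RMSE stops improving and eventually worsens, which is the boundary between healthy flexibility and overfitting that the plots hint at.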

Why this matters

Choosing a function that matches the phenomenon’s shape is as important as the optimizer and the metric. Normalizing targets into [0, 1] is common and useful, but it relocates the optimization to a space where relative differences dominate. After you invert the scaling, apparently small normalized discrepancies can unfold into large absolute errors at the low end. If you then interpret those in original units without this context, conclusions can be misleading. The validation protocol matters too. Random splits in temporal data are not representative of out-of-sample forecasting, and fitting on all points while reporting a split score can overstate performance.

Takeaways

When a model yields an excellent global fit yet misses specific years in absolute terms, first check alignment between the function class and the data’s regimes. For GDP with a sharp takeoff, a single logistic curve balances errors across the scaled range and will not nail both the early low values and the later high values simultaneously. Fitting within a coherent regime, as in the 1960–1991 slice, can materially improve early-year predictions. To understand how capacity affects fit quality, vary degrees of freedom in a simple baseline and watch where the boundary between healthy flexibility and overfitting lies. Most importantly, interpret errors in the scale you care about, and be cautious with random splits on time series when judging predictive credibility.

The article is based on a question from StackOverflow by MSo and an answer by welp.