2025, Nov 05 19:00

pandas rolling std on tail slices: anchored windows, online variance updates, and floating-point pitfalls

Learn why pandas rolling std can differ across tail slices: anchored windows and online variance updates amplify floating-point errors. Get fixes and options.

Why pandas rolling std can change when you slice from the tail

Rolling statistics feel deterministic: same window size, same values, same result. Yet in pandas, applying rolling standard deviation to different tail slices of the same Series can produce different outputs for overlapping windows. This guide explains why that happens, what exactly is being computed under the hood, and how to reason about the results without tripping over numerical pitfalls.

Reproducing the behavior

Consider a short Series where one value dwarfs the others. Compute the rolling standard deviation with a window size of three over different tail slices.

import numpy as np
import pandas as pd
series_x = pd.Series(np.random.default_rng(seed=123).random(size=5))
series_x[1] = 10000000  # a very large value
series_x
# 0    6.823519e-01
# 1    1.000000e+07
# 2    2.203599e-01
# 3    1.843718e-01
# 4    1.759059e-01
# dtype: float64
series_x.tail(3).rolling(window=3, min_periods=1).std()
# 2         NaN
# 3    0.025447
# 4    0.023604
# dtype: float64
series_x.tail(4).rolling(window=3, min_periods=1).std()
# 1             NaN
# 2    7.071068e+06
# 3    5.773503e+06
# 4    0.000000e+00
# dtype: float64
series_x.tail(5).rolling(window=3, min_periods=1).std()
# 0             NaN
# 1    7.071067e+06
# 2    5.773502e+06
# 3    5.773503e+06
# 4    0.000000e+00
# dtype: float64
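The inconsistency is easy to check programmatically. The last window covers the same three values in every slice, yet the reported std differs. Note that exact outputs can vary across pandas versions, since the rolling kernels have changed over time:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.default_rng(seed=123).random(size=5))
s[1] = 10_000_000  # the large outlier

# last rolling std over tail(3) vs tail(5): both windows cover the same
# final three values, but the results disagree
last_of_tail3 = s.tail(3).rolling(window=3, min_periods=1).std().iloc[-1]
last_of_tail5 = s.tail(5).rolling(window=3, min_periods=1).std().iloc[-1]

print(last_of_tail3)  # ~0.023604, the correct std of the last three values
print(last_of_tail5)  # collapses toward zero in the run shown above
```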

Now compute the same rolling statistic, but evaluate each window independently using apply with Series.std. The last value becomes consistent across slices:

series_x.tail(3).rolling(window=3, min_periods=1).apply(pd.Series.std)
# 2         NaN
# 3    0.025447
# 4    0.023604
# dtype: float64
series_x.tail(4).rolling(window=3, min_periods=1).apply(pd.Series.std)
# 1             NaN
# 2    7.071068e+06
# 3    5.773503e+06
# 4    2.360426e-02
# dtype: float64
series_x.tail(5).rolling(window=3, min_periods=1).apply(pd.Series.std)
# 0             NaN
# 1    7.071067e+06
# 2    5.773502e+06
# 3    5.773503e+06
# 4    2.360426e-02
# dtype: float64

What is going on

There are two independent effects at play. The first is purely about how rolling windows are defined. The second is numerical and shows up when your data mixes huge and tiny magnitudes.

First, each rolling window is anchored to its ending position: the window at index i covers that element and up to window − 1 preceding elements, and with min_periods=1 the earliest windows are simply shorter. With five elements a, b, c, d, e and a window of three (using the raw random draws here, before the large value is inserted, so the magnitudes stay readable), the engine computes the following windows in sequence:

std(a)          # 0         NaN
std(a, b)       # 1    0.444438
std(a, b, c)    # 2    0.325633
std(b, c, d)    # 3    0.087630
std(c, d, e)    # 4    0.023604

If you slice to tail(3), you change the initial windows, so the first n−1 outputs for a window of size n are inevitably different. For tail(3) the sequence becomes:

std(c)          # 2         NaN
std(c, d)       # 3    0.025447
std(c, d, e)    # 4    0.023604

Here, std(c, d) is not equal to std(b, c, d), so early values differ as expected due to the anchoring.
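You can confirm the window contents by evaluating each std directly with NumPy, using ddof=1 to match pandas' sample-std convention. The decimals below are approximations recovered from the printed Series output:

```python
import numpy as np

# approximate values of the un-spiked series; c, d, e are the last three draws
b, c, d, e = 0.05382, 0.2203599, 0.1843718, 0.1759059

print(np.std([c, d], ddof=1))     # window at index 3 for tail(3): ~0.025447
print(np.std([b, c, d], ddof=1))  # window at index 3 for the full series: ~0.08763
print(np.std([c, d, e], ddof=1))  # final window, identical in both: ~0.023604
```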

The second effect explains why, when an extremely large value sits alongside small ones, the last value can collapse to zero with rolling.std yet remain a small positive number with rolling.apply(pd.Series.std). In the pandas implementation, the rolling variance (the square of the std) is computed online: as the window slides, intermediates such as the running sum of squared deviations from the mean (named ssqdm_x in the source) are updated incrementally rather than recomputed from scratch. When the window moves past the large value, the removal step (remove_var) subtracts a huge term, (val - prev_mean) * (val - mean_x[0]), from an already imprecise, equally huge ssqdm_x. Both operands are float64 values on the order of 1e13, where the spacing between adjacent representable numbers is on the order of 1e-2, so the subtraction cannot recover a true variance of roughly 5e-4: the cancellation leaves behind accumulated rounding error, and the variance for the final windows collapses to essentially zero. Because these intermediates are shared across windows, the damage propagates to every later result. In contrast, rolling.apply(pd.Series.std) evaluates each window independently, so no error carries over.
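To see the mechanism concretely, here is a simplified sketch of an online rolling variance with Welford-style add/remove updates. It is a toy model of the approach, not pandas' actual code (which adds further safeguards), but it reproduces the same failure mode: after the 1e7 value leaves the window, the shared ssqdm intermediate cannot recover the tiny true variance.

```python
import numpy as np

def rolling_var_online(values, window, ddof=1):
    """Toy online rolling variance with add/remove updates (not pandas' code)."""
    out = []
    mean = 0.0
    ssqdm = 0.0  # running sum of squared deviations from the mean
    n = 0
    for i, val in enumerate(values):
        # add the incoming value (Welford update)
        n += 1
        delta = val - mean
        mean += delta / n
        ssqdm += delta * (val - mean)
        # remove the value that just left the window (reverse Welford update)
        if i >= window:
            old = values[i - window]
            n -= 1
            delta = old - mean
            mean -= delta / n
            ssqdm -= delta * (old - mean)
        out.append(ssqdm / (n - ddof) if n > ddof else float("nan"))
    return out

vals = [0.6823519, 10_000_000.0, 0.2203599, 0.1843718, 0.1759059]
online = rolling_var_online(vals, window=3)
exact = np.var(vals[2:], ddof=1)  # true variance of the last window, ~5.57e-4
print(online[-1], exact)  # the online result is far from the exact value
```

The final subtraction removes a term of magnitude ~7.5e13 from a ssqdm of the same magnitude; the residual that should equal ~1.1e-3 is instead dominated by rounding noise at the ~1e-2 scale.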

A small change that reveals the numeric edge

Reduce the large value by a factor of ten (one fewer zero) and the behavior improves notably: the magnitude gap shrinks, so float64 can carry the shared intermediates faithfully enough for the final window.

series_y = pd.Series(np.random.default_rng(seed=123).random(size=5))
series_y[1] = 1000000
series_y.rolling(window=3, min_periods=1).std()
# 0              NaN
# 1    707106.298691
# 2    577350.008599
# 3    577350.152354
# 4         0.021852
# dtype: float64

The difference becomes clearer when looking at the variance itself:

series_y.rolling(window=3, min_periods=1).var()
# 0             NaN
# 1    4.999993e+11
# 2    3.333330e+11
# 3    3.333332e+11
# 4    4.775168e-04  # a gap of about 15 orders of magnitude
# dtype: float64

These numbers illustrate how a single very large value can dominate the accumulation of squared deviations, and why online updates may fail to reconstruct a tiny tail variance once the large value leaves the window.

So what is the fix?

The computation itself is working as designed: rolling windows are anchored, and the variance is maintained via an online algorithm. If you need consistent last-window results when mixing extreme magnitudes, compute each window independently with rolling.apply(pd.Series.std). If you reduce the extreme value (for example from 10000000 to 1000000 in this setup), the final rolling.std result no longer collapses, because float64 intermediates retain enough precision across the shared state.
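If per-window evaluation is too slow on large Series (apply with a Python callable runs once per window), passing raw=True hands each window to the function as a plain NumPy array, skipping Series construction overhead; a lambda around np.std with ddof=1 matches pandas' sample-std convention:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.6823519, 10_000_000.0, 0.2203599, 0.1843718, 0.1759059])

# evaluate every window independently; raw=True passes plain ndarrays.
# The first window has a single value, so ddof=1 yields NaN there.
robust_std = s.rolling(window=3, min_periods=1).apply(
    lambda a: np.std(a, ddof=1), raw=True
)
print(robust_std.iloc[-1])  # ~0.023604, matching pd.Series.std per window
```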

Why this matters

Time-series analytics often run into mixed scales: counters with spikes or metrics combining outliers and small fluctuations. Understanding that rolling.std is an online, shared-state computation explains why an outlier that has left the window can still indirectly affect later results through floating-point limitations. It also clarifies why slicing the tail changes the first n−1 outputs for a window of size n, since the anchors shift the early windows.

Takeaways

Expect the first n−1 positions of a rolling window of size n to depend on the exact slice you start from, because windows are anchored. When your data combines huge and tiny values, shared intermediates in the online algorithm can lose precision and produce near-zero variances even after the outlier has left the window. To obtain consistent per-window results in such cases, compute each window independently via rolling.apply(pd.Series.std). And if feasible, reducing the magnitude gap helps avoid precision loss in the first place.

Docs: pandas Rolling.std — https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.rolling.Rolling.std.html

The article is based on a question on Stack Overflow by KamiKimi 3 and an answer by mozway.