October 24, 2025, 15:00

Unexpected NaN propagation in Polars rolling_sum with mixed NaN/null (v1.31.0): how to reproduce, verify with rolling_map, and the upcoming fix

Polars 1.31.0 rolling_sum bug with mixed NaN and null: minimal repro, expected vs actual output, rolling_map baseline, and the fix in the next release.

Rolling window analytics are a staple in data pipelines, but their behavior can become surprising when missing values enter the picture. If you are using Polars 1.31.0 with Python 3.12.11 and NumPy 2.3.1, you may encounter unexpected results from rolling_sum when a column mixes NaN (np.nan) and null (None). This guide shows the minimal reproduction, explains what happens, and points to the fix that will ship in the next release.
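Before running the repro, it can help to confirm that your environment matches the affected one. The version values in the comments below are those from the report, not requirements; this quick check (our sketch) simply prints what you actually have installed.

import sys
import numpy as np
import polars as pl

# Versions from the report: Python 3.12.11, NumPy 2.3.1, Polars 1.31.0.
print(sys.version)
print("numpy:", np.__version__)
print("polars:", pl.__version__)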

Reproducing the expected behavior with only NaN

Start with a column that contains a single NaN and otherwise valid floats. With a rolling window of 2, the first position is null (insufficient data), the two windows that include the NaN are NaN, and every other window sums to 2.0 as expected.

import polars as pl
import numpy as np
src_one = {"x": [1., 1., 1., np.nan, 1., 1., 1., 1., 1.]}
df_one = pl.DataFrame(src_one)
with pl.Config(tbl_rows=20):
    print(
        df_one.with_columns(
            pl.col("x").rolling_sum(2).alias("roll_val")
        )
    )
shape: (9, 2)
┌─────┬──────────┐
│ x   ┆ roll_val │
│ --- ┆ ---      │
│ f64 ┆ f64      │
╞═════╪══════════╡
│ 1.0 ┆ null     │
│ 1.0 ┆ 2.0      │
│ 1.0 ┆ 2.0      │
│ NaN ┆ NaN      │
│ 1.0 ┆ NaN      │
│ 1.0 ┆ 2.0      │
│ 1.0 ┆ 2.0      │
│ 1.0 ┆ 2.0      │
│ 1.0 ┆ 2.0      │
└─────┴──────────┘

When NaN and null are combined

Introduce a null alongside a NaN. Intuitively, you might still expect the same pattern around the NaN, plus null-driven gaps for the windows that include the null. Instead, rolling_sum begins to propagate NaN farther than expected after the first occurrence.

src_two = {"x": [1., 1., 1., np.nan, 1., 1., 1., 1., 1., None, 1., 1., 1.]}
df_two = pl.DataFrame(src_two)
with pl.Config(tbl_rows=20):
    print(
        df_two.with_columns(
            pl.col("x").rolling_sum(2).alias("roll_val")
        )
    )
shape: (13, 2)
┌──────┬──────────┐
│ x    ┆ roll_val │
│ ---  ┆ ---      │
│ f64  ┆ f64      │
╞══════╪══════════╡
│ 1.0  ┆ null     │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
│ NaN  ┆ NaN      │
│ 1.0  ┆ NaN      │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ NaN      │
│ 1.0  ┆ NaN      │
│ 1.0  ┆ NaN      │
│ null ┆ null     │
│ 1.0  ┆ null     │
│ 1.0  ┆ NaN      │
│ 1.0  ┆ NaN      │
└──────┴──────────┘

Only one normal 2.0 appears after the NaN; every later window unexpectedly returns NaN even though it contains only finite values, and the spurious NaN even reappears after the null-driven gap. This deviates from the semantics seen with the NaN-only input above.

What’s going on

The behavior above is a bug in rolling_sum when NaN and null coexist in the same column. It has been fixed on main and will be included in the next Polars release. The change is tracked here: https://github.com/pola-rs/polars/pull/23482.
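Until that release lands, one possible interim workaround (our suggestion, not from the report) is to normalize NaN to null before summing, so the buggy mixed NaN/null path is never exercised. Note the semantic trade-off: windows that previously produced NaN now produce null.

# Interim workaround sketch: convert NaN to null first, then roll.
# Trade-off: windows containing the former NaN now yield null, not NaN.
df_two.with_columns(
    pl.col("x").fill_nan(None).rolling_sum(2).alias("roll_val")
)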

A correctness baseline using rolling_map

If you compute the same rolling sum with rolling_map(sum, 2), the result matches the intuitive expectation for both NaN and null windows. This makes it a useful correctness baseline for validation.

with pl.Config(tbl_rows=20):
    print(
        df_two.with_columns(
            pl.col("x").rolling_map(sum, 2).alias("roll_val")
        )
    )
shape: (13, 2)
┌──────┬──────────┐
│ x    ┆ roll_val │
│ ---  ┆ ---      │
│ f64  ┆ f64      │
╞══════╪══════════╡
│ 1.0  ┆ null     │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
│ NaN  ┆ NaN      │
│ 1.0  ┆ NaN      │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
│ null ┆ null     │
│ 1.0  ┆ null     │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
└──────┴──────────┘
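Rather than eyeballing the two tables, you can compare the columns programmatically. In the sketch below, the column names fast and baseline are our own. It uses ne_missing so that nulls compare as values instead of propagating; also note that Polars compares NaN equal to NaN, so rows where both columns are NaN are not flagged.

# Flag rows where the native rolling_sum disagrees with the rolling_map baseline.
mismatches = (
    df_two.with_columns(
        fast=pl.col("x").rolling_sum(2),
        baseline=pl.col("x").rolling_map(sum, 2),
    )
    .with_row_index()
    .filter(pl.col("fast").ne_missing(pl.col("baseline")))
)
print(mismatches)  # on 1.31.0 this lists the windows that wrongly returned NaN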

However, rolling_map executes a Python UDF and materializes Series objects, which adds significant overhead. The documentation warns against using it for production workloads unless absolutely necessary.

"Computing custom functions is extremely slow. Use specialized rolling functions such as Expr.rolling_sum() if at all possible."
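If you want to quantify the overhead on your own machine, a rough micro-benchmark like this one makes the gap concrete; the data size and iteration count are arbitrary choices of ours, and the timings will vary by machine.

import timeit

bench_df = pl.DataFrame({"x": np.random.default_rng(0).normal(size=10_000)})

# rolling_sum runs in compiled code; rolling_map calls a Python UDF per window.
t_native = timeit.timeit(lambda: bench_df.select(pl.col("x").rolling_sum(2)), number=5)
t_udf = timeit.timeit(lambda: bench_df.select(pl.col("x").rolling_map(sum, 2)), number=5)
print(f"rolling_sum: {t_native:.4f}s  rolling_map: {t_udf:.4f}s")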

The fix and the corrected output

With the fix merged on main (see the PR above), rolling_sum once again produces the expected results for mixed NaN and null. Running the same expression yields the correct sums and localized NaN/null effects.

with pl.Config(tbl_rows=20):
    print(
        df_two.with_columns(
            pl.col("x").rolling_sum(2).alias("roll_val")
        )
    )
shape: (13, 2)
┌──────┬──────────┐
│ x    ┆ roll_val │
│ ---  ┆ ---      │
│ f64  ┆ f64      │
╞══════╪══════════╡
│ 1.0  ┆ null     │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
│ NaN  ┆ NaN      │
│ 1.0  ┆ NaN      │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
│ null ┆ null     │
│ 1.0  ┆ null     │
│ 1.0  ┆ 2.0      │
│ 1.0  ┆ 2.0      │
└──────┴──────────┘

Why this matters

Rolling aggregations underpin anomaly detection, KPIs, and time-series features. Silent divergence when NaN and null appear together can skew metrics or invalidate model inputs. Knowing about this edge case and the upcoming fix helps you choose the right tool: validate with rolling_map for correctness checks, or rely on the optimized rolling_sum once the fix is available.

Practical takeaways

If you are on Polars 1.31.0 and mix NaN with null in rolling windows, be aware of the unexpected NaN propagation shown above. For correctness validation, rolling_map(sum, 2) reproduces the intended behavior, but it incurs heavy overhead from Python UDF execution and Series materialization, as the docs caution. The underlying issue has been addressed and will ship in the next release, tracked at https://github.com/pola-rs/polars/pull/23482. Choose between the fast rolling_sum and the slow-but-correct rolling_map baseline based on whether you can adopt the fixed release; if you need a single code path across versions, see the sketch below.
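One option for that single code path is a version gate. The cutoff below assumes the fix ships in the first release after 1.31.0, which you should verify against the PR once it is tagged; the helper name and the naive version parse are ours.

def safe_rolling_sum(expr: pl.Expr, window: int) -> pl.Expr:
    # Assumption: any release newer than 1.31.0 contains the fix (check the PR).
    # Naive parse; pre-release version tags are not handled.
    current = tuple(int(part) for part in pl.__version__.split(".")[:3])
    if current > (1, 31, 0):
        return expr.rolling_sum(window)
    # Slow but correct baseline on affected versions.
    return expr.rolling_map(sum, window)

df_two.with_columns(safe_rolling_sum(pl.col("x"), 2).alias("roll_val"))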

This article is based on a StackOverflow question by Arran and an answer by jqurious.