https://pytroubles.com/en/posts/id999-fixing-polars-cum-sum-horizontal-unnest-phantom-literal-column-bug-and-1-32-0-changes

Fixing Polars cum_sum_horizontal + unnest: phantom 'literal' column bug and 1.32.0 changes

Polars cum_sum_horizontal and unnest: resolving the phantom literal column and 1.32.0 API update

Learn how a Polars 1.31.0 bug created a phantom literal column after cum_sum_horizontal + unnest, and how 1.32.0 fixes it. Use unpacked args or cum_fold.

2025-10-18T13:00:05+03:00

Horizontal cumulative sums in Polars are handy until they quietly inject a phantom column into your schema. If you chain cum_sum_horizontal with unnest, you may run into a lingering literal column that refuses to go away, tripping ColumnNotFoundError even after an explicit drop. This guide walks through the reproducible case, what changed in Polars 1.32.0, and how to adjust your code.


Reproducing the issue

The following snippet demonstrates the problem seen in Polars 1.31.0: a literal column appears in the schema after a horizontal cumulative sum and unnest, and it persists even when dropped.

import polars as pl
def run_schema_anomaly():
    print("Polars version:", pl.__version__)
    table = pl.DataFrame({
        "A": [1, 2, 3],
        "T0": [0.1, 0.2, 0.3],
        "T1": [0.4, 0.5, 0.6],
        "T2": [0.7, 0.8, 0.9],
    })
    steps = ["T0", "T1", "T2"]
    print("Original columns:", table.columns)
    print("Time columns:", steps)
    lf = table.lazy()
    print("Schema before cumsum:", lf.collect_schema().names())
    stage = (
        lf.select(pl.cum_sum_horizontal(steps))
          .unnest("cum_sum")
          .rename({name: f"C{name}" for name in steps})
    )
    print("Schema after cumsum:", stage.collect_schema().names())
    try:
        _ = stage.collect()
        print("v1: No bug reproduced")
    except pl.exceptions.ColumnNotFoundError as err:
        print(f"v1: BUG REPRODUCED: {err}")
    stage2 = stage.drop("literal")
    # LazyFrame has no hstack method; concatenate both frames horizontally via pl.concat
    stage2 = pl.concat([pl.LazyFrame({"B": [1, 2, 3]}), stage2], how="horizontal")
    print("Schema after drop and concat:", stage2.collect_schema().names())
    try:
        _ = stage2.collect()
        print("v2: No bug reproduced")
    except pl.exceptions.ColumnNotFoundError as err:
        print(f"v2: BUG REPRODUCED: {err}")
if __name__ == "__main__":
    run_schema_anomaly()

On 1.31.0, the printed schema contains an unexpected literal entry after the cumulative sum and unnest steps. Even after dropping it and concatenating another frame horizontally, collecting still fails with ColumnNotFoundError.

What’s actually happening

This is a bug. In Polars 1.31.0, combining cum_sum_horizontal with unnest could yield a phantom literal column that remained in the inferred schema and caused failures when the plan was executed. The schema listed a column that collect could not actually resolve, hence the ColumnNotFoundError.
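To make the expected result concrete, here is a minimal pure-Python model of what a horizontal cumulative sum should produce for a single row: one running total per input column, keyed by the input names. It assumes the starting accumulator (a literal 0) only seeds the total and is not emitted as a field of its own, which is exactly what the phantom literal column violates. The function name expected_cum_sum_row is hypothetical; this is an illustration, not Polars' implementation.

```python
from typing import Dict, List, Mapping


def expected_cum_sum_row(row: Mapping[str, float], cols: List[str]) -> Dict[str, float]:
    """Model one row of a horizontal cumulative sum over `cols`.

    Assumption: the initial accumulator (0) seeds the running total but is
    not emitted as a field, so the output keys are exactly `cols`.
    """
    total = 0.0
    out: Dict[str, float] = {}
    for name in cols:
        total += row[name]   # running sum across the selected columns
        out[name] = total    # field named after the input column
    return out


row = {"A": 1, "T0": 0.1, "T1": 0.4, "T2": 0.7}
fields = expected_cum_sum_row(row, ["T0", "T1", "T2"])
print(list(fields))  # ['T0', 'T1', 'T2'] (no 'literal' field, and 'A' is not selected)
```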

Fix in Polars 1.32.0 and a behavior change

Polars 1.32.0 includes a fix for the literal column issue, so after upgrading the phantom column no longer appears. There is, however, a related change in how cum_sum_horizontal accepts its arguments: passing a list of names directly now raises InvalidOperationError, so you need to unpack the list.

# This now errors on 1.32.0
lf.select(pl.cum_sum_horizontal(steps)).collect()
# InvalidOperationError: cannot add columns: dtype was not list on all nesting levels: 
# (left: list[str], right: f64)

Unpacking the columns works as expected:

lf.select(pl.cum_sum_horizontal(*steps)).collect()
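If a codebase already passes column lists around, one option is to normalize the arguments before calling cum_sum_horizontal. The helper below, as_unpacked, is a hypothetical convenience (not part of Polars) that accepts either separate names or a single list or tuple and always returns a flat tuple that can be splatted with *.

```python
from typing import Iterable, List, Tuple, Union

NameOrNames = Union[str, Iterable[str]]


def as_unpacked(*args: NameOrNames) -> Tuple[str, ...]:
    """Flatten column-name arguments into a tuple of strings.

    Hypothetical helper: f("T0", "T1") and f(["T0", "T1"]) both
    yield ("T0", "T1").
    """
    names: List[str] = []
    for arg in args:
        if isinstance(arg, str):
            names.append(arg)   # a single column name
        else:
            names.extend(arg)   # a list/tuple of column names
    return tuple(names)


steps = ["T0", "T1", "T2"]
print(as_unpacked(steps))             # ('T0', 'T1', 'T2')
print(as_unpacked("T0", "T1", "T2"))  # ('T0', 'T1', 'T2')
# With Polars installed: lf.select(pl.cum_sum_horizontal(*as_unpacked(steps)))
```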

To return per-column cumulative values, unnest and rename as before:

fixed = (
    lf.select(pl.cum_sum_horizontal(*steps))
      .unnest("cum_sum")
      .rename({name: f"C{name}" for name in steps})
)
# fixed.collect()  # succeeds on 1.32.0
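As a sanity check for fixed.collect(), the sketch below computes in plain Python the values the renamed CT0, CT1 and CT2 columns should hold for the example table, using the same C-prefix rename. It only models the expected numbers and does not call Polars; the function name expected_columns is illustrative.

```python
import math
from typing import Dict, List, Mapping, Sequence


def expected_columns(data: Mapping[str, Sequence[float]], cols: List[str]) -> Dict[str, List[float]]:
    """Row-wise running sums across `cols`, stored under C<name> keys."""
    n_rows = len(data[cols[0]])
    out: Dict[str, List[float]] = {f"C{name}": [] for name in cols}
    for i in range(n_rows):
        total = 0.0
        for name in cols:
            total += data[name][i]
            out[f"C{name}"].append(total)
    return out


table = {
    "A": [1, 2, 3],
    "T0": [0.1, 0.2, 0.3],
    "T1": [0.4, 0.5, 0.6],
    "T2": [0.7, 0.8, 0.9],
}
result = expected_columns(table, ["T0", "T1", "T2"])
# Compare with a tolerance: sums such as 0.1 + 0.4 + 0.7 are not exact in floating point.
assert all(math.isclose(a, b) for a, b in zip(result["CT2"], [1.2, 1.5, 1.8]))
print(list(result))  # ['CT0', 'CT1', 'CT2']
```

Column A does not appear because the select only keeps the cum_sum struct before it is unnested.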

The Polars source shows that cum_sum_horizontal is a thin wrapper around cum_fold. On 1.32.0, cum_fold still accepts a list, so if you prefer to keep a list-based API you can call cum_fold directly and then unnest the result.

(
    lf
      .select(pl.cum_fold(0, lambda x, y: x + y, steps))
      .unnest(pl.all())
      .collect()
)
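Conceptually, cum_fold(acc, f, exprs) applies f left to right across the columns of each row and keeps every intermediate result. Python's itertools.accumulate with initial= is a close analogy for one row; dropping its first element mirrors cum_fold's include_init=False default, so the seed does not become an extra output. This is an analogy for reasoning about the output, not how Polars evaluates the expression, and cum_fold_row is a hypothetical name.

```python
import operator
from itertools import accumulate
from typing import Callable, List, Sequence


def cum_fold_row(acc: float, fn: Callable[[float, float], float], values: Sequence[float]) -> List[float]:
    """Row-wise analogy of a cumulative fold: one output per input value."""
    partials = list(accumulate(values, fn, initial=acc))  # [acc, f(acc, v0), ...]
    return partials[1:]  # drop the seed so it does not show up as an extra field


print(cum_fold_row(0, operator.add, [1, 2, 3]))  # [1, 3, 6]
print(cum_fold_row(0, lambda x, y: x + y, [0.1, 0.4, 0.7]))
```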

Why this matters

Subtle schema mismatches are expensive to debug in lazy pipelines. When a column appears in the logical plan but can’t be materialized at execution, errors only show up at collect time and are often detached from the original transformation. Knowing that the literal column issue was a bug in 1.31.0 and that 1.32.0 changes how cum_sum_horizontal receives its inputs helps prevent wasted time on phantom columns and argument shape surprises.

Practical takeaways

Upgrade to Polars 1.32.0 or newer to avoid the literal column schema artifact after horizontal cumulative sums. If you rely on cum_sum_horizontal, pass columns as unpacked arguments rather than a list. If your codebase prefers list-based expressions, cum_fold with a list still works in 1.32.0 and can be followed by unnest to produce the expanded columns.
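To act on the upgrade advice programmatically, a small version gate can flag environments still on a release older than 1.32.0. The helpers parse_version and has_literal_fix are hypothetical, and their parsing is deliberately naive (numeric major.minor[.patch], any pre-release suffix ignored); packaging.version.Version is the more robust choice in real code.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Naively parse 'X.Y[.Z]'; trailing suffixes such as 'b1' are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")  # missing patch -> "0"
    return int(major), int(minor), int(patch)


def has_literal_fix(version: str) -> bool:
    """True for 1.32.0 or newer, the release credited with the fix above."""
    return parse_version(version) >= (1, 32, 0)


print(has_literal_fix("1.31.0"))  # False
print(has_literal_fix("1.32.0"))  # True
# With Polars installed: has_literal_fix(pl.__version__)
```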

Conclusion

The literal column behavior with cum_sum_horizontal and unnest in 1.31.0 was a genuine bug that caused confusing schema and execution errors. The 1.32.0 release fixes it, and it also nudges usage toward unpacked arguments for cum_sum_horizontal. If you hit the new InvalidOperationError, switch to pl.cum_sum_horizontal(*cols) or use pl.cum_fold with a list and unnest. Keeping these details in mind makes horizontal cumulative aggregations predictable and keeps your lazy plans robust.

The article is based on a question from StackOverflow by Nicolò Cavalleri and an answer by jqurious.

cumulative-sum polars python python-polars