2025, Dec 01 09:00

Polars: Create a new column by positional lookup across columns (1-based to 0-based) using with_columns

Learn how to create a Polars DataFrame column by positional lookup: convert 1-based indexes to 0-based, use Series indexing with with_columns, avoid loops.

When you need to look up values in one column using positional indexes stored in another column, it can be tempting to reach for a loop. In Polars, there is a concise way to do this in one shot by leveraging direct Series indexing and adding the result as a new column.

Problem overview

The task is to create a new column in which each row takes its value from ref at the position given by that row's idx. The idx values are 1-based positions, so the lookup needs to account for 0-based indexing in Polars.

import polars as pl

tbl = pl.DataFrame({
    'ref': ['a', 'b', 'c', 'd', 'e', 'f'],
    'idx': [4, 3, 1, 6, 2, 5],
})

What is happening under the hood

Polars Series are 0-based, while the indexes stored in idx represent 1-based positions. If you use idx as-is, every lookup lands one element too far, and the largest position (6 in this example) points past the end of the six-element Series. The correct approach is to subtract one from idx before using it to index into the ref Series.

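There is one more important detail. Direct Series indexing like tbl['col'][...] is an eager operation: it works on a materialized Series pulled out of a DataFrame. It is not available in Lazy mode, where a LazyFrame holds a query plan rather than materialized columns.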

Solution

You can add the derived column in a single call using DataFrame.with_columns(). The only adjustment is idx - 1 to align with 0-based indexing.

import polars as pl

tbl = pl.DataFrame({
    'ref': ['a', 'b', 'c', 'd', 'e', 'f'],
    'idx': [4, 3, 1, 6, 2, 5],
})

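# 'ref[idx]' is not a valid Python identifier, so the column name is
# passed through dict unpacking rather than as a plain keyword argument.
enriched = tbl.with_columns(**{'ref[idx]': tbl['ref'][tbl['idx'] - 1]})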

This creates the new ref[idx] column with values ['d', 'c', 'a', 'f', 'b', 'e'] as intended.

Why this matters

Understanding Polars indexing rules helps avoid off-by-one errors that are easy to miss in data pipelines. It also clarifies the distinction between eager and lazy contexts: Series indexing like tbl['ref'][...] works eagerly, but the same pattern does not apply in Lazy mode. Keeping this mental model saves time when switching between quick interactive work and optimized pipelines.

Notes on display

If you see a nicely formatted text table when printing a DataFrame in an interactive session such as IPython, that rendering comes from the default string representation that Polars provides for a DataFrame. No extra formatting code is needed to get that view.

Wrap-up

For dynamic positional lookups across columns in Polars, use DataFrame.with_columns() and align positions with 0-based indexing by subtracting one when your positional data is 1-based. Remember that direct Series operations like tbl['col'][...] target the eager API and are not available in Lazy mode. With these details in mind, you can express this pattern cleanly and reliably.