2025, Dec 22 19:00

Polars Column Access Explained: df['col'] vs get_column, pl.col usage, and Lazy-friendly workflows

Learn how to access columns in Polars safely: df['col'] vs get_column, pl.col attribute vs function, and why Lazy API pipelines outperform Series-first code.

Polars users often stumble upon advice that direct indexing like df["a"] is discouraged in favor of df.get_column("a") and that attribute-style expressions like pl.col.a might be a trap. To make matters worse, some of the reference pages that discussed this have moved or gone stale. Here is a concise, practical guide to what’s safe, what’s equivalent, and what you should do instead to keep your code robust and idiomatic.

Code example: the tempting approach

It’s common to start by pulling out a column as a Series and operating on it directly. You might also mix attribute-style and function-style expression building.

import polars as pl

frame = pl.DataFrame({"amount": [1, 2, 3], "note": ["a", "b", "c"]})

# Column extraction by indexing (equivalent to get_column)
col_series = frame["amount"]
# Same result:
col_series_same = frame.get_column("amount")

# Series-first workflow
scaled = col_series * 2
head_two = scaled.head(2)

# Expression building: attribute vs function
expr_attr = pl.col.amount          # same as below when the name is a valid Python attribute
expr_call = pl.col("amount")      # equivalent form

# Column names that are not valid Python attributes must use pl.col("...")
with_space = pl.DataFrame({"total cost": [10, 20]})
expr_space = pl.col("total cost")  # required when name contains spaces

What’s actually going on

Two pairs of constructs here are equivalent in Polars. First, df["column"] and df.get_column("column") are directly equivalent; indexing calls get_column under the hood. Choosing one or the other is a matter of personal preference.

Second, pl.col.name works the same as pl.col("name"). The only hard rule is that attribute access works only when the column name is a valid Python identifier; if the name contains spaces or otherwise isn’t one, you must use pl.col("column name"). Some developers prefer always using the function form for consistency.

If you encounter older documentation about indexing patterns through an archive such as the Wayback Machine, keep in mind that those pages may be outdated and no longer fully accurate.

A better approach with expressions (and Lazy)

While df["column"] and df.get_column("column") are interchangeable, the preferred style in Polars is to avoid pulling Series out in the first place and instead compose expressions that you select or filter. This is especially important because to get the most out of Polars you should use the Lazy API whenever you can, and in Lazy mode you cannot use lf["col"] or lf.get_column().

import polars as pl

source = pl.DataFrame({"amount": [1, 2, 3], "note": ["a", "b", "c"]})

lazy_view = source.lazy()

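# Compose expressions: filter on the original column first, then select, then collect.
# (Filtering after the select would fail, because only "amount_x2" exists at that point.)
result = (
    lazy_view
    .filter(pl.col("amount") > 1)
    .select((pl.col("amount") * 2).alias("amount_x2"))
    .collect()
)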

# Attribute form is equivalent when the name is a valid identifier
same_expr = lazy_view.select(pl.col.amount).collect()

This style scales naturally, keeps transformations declarative, and allows you to chain expressions cleanly.

Why this matters

The key idea is that you should be able to switch from the eager API to the lazy API by adding a single .lazy() call (plus a .collect() at the end) to your code. If you rely on Series-level operations, that switch forces a rewrite, which is another reason those patterns are discouraged. For small datasets, you might still choose to operate on individual columns as Series directly, but the general guidance remains: prefer expression-based pipelines that work seamlessly in Lazy mode.

Takeaways

If you access a column eagerly, df["col"] and df.get_column("col") are the same; pick one style and stick with it. For expressions, pl.col.a and pl.col("a") are interchangeable when a is a valid attribute name, but pl.col("...") is required for names that contain spaces or otherwise don’t qualify. To unlock Polars’ strengths, avoid extracting Series for transformations. Compose expressions, then select() or filter(), and you’ll be able to move between eager and lazy execution without rewrites.