2025, Dec 31 01:00

Filter rows without nulls across all columns in Polars using all_horizontal and a single predicate

Learn how to filter a Polars DataFrame to keep rows without nulls across all columns using all_horizontal and is_not_null, avoiding ambiguity errors in filter.

Filtering out rows that contain any nulls sounds trivial until you try to do it across every column of a Polars DataFrame. A straightforward attempt with an expression like pl.all().is_not_null() looks natural, but it expands into one predicate per column and triggers an ambiguity error. Here’s how to approach it correctly and scalably.

Reproducing the issue

Consider a simple DataFrame with a few nulls sprinkled across columns:

import polars as pl

tbl = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "variable1": [15, None, 5, 10, 20],
    "variable2": [40, 30, 50, 10, None],
})
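
Printed, the nulls are easy to spot: row 2 is missing variable1 and row 5 is missing variable2.

>>> tbl
shape: (5, 3)
┌─────┬───────────┬───────────┐
│ id  ┆ variable1 ┆ variable2 │
│ --- ┆ ---       ┆ ---       │
│ i64 ┆ i64       ┆ i64       │
╞═════╪═══════════╪═══════════╡
│ 1   ┆ 15        ┆ 40        │
│ 2   ┆ null      ┆ 30        │
│ 3   ┆ 5         ┆ 50        │
│ 4   ┆ 10        ┆ 10        │
│ 5   ┆ 20        ┆ null      │
└─────┴───────────┴───────────┘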

Trying to filter all columns at once with a single call like this looks tempting:

# Attempt 1: a wildcard null check
(
    tbl
    .filter(
        pl.all().is_not_null()
    )
)

# Attempt 2: a horizontal reduction, but given no expressions to reduce
(
    tbl
    .filter(
        pl.any_horizontal().is_not_null()
    )
)

But Polars raises an error, because the predicate expands into more than one expression:

ComputeError: The predicate passed to 'LazyFrame.filter' expanded to multiple expressions:

col("id").is_not_null(),
col("variable1").is_not_null(),
col("variable2").is_not_null(),
This is ambiguous. Try to combine the predicates with the 'all' or `any' expression.

Manually writing a predicate per column works, since filter combines multiple predicates with a logical AND, but it doesn’t scale:

(
    tbl
    .filter(
        pl.col("variable1").is_not_null(),
        pl.col("variable2").is_not_null()
    )
)

What’s actually happening

Polars expects each predicate passed to filter to evaluate to a single boolean column, one value per row. When you write pl.all().is_not_null(), the wildcard expands into one expression per selected column, so a single argument silently turns into several predicates, which Polars rejects as ambiguous. The fix is to collapse those per-column checks into a single boolean column before passing it to filter.
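
A quick way to see the expansion is to evaluate the same expression in a select context, where one output column per input column is the expected behavior (using the tbl frame defined above):

>>> tbl.select(pl.col("*").is_not_null())
shape: (5, 3)
┌──────┬───────────┬───────────┐
│ id   ┆ variable1 ┆ variable2 │
│ ---  ┆ ---       ┆ ---       │
│ bool ┆ bool      ┆ bool      │
╞══════╪═══════════╪═══════════╡
│ true ┆ true      ┆ true      │
│ true ┆ false     ┆ true      │
│ true ┆ true      ┆ true      │
│ true ┆ true      ┆ true      │
│ true ┆ true      ┆ false     │
└──────┴───────────┴───────────┘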

The solution

Use a horizontal reduction to combine per-column predicates into one column. The expression all_horizontal(...) returns a single boolean per row indicating whether all provided expressions are true. Wrap the null checks with it and pass the result to filter:

>>> tbl.filter(pl.all_horizontal(pl.col("*").is_not_null()))
shape: (3, 3)
┌─────┬───────────┬───────────┐
│ id  ┆ variable1 ┆ variable2 │
│ --- ┆ ---       ┆ ---       │
│ i64 ┆ i64       ┆ i64       │
╞═════╪═══════════╪═══════════╡
│ 1   ┆ 15        ┆ 40        │
│ 3   ┆ 5         ┆ 50        │
│ 4   ┆ 10        ┆ 10        │
└─────┴───────────┴───────────┘

This produces exactly the desired result while remaining scalable for wide DataFrames.
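
The same reduction also works over an explicit subset of columns, which is handy when only some fields are mandatory. A minimal sketch, reusing the column names from the example above:

# Keep rows where both variable columns are non-null;
# nulls in any other column are tolerated.
tbl.filter(
    pl.all_horizontal(pl.col("variable1", "variable2").is_not_null())
)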

Why this matters

Data preparation pipelines often need to exclude rows with missing values across arbitrary sets of columns. Writing one predicate per column quickly becomes unmaintainable as schemas evolve. Horizontal reductions provide a concise, robust way to collapse many checks into a single, valid filter predicate. For context, there is an open issue proposing that patterns like df.filter(pl.all().is_not_null()) work directly, but the reliable approach today is to collapse the checks explicitly with an expression such as all_horizontal.
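
If the set of columns changes over time, Polars column selectors can stand in for the wildcard. A sketch, assuming you only want to enforce non-null values in numeric columns:

import polars.selectors as cs

# Require non-null values only in numeric columns; rows may still
# contain nulls in columns of other dtypes.
tbl.filter(pl.all_horizontal(cs.numeric().is_not_null()))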

Takeaways

If a filter expression unintentionally expands into multiple column-wise predicates, reduce them horizontally so that filter receives a single boolean column, one value per row. For the null-filtering case across all columns, the practical pattern is to combine column-wise is_not_null() checks with all_horizontal. It scales cleanly, keeps your intent explicit, and avoids ambiguity errors.
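
As a companion note, Polars also ships a built-in shortcut for this exact task; the all_horizontal pattern remains the general tool once the predicate is anything richer than a plain null check:

# Built-in equivalent: drop rows containing any null,
# optionally restricted to a subset of columns.
tbl.drop_nulls()
tbl.drop_nulls(subset=["variable1", "variable2"])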