2025, Dec 25 15:00

Fixing slow spaCy embeddings in Polars UDFs: avoid expression-level unnest, use frame-level unnest for speed

Learn why Polars UDFs make spaCy embeddings 100× slower when using expression-level unnest, and how frame-level unnest restores performance with clean code.

Speed-trapping UDFs in Polars: why your spaCy vectors run 100× slower and how to fix it

Expanding spaCy embeddings into a wide table sounds straightforward: apply nlp(<string>).vector to a text column, then fan out 300 Float64 values into their own columns. The surprise comes later, when the pipeline that feels trivial in Pandas-style code takes more than 11 seconds in Polars, even though a single spaCy call averages around 13 milliseconds. The query plan hints at the cause: the expression expands into hundreds of per-field operations, and the UDF is invoked far more times than expected.

Repro: vectorizing text and unnesting into 300 columns

The following example applies spaCy and expands the 300-dim vector into individual columns. The logic is simple, but the runtime balloons.

import spacy
import polars as pl

# en_core_web_lg ships 300-dimensional word vectors.
encoder = spacy.load('en_core_web_lg')
lf_words = pl.LazyFrame({'term': ["apple", "banana", "orange"]})
vec_columns = ['dim_' + str(i) for i in range(300)]

# Slow: the expression-level unnest expands into one expression per field,
# and each of those expressions re-runs the UDF.
lf_words = lf_words.with_columns(
    pl.col('term').map_elements(
        lambda t: tuple(encoder(t).vector), return_dtype=pl.List(pl.Float64)
    ).list.to_struct(fields=vec_columns).struct.unnest()
)
lf_words.collect()

The query plan reveals a pattern: the expression-level unnest expands into many separate expressions, one per field. That expansion interacts badly with UDFs.

What actually goes wrong

The root cause is how expression expansion works. At the expression level, unnest of a struct turns into multiple expressions, one for each field. Conceptually, something like pl.col("x").struct.unnest() becomes individual field lookups: pl.col("x").struct.field("a"), pl.col("x").struct.field("b"), pl.col("x").struct.field("c"), and so on. Normally this isn’t a problem because Polars applies common subexpression elimination (CSE) and caches repeated expressions.

However, UDF calls are not eligible for this caching. If the expanded expressions depend on a UDF, each field expression can trigger a fresh pass of the UDF over the column. With 300 dimensions, that means every row can be evaluated hundreds of times instead of once. The behavior is discussed in the Polars issue tracker: https://github.com/pola-rs/polars/issues/20260.

A minimal demonstration makes the repetition obvious. The UDF prints a marker each time it is called.

import polars as pl
def apply_fn(v):
    print("Hello")
    return v
frame = pl.DataFrame({"arr": [[1, 2, 3], [4, 5, 6]]})
frame.with_columns(
    pl.col('arr').map_elements(apply_fn, return_dtype=pl.List(pl.Int64))
      .list.to_struct(fields=['c1', 'c2', 'c3'])
      .struct.unnest()
)

Instead of one “Hello” per row, the UDF runs over the whole column once for each expanded field. With two rows and three fields, that is six calls where two would do.

The fix: use the frame-level unnest

The solution is to avoid expression-level unnest when the upstream produces a struct via a UDF. First, create the struct and give it a name with alias. Then, unnest at the frame level so Polars evaluates the UDF once per row and then expands the already-materialized struct.

import polars as pl
def apply_fn(v):
    print("Hello")
    return v
frame = pl.DataFrame({"arr": [[1, 2, 3], [4, 5, 6]]})
frame.with_columns(
    pl.col('arr').map_elements(apply_fn, return_dtype=pl.List(pl.Int64))
      .list.to_struct(fields=['c1', 'c2', 'c3'])
      .alias('out_struct')
).unnest('out_struct')

Now the UDF is called once per element in arr, not once per expanded field.

Applying the same pattern to the spaCy vector case yields the intended behavior without the redundant work.

import spacy
import polars as pl

encoder = spacy.load('en_core_web_lg')
lf_words = pl.LazyFrame({'term': ["apple", "banana", "orange"]})
vec_columns = ['dim_' + str(i) for i in range(300)]

# Fast: build and name the struct column, then unnest at the frame level,
# so the UDF runs once per row.
lf_fixed = lf_words.with_columns(
    pl.col('term').map_elements(
        lambda t: tuple(encoder(t).vector), return_dtype=pl.List(pl.Float64)
    ).list.to_struct(fields=vec_columns).alias('vec_struct')
).unnest('vec_struct')
lf_fixed.collect()

Why this distinction matters

When you mix UDFs with expression expansion, small inefficiencies quickly compound. In this scenario, the query plan generates one expression per vector dimension, and because UDFs are not cached, each of those expressions can re-run the UDF. With spaCy calls averaging around 13 milliseconds, multiplying the call count by the number of dimensions is enough to push wall time into double-digit seconds. Switching to the frame-level unnest keeps the UDF evaluation count aligned with rows, not columns.
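The arithmetic behind that claim can be sketched as a back-of-the-envelope estimate. The function name is illustrative, and the model ignores everything except the UDF calls themselves, so it is a rough lower bound rather than a benchmark.

```python
def estimated_udf_seconds(n_rows, n_fields, seconds_per_call, per_field=True):
    # per_field=True models the expression-level unnest (UDF re-run per field);
    # per_field=False models the frame-level unnest (UDF run once per row).
    calls = n_rows * (n_fields if per_field else 1)
    return calls * seconds_per_call


# Figures from the example: 3 rows, 300 dimensions, about 13 ms per spaCy call.
slow = estimated_udf_seconds(3, 300, 0.013)
fast = estimated_udf_seconds(3, 300, 0.013, per_field=False)
print(f"{slow:.1f} s vs {fast:.3f} s")
```

Three rows times 300 fields is 900 calls, or roughly 11.7 seconds at 13 ms each, which is in line with the 11-plus seconds observed. Once per row, the same work takes about 0.04 seconds.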

Takeaways

If your Polars pipeline expands a struct produced by a UDF, avoid expression-level unnest. Give the struct column a name with alias, then unnest it at the frame level. This pattern prevents redundant evaluations and keeps the pipeline fast, even when you are turning high-dimensional outputs like embeddings into wide tables. When diagnosing similar slowdowns, check how the plan expands expressions and remember that UDFs don’t benefit from CSE.