2025, Dec 16 03:00

How to Split a Multiline, Comma-Separated String Column into a Nested List in Polars (list.eval + pl.element)

Learn how to split a multiline, comma-separated string column into a list of lists in Polars using list.eval and pl.element, with indexing and casting tips.

When a column stores a multiline, comma-separated payload in a single string, you often want to split it into a nested list that is easy to index and reuse across the pipeline. In Polars, a single split gets you only halfway: you end up with a list of lines, not a list of lists. The trick is applying a second split inside each list element.

Minimal setup and the stumbling block

Consider a single-row DataFrame where one cell contains multiple CSV-like lines. A straightforward split by newline produces only the outer list, stopping before the comma step.

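import polars as pl

# One row, one cell: four CSV-like lines packed into a single string.
frame = pl.DataFrame({"payload": ["A,B,C,1\nD,E,F,2\nG,H,I,3\nJ,K,L,4"]})
# Splitting on the newline only produces the outer list of lines.
attempt = frame.with_columns(
    pl.col("payload").str.split("\n")
)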

This yields a list of lines, but not a list of lists of fields.

Why it happens

.str.split("\n") transforms the string into a List[str] where each element is a line. To split each line by comma, the operation must run at the element level inside that list. That is exactly what list.eval with pl.element() does. Note also that each line in this example mixes letters and numbers. A Polars list has a single element type, so after the split the numbers remain strings until you cast them explicitly.

The fix with list.eval and element-level logic

Use a nested split: first by newline to get lines, then apply an element-level split by comma within list.eval. The result is a List[List[str]].

resolved = frame.with_columns(
    pl.col("payload")
      .str.split("\n")
      .list.eval(
          pl.element().str.split(",")
      )
)

Now each line is split into fields, and the column contains a list of lists ready for indexing or further transformation. If the fields will mostly be used alongside other columns, consider converting each inner list to a struct and unnesting it, which gives you flat columns instead.

Indexing into the nested list

If you need to pull out a specific value and cast it, you can index into the nested list and convert as needed. Below is a small demonstration that extracts the fourth field of the third inner list and casts it to Int64.

demo = pl.DataFrame(
    {"payload": [[
        ["A", "B", "C", "1"],
        ["D", "E", "F", "2"],
        ["G", "H", "I", "3"],
        ["J", "K", "L", "4"]
    ]]},
    strict=False
)
# Indexes are zero-based: list[2] is the third inner list, list[3] its fourth field.
picked = demo.with_columns(
    pick_value=pl.col("payload").list[2].list[3].cast(pl.Int64)
)

Why this matters

Data destined for a Polars parsing pipeline often arrives as a single-column string blob. Converting it into a proper nested structure early makes downstream transformations predictable, because every later step works against a known list dtype. Knowing that mixed-type lists aren’t supported prevents subtle schema issues; keeping numerics as strings or casting on extraction avoids surprises. When the end goal is tabular fields, turning the nested structure into a struct and unnesting helps expose clean, flat columns.

Takeaway

When splitting a column into a list of lists in Polars, think in two phases: split the outer layer, then evaluate an element-level split inside the list using list.eval and pl.element(). Keep in mind that lists cannot mix types, so cast selectively when extracting. If your next step is columnar operations, consider converting to a struct and unnesting to work with flat, well-typed columns.