2025, Oct 01 03:00

Split a Pandas column on the last comma with str.rsplit and fix the ValueError during assignment

Split a Pandas column into two by the last comma using str.rsplit. Avoid ValueError Cannot set a DataFrame..., fix shape mismatch, and replace row-wise apply.

Splitting a single Pandas column into two can look deceptively simple, until a seemingly harmless assignment throws ValueError: Cannot set a DataFrame with multiple columns to the single column. Below is a concise walkthrough of the failure mode and a robust way to split on the last comma, even when there are multiple commas in the text.

Problem

There is a function that takes a dataframe with six columns and tries to split a text column into two new columns using row-wise apply. Inside the function it fails with a shape-mismatch error; outside the function it doesn’t. The core idea is to find a comma and slice left/right parts.

def build_frame():
    df_six = "some data"
    def left_of_comma(txt):
        ln = len(txt)
        rng = range(ln)
        sep = ","
        for i in rng:
            if txt[i] == sep:
                pos = i
        return txt[0:pos]
    def right_of_comma(txt):
        ln = len(txt)
        rng = range(ln)
        sep = ","
        for i in rng:
            if txt[i] == sep:
                pos = i
        return txt[pos+1:]
    df_seven["Column Val1"] = df_six.apply(lambda row: left_of_comma(row["Splittable column"]), axis=1)
    df_eight["Column Val2"] = df_seven.apply(lambda row: right_of_comma(row["Splittable column"]), axis=1)
    df_out = base_df_6c.drop("Splittable Colum", axis=1)
    return df_out

What actually goes wrong

The error Pandas raises means the right-hand side of the assignment resolves to a DataFrame instead of a one-dimensional Series. Once that happens, assigning it to a single column triggers the exception you see. Relying on row-wise apply plus custom string scanning makes the operation brittle. A simpler, vectorized approach avoids this class of shape mismatch entirely and deals correctly with rows that have more than one comma.

Solution: split on the last comma with str.rsplit

When there can be multiple commas, use right-split with n=1 to split only on the last comma. The first part becomes the explanation and the second part becomes the status. If there may be spaces after the comma, strip them.

df["Column Val1"] = df["Splittable column"].str.rsplit(",", n=1).str[0]
df["Column Val2"] = df["Splittable column"].str.rsplit(",", n=1).str[1]

If you need to trim spaces:

parts = df["Splittable column"].str.rsplit(",", n=1)
df["Column Val1"] = parts.str[0].str.strip()
df["Column Val2"] = parts.str[1].str.strip()

You can also create both columns in one go using expand=True and then strip:

df[["Column Val1", "Column Val2"]] = (
    df["Splittable column"].str.rsplit(",", n=1, expand=True)
      .apply(lambda s: s.str.strip())
)

End-to-end example

This dataset contains multiple commas per row and needs the last field to be the status, with everything before it as the reason.

import pandas as pd
sample = {
    "Index": ["1", "2", "3", "4"],
    "Splittable column": [
        ",FALSE",
        "Not Acceptable,Failed,NAN,FALSE",
        "Not acceptable,FALSE",
        ",FALSE",
    ],
}
df_in = pd.DataFrame(sample)
# Option 1: chain .str accessors
df_in["Column Val1"] = df_in["Splittable column"].str.rsplit(",", n=1).str[0].str.strip()
df_in["Column Val2"] = df_in["Splittable column"].str.rsplit(",", n=1).str[1].str.strip()
# Option 2: expand to two columns at once
# df_in[["Column Val1", "Column Val2"]] = (
#     df_in["Splittable column"].str.rsplit(",", n=1, expand=True)
#           .apply(lambda s: s.str.strip())
# )
# If needed, drop the original text column
# df_in.drop(columns="Splittable column", inplace=True)
print(df_in)

This keeps the last token such as FALSE in Column Val2, and preserves the entire left side, including intermediate commas, in Column Val1.

Why this matters

String splitting at scale is a common task, but assigning objects of the wrong dimensionality is an easy trap when using apply. Vectorized string methods such as str.rsplit with n=1 express the intent directly, handle rows with multiple commas consistently, and eliminate shape-related surprises during assignment. Adding str.strip() deals with extra whitespace that often appears after delimiters.

Takeaways

Prefer vectorized string operations over row-wise apply when splitting columns. Split on the last comma using str.rsplit(',', n=1) to handle inputs with multiple commas, and use str.strip() if spaces may trail the delimiter. If you ever run into a shape mismatch, inspect the right-hand side first by assigning it to a temporary variable and printing its type and shape; ensuring the data you pass into your function is explicit and consistent will also prevent unexpected behavior.

The article is based on a question from StackOverflow by Danylo Kuznetsov and an answer by furas.