2025, Dec 06 23:00

How to Create Per-Row Dictionary Columns in pandas DataFrame Without NaN: Reliable assign, apply, and list comprehension methods

Why pandas DataFrame.assign returns NaN for per-row dictionaries, and how to fix it with list comprehensions, apply, or unpacking in a cashflow example.

Adding a per-row dictionary to a pandas DataFrame looks trivial until you try to compute that dictionary from multiple columns and end up with NaN instead of objects. This guide walks through a compact example from finance (row-wise cashflow series), explains why the first attempt fails, and shows reliable patterns that produce the desired column of dicts.

Reproducing the issue

Consider a table of stocks with a few characteristics. The goal is to generate, for each row, a dictionary representing a cashflow series derived from selected columns.

import pandas as pd
stocks = pd.DataFrame(
    {
        "price": [100, 103, 240],
        "Feat1": [1, 3, 3],
        "Feat2": [5, 7, 1],
        "Feat3": [1, 4, 6],
    },
    index=["Company A", "Company B", "Company C"],
)
#             price  Feat1  Feat2  Feat3
# Company A     100      1      5      1
# Company B     103      3      7      4
# Company C     240      3      1      6
def make_flows(a=1, b=2):
    return {0: a, 0.5: b, 1: 7, 2: 8, 3: 9}
# First attempt: the callable receives the whole frame, so a and b are Series.
# The new column comes back as NaN.
broken = stocks.assign(
    flows=lambda frame: make_flows(
        a=frame["Feat1"], b=frame["Feat3"]
    )
)

The new column ends up filled with NaN instead of per-row dictionaries.

Why it happens

When DataFrame.assign is given a callable, pandas calls it once with the entire DataFrame, not once per row. In the failing expression, frame["Feat1"] and frame["Feat3"] are therefore Series, not scalars, so make_flows runs a single time and returns one dictionary whose first two values are whole Series. That dictionary is not a per-row result. For a new column, pandas accepts a scalar to broadcast, a list or array with one entry per row, or a Series to align on the index. In recent pandas versions a dict-like value is treated like a Series keyed by the dict's keys (0, 0.5, 1, 2, 3) and aligned to the DataFrame index. None of those keys match "Company A", "Company B", or "Company C", so every cell comes back as NaN.
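A quick way to check this diagnosis is to call the function once with whole columns and inspect what comes back. The sketch below rebuilds a reduced version of the example frame. The keyed Series is a stand-in introduced here for illustration: it uses scalar values and the same keys as the dict, and its reindex call is assumed to mirror the alignment pandas performs when a dict is assigned as a column.

```python
import pandas as pd

stocks = pd.DataFrame(
    {"Feat1": [1, 3, 3], "Feat3": [1, 4, 6]},
    index=["Company A", "Company B", "Company C"],
)


def make_flows(a=1, b=2):
    return {0: a, 0.5: b, 1: 7, 2: 8, 3: 9}


# Called once with whole columns: a single dict, two of whose values are Series.
single = make_flows(a=stocks["Feat1"], b=stocks["Feat3"])
print(type(single).__name__, len(single))
print(type(single[0]).__name__)

# Stand-in (assumption): a Series labelled by the dict's keys shares no labels
# with the company index, so aligning it to that index leaves only NaN.
keyed = pd.Series({0: 1, 0.5: 2, 1: 7, 2: 8, 3: 9})
aligned = keyed.reindex(stocks.index)
print(aligned)
```

The function body never ran per row, and the keys of the resulting dict were never going to match the row labels.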

Working solution patterns

The fix is to generate one dictionary per row. A straightforward way is a list comprehension that iterates over the index and pulls scalar values per row.

def make_flows(a=1, b=2):
    return {0: a, 0.5: b, 1: 7, 2: 8, 3: 9}
fixed = stocks.assign(
    flows=lambda frame: [
        make_flows(a=frame.loc[idx, "Feat1"], b=frame.loc[idx, "Feat3"]) 
        for idx in frame.index
    ]
)

Another concise approach unpacks each row of a two-column NumPy array into the function's positional parameters. The column order must match the parameter order (a, b), so Feat1 comes first and Feat3 second:

fixed_unpack = stocks.assign(
    flows=lambda frame: [
        # Each vals is one row: (Feat1, Feat3) -> (a, b)
        make_flows(*vals) for vals in frame[["Feat1", "Feat3"]].to_numpy()
    ]
)

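There is also an option that stays in the pandas API surface and computes row-wise by design. Unlike the assign examples, which return a new DataFrame, this one writes the column onto stocks in place: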

stocks["flows"] = stocks.apply(
    lambda row: make_flows(a=row["Feat1"], b=row["Feat3"]), axis=1
)

Result

With the row-wise construction, each cell contains its own dictionary of cashflows derived from that row’s features.

# Example shape of the result for the first approach (Feat1 + Feat3)
#              price  Feat1  Feat2  Feat3                         flows
# Company A      100      1      5      1  {0: 1, 0.5: 1, 1: 7, 2: 8, 3: 9}
# Company B      103      3      7      4  {0: 3, 0.5: 4, 1: 7, 2: 8, 3: 9}
# Company C      240      3      1      6  {0: 3, 0.5: 6, 1: 7, 2: 8, 3: 9}
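To check that the result really holds one dict per row, it can help to inspect the column directly instead of relying on the printed table. The following is a minimal sketch that rebuilds the example and compares the .loc-based list comprehension with the row-wise apply; the names by_loc and by_apply are introduced here for illustration only.

```python
import pandas as pd

stocks = pd.DataFrame(
    {
        "price": [100, 103, 240],
        "Feat1": [1, 3, 3],
        "Feat2": [5, 7, 1],
        "Feat3": [1, 4, 6],
    },
    index=["Company A", "Company B", "Company C"],
)


def make_flows(a=1, b=2):
    return {0: a, 0.5: b, 1: 7, 2: 8, 3: 9}


# List comprehension with .loc, returned as a new DataFrame by assign.
by_loc = stocks.assign(
    flows=lambda frame: [
        make_flows(a=frame.loc[idx, "Feat1"], b=frame.loc[idx, "Feat3"])
        for idx in frame.index
    ]
)

# Row-wise apply, kept as a standalone Series here so stocks is not modified.
by_apply = stocks.apply(
    lambda row: make_flows(a=row["Feat1"], b=row["Feat3"]), axis=1
)

all_dicts = all(isinstance(cell, dict) for cell in by_loc["flows"])
same_content = by_loc["flows"].tolist() == by_apply.tolist()

print(by_loc["flows"].dtype)
print(all_dicts, same_content)
```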

Why this matters

Many real-world data engineering and quant workflows create structured Python objects per row, especially when downstream logic needs full objects rather than scalars. Knowing that assign calls its callable once with the whole DataFrame, while apply with axis=1 calls its function once per row, prevents subtle bugs where pandas silently fills a column with NaN instead of raising an error.

Takeaways

When building a column of dictionaries, ensure your expression returns exactly one dictionary per row. Use a list comprehension with .loc to fetch scalar values, parameter unpacking from a two-column array when it matches your function signature, or a row-wise apply with axis=1. All of these produce a clean object-dtype column that you can later iterate over or transform for further calculations.
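As an illustration of the "iterate over or transform" step, the sketch below maps over a column of dicts to get one number per row, then expands the dicts into a wide table with one column per time point. Summing the cashflows is only a placeholder calculation chosen for this example, not something the original workflow prescribes, and the names flows, totals, and wide are hypothetical.

```python
import pandas as pd

stocks = pd.DataFrame(
    {"Feat1": [1, 3, 3], "Feat3": [1, 4, 6]},
    index=["Company A", "Company B", "Company C"],
)


def make_flows(a=1, b=2):
    return {0: a, 0.5: b, 1: 7, 2: 8, 3: 9}


# One dict per row, built with the row-wise apply pattern.
flows = stocks.apply(
    lambda row: make_flows(a=row["Feat1"], b=row["Feat3"]), axis=1
)

# Reduce each dict to a scalar (placeholder calculation: undiscounted sum).
totals = flows.map(lambda d: sum(d.values()))

# Expand the dicts into a table: one row per company, one column per time key.
wide = pd.DataFrame(flows.tolist(), index=flows.index)

print(totals)
print(wide)
```

Keeping the dicts in a column is convenient for passing whole objects around. Expanding to a wide table is usually the better fit once the calculation can be expressed column-wise.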