2025, Dec 06 23:00
How to Create Per-Row Dictionary Columns in pandas DataFrame Without NaN: Reliable assign, apply, and list comprehension methods
Why pandas DataFrame.assign returns NaN for per-row dictionaries, and how to fix it with list comprehensions, apply, or unpacking in a cashflow example.
Adding a per-row dictionary to a pandas DataFrame looks trivial until you try to compute that dictionary from multiple columns and end up with NaN instead of objects. This guide walks through a compact example from finance (row-wise cashflow series), explains why the first attempt fails, and shows reliable patterns that produce the desired column of dicts.
Reproducing the issue
Consider a table of stocks with a few characteristics. The goal is to generate, for each row, a dictionary representing a cashflow series derived from selected columns.
import pandas as pd

stocks = pd.DataFrame(
    {
        "price": [100, 103, 240],
        "Feat1": [1, 3, 3],
        "Feat2": [5, 7, 1],
        "Feat3": [1, 4, 6],
    },
    index=["Company A", "Company B", "Company C"],
)
#            price  Feat1  Feat2  Feat3
# Company A    100      1      5      1
# Company B    103      3      7      4
# Company C    240      3      1      6

def make_flows(a=1, b=2):
    return {0: a, 0.5: b, 1: 7, 2: 8, 3: 9}

# First attempt: returns NaN in the new column
broken = stocks.assign(
    flows=lambda frame: make_flows(
        a=frame["Feat1"], b=frame["Feat3"]
    )
)
The new column ends up filled with NaN instead of per-row dictionaries.
Why it happens
When DataFrame.assign receives a callable, pandas calls it once with the entire DataFrame, not once per row. In the failing expression, frame["Feat1"] and frame["Feat3"] are therefore whole Series, not scalars, so make_flows runs a single time and returns one dictionary whose first two values are Series. pandas then has to turn that single dict into a column. It accepts a scalar to broadcast, a list-like with the same length as the DataFrame, or a labeled object that it aligns on the index. In recent pandas versions a dict falls into the last group: it is treated like a Series keyed by the dict's keys (here 0, 0.5, 1, 2, 3) and aligned against the frame's index. None of those keys match "Company A", "Company B", or "Company C", so every row comes back as NaN.
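The alignment behavior is easier to see with a plain dict of scalars. The following is a minimal sketch, assuming a recent pandas version; the demo frame and the names no_match and one_match are hypothetical and only mirror the article's index labels.

```python
import pandas as pd

# Hypothetical three-row frame that reuses the article's index labels.
demo = pd.DataFrame(
    {"Feat1": [1, 3, 3]},
    index=["Company A", "Company B", "Company C"],
)

# The keys 0 and 0.5 match no index label, so no row receives a value.
no_match = demo.assign(flows={0: 1, 0.5: 2})

# One key matches an index label, so only that row receives a value.
one_match = demo.assign(flows={"Company A": 10})

print(no_match["flows"].isna().all())
print(one_match["flows"].notna().tolist())
```

If the dict keys happen to match index labels, pandas fills those rows and leaves the rest as NaN, which suggests the dict is being used as a label-to-value mapping and never stored as a cell value.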
Working solution patterns
The fix is to generate one dictionary per row. A straightforward way is a list comprehension that iterates over the index and pulls scalar values per row.
def make_flows(a=1, b=2):
    return {0: a, 0.5: b, 1: 7, 2: 8, 3: 9}

fixed = stocks.assign(
    flows=lambda frame: [
        make_flows(a=frame.loc[idx, "Feat1"], b=frame.loc[idx, "Feat3"])
        for idx in frame.index
    ]
)
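A variant not used in the original example replaces the per-row .loc lookups with zip over the two columns. The sketch below is an alternative under the same assumptions as the article's data; the names zipped and looked_up are hypothetical, and the final line only checks that both constructions agree.

```python
import pandas as pd

stocks = pd.DataFrame(
    {
        "price": [100, 103, 240],
        "Feat1": [1, 3, 3],
        "Feat2": [5, 7, 1],
        "Feat3": [1, 4, 6],
    },
    index=["Company A", "Company B", "Company C"],
)

def make_flows(a=1, b=2):
    return {0: a, 0.5: b, 1: 7, 2: 8, 3: 9}

# zip walks both columns in lockstep, yielding one (a, b) pair per row.
zipped = stocks.assign(
    flows=lambda frame: [
        make_flows(a=a, b=b)
        for a, b in zip(frame["Feat1"], frame["Feat3"])
    ]
)

# The same construction with per-row .loc lookups, for comparison.
looked_up = stocks.assign(
    flows=lambda frame: [
        make_flows(a=frame.loc[idx, "Feat1"], b=frame.loc[idx, "Feat3"])
        for idx in frame.index
    ]
)

print(zipped["flows"].tolist() == looked_up["flows"].tolist())
```

Iterating the columns directly avoids one label lookup per cell, which tends to matter more as the frame grows. The .loc form may still read more clearly when the row label itself is needed inside the function.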
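Another concise approach uses parameter unpacking from a two-column NumPy array. Iterating over the array yields one row of values at a time, and each row is unpacked positionally into the function, so the column order must match the parameter order of make_flows:
fixed_unpack = stocks.assign(
    flows=lambda frame: [
        # Use the frame passed to the callable, and the same Feat1/Feat3 pair as above
        make_flows(*vals) for vals in frame[["Feat1", "Feat3"]].to_numpy()
    ]
)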
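A third option stays within the pandas API and is row-wise by design: DataFrame.apply with axis=1 passes each row to the function as a Series. Note that this statement adds the column to stocks in place, whereas assign returns a new DataFrame:
stocks["flows"] = stocks.apply(
    lambda row: make_flows(a=row["Feat1"], b=row["Feat3"]), axis=1
)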
Result
With the row-wise construction, each cell contains its own dictionary of cashflows derived from that row’s features.
# Example shape of the result for the first approach (Feat1 + Feat3)
#            price  Feat1  Feat2  Feat3                              flows
# Company A    100      1      5      1  {0: 1, 0.5: 1, 1: 7, 2: 8, 3: 9}
# Company B    103      3      7      4  {0: 3, 0.5: 4, 1: 7, 2: 8, 3: 9}
# Company C    240      3      1      6  {0: 3, 0.5: 6, 1: 7, 2: 8, 3: 9}
Why this matters
Many real-world data engineering and quant workflows create structured Python objects per row, especially when downstream logic needs full objects rather than scalars. Knowing that assign calls its callable once with the whole frame, while apply with axis=1 calls its function once per row, prevents subtle bugs where pandas raises no error and simply fills the column with NaN because the returned object does not line up with the rows.
Takeaways
When building a column of dictionaries, ensure your expression returns exactly one dictionary per row. Use a list comprehension with .loc to fetch scalar values, parameter unpacking from a two-column array when it matches your function signature, or a row-wise apply with axis=1. All of these produce a clean object-dtype column that you can later iterate over or transform for further calculations.
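As an illustration of that last point, here is a minimal sketch of two possible downstream steps. It rebuilds a flows column by hand so the block is self-contained; the names totals and wide, and the choice of an undiscounted sum, are assumptions made for the example and are not part of the original workflow.

```python
import pandas as pd

# Hypothetical column of dicts shaped like the article's flows column.
flows = pd.Series(
    [
        {0: 1, 0.5: 1, 1: 7, 2: 8, 3: 9},
        {0: 3, 0.5: 4, 1: 7, 2: 8, 3: 9},
        {0: 3, 0.5: 6, 1: 7, 2: 8, 3: 9},
    ],
    index=["Company A", "Company B", "Company C"],
    name="flows",
)

# Transform each dict into a scalar: the undiscounted sum of its cashflows.
totals = flows.map(lambda d: sum(d.values()))

# Expand the dicts into a wide table with one column per time point.
wide = pd.DataFrame(flows.tolist(), index=flows.index)

print(totals)
print(wide)
```

Keeping the dicts in one object column suits code that needs the whole series per row, while the wide table is usually more convenient once the calculation becomes column-oriented.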