2025, Dec 23 05:00

How to store a NumPy array in a pandas DataFrame cell without errors: explicit column-aligned assignment with Series or dict

Learn why pandas row assignment fails when mixing a NumPy array and a scalar, and fix it with explicit column alignment using Series or dict. See examples.

Placing a NumPy array into a single pandas cell looks trivial until a second column enters the picture. A row assignment that mixes an array with a scalar value can fail unexpectedly, even though the same array can be stored without trouble when the DataFrame has only one column. Below is a concise walkthrough of why this happens and how to make the assignment unambiguous.

Reproducing the issue

The following code assigns a NumPy array and a string to the same row, one value per column. The first write, which creates the row, succeeds. The second, which overwrites that row, raises a ValueError.

import numpy as np
import pandas as pd

arr_a = np.random.rand(74, 8)
arr_b = np.random.rand(74, 8)

grid = pd.DataFrame(columns=["payload", "unit"])

# Creating the new row "band" from a plain list succeeds
grid.loc["band"] = [arr_a, "N/A"]
# Overwriting the existing row with another list raises ValueError
grid.loc["band"] = [arr_b, "N/A"]

The single-column variant, where you only store the array, does not exhibit this behavior because there is no ambiguity in how to distribute the values across columns.

What is going on

The failure comes from how pandas interprets the right-hand side when a plain list is assigned to a row that already exists. A NumPy array is itself a sequence, and so is the list that wraps it together with the scalar. To decide how to spread that list across the columns, pandas asks NumPy for its dimensionality, which means converting [arr_b, "N/A"] into a single array. One element is a (74, 8) array and the other is a scalar, so the nested sequence is ragged, and NumPy (1.24 and later) rejects it during the second assignment:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

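In short, when a raw list is written across several columns of an existing row, pandas cannot tell that the array is meant to fill a single cell. The first write is not affected because creating a new row goes through a different code path in pandas.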
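The NumPy half of the problem can be reproduced without pandas. The sketch below builds the same kind of mixed list and asks NumPy to turn it into one array, then shows that a one-dimensional object array with one slot per column keeps each item intact. The names `mixed` and `slots` are illustrative only, and the `ValueError` assumes NumPy 1.24 or later; older releases returned an object array instead.

```python
import numpy as np

arr = np.random.rand(74, 8)
mixed = [arr, "N/A"]  # same kind of data as the failing row assignment

# Converting the whole list means reconciling a (74, 8) block with a scalar
# string: a ragged nested sequence, which NumPy 1.24+ refuses to build.
try:
    np.asarray(mixed)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# One object slot per column stores each item as a single element.
slots = np.empty(len(mixed), dtype=object)
for i, item in enumerate(mixed):
    slots[i] = item

print(slots.shape, slots[0].shape, slots[1])
```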

Fix: make the mapping explicit

The fix is to give pandas explicit column alignment: pass a pd.Series whose index holds the target column labels, or a dictionary keyed by column name. Either one removes the ambiguity and tells pandas exactly where each value goes.

import numpy as np
import pandas as pd

arr_a = np.random.rand(74, 8)
arr_b = np.random.rand(74, 8)

grid = pd.DataFrame(columns=["payload", "unit"])

# Option 1: a Series whose index names the target columns
grid.loc["band"] = pd.Series([arr_b, "N/A"], index=["payload", "unit"])

# Option 2: a dict keyed by column name (use instead of option 1)
# grid.loc["band"] = {"payload": arr_b, "unit": "N/A"}
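The example above starts from an empty frame, so it only exercises row creation. To check the step that actually failed, the sketch below keeps the original list write that creates the row, then overwrites that row with the Series form and with the dict form, reading the cell back each time. It reuses the article's names (`grid`, `arr_a`, `arr_b`); the seeded generator is an addition made only so the run is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the run repeatable
arr_a = rng.random((74, 8))
arr_b = rng.random((74, 8))

grid = pd.DataFrame(columns=["payload", "unit"])

# Creating the row with a plain list works, as in the reproduction.
grid.loc["band"] = [arr_a, "N/A"]

# Overwriting the existing row: label each value instead of relying on position.
grid.loc["band"] = pd.Series([arr_b, "N/A"], index=["payload", "unit"])
after_series = grid.loc["band", "payload"]

grid.loc["band"] = {"payload": arr_a, "unit": "N/A"}
after_dict = grid.loc["band", "payload"]

print(type(after_series).__name__, after_series.shape)
print(type(after_dict).__name__, after_dict.shape)
```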

Why this matters

When you put non-scalar objects like NumPy arrays into DataFrame cells alongside other columns, relying on positional lists leaves room for misinterpretation. Explicit alignment ensures that pandas places each value into the intended column without trying to infer shapes or convert nested sequences. It prevents the inhomogeneous shape error and makes your intent obvious to both the library and future readers of your code.
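One way to see that placement follows labels rather than positions: in the sketch below, both the dict and the Series list their entries in the reverse of the frame's column order (`unit` before `payload`), yet each value still lands in the column it names. This relies on pandas aligning a Series or dict on the column labels during `.loc` assignment; the value `"mV"` is just an illustrative second string.

```python
import numpy as np
import pandas as pd

payload = np.random.rand(74, 8)
grid = pd.DataFrame(columns=["payload", "unit"])

# Keys deliberately listed in the opposite order to grid.columns.
grid.loc["band"] = {"unit": "N/A", "payload": payload}

# Same for a Series: its index labels, not its order, pick the target column.
grid.loc["band"] = pd.Series(["mV", payload], index=["unit", "payload"])

print(grid.loc["band", "unit"])           # the string lands in "unit"
print(grid.loc["band", "payload"].shape)  # the array lands in "payload"
```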

Takeaways

If you need to store a NumPy array in a pandas cell while also setting other columns in the same row, avoid passing a raw list. Provide a pd.Series with a matching index or a dictionary with column names. This way pandas aligns values deterministically and the assignment remains robust and readable.
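If this pattern comes up often, it can be wrapped once. `set_row` below is a hypothetical helper, not a pandas API: it takes a `{column: value}` mapping, insists that the keys match the frame's columns exactly, and writes the row through a Series indexed by those columns. The check exists because alignment reindexes the row to the frame's columns, so a mapping that left a column out would write a missing value into that cell instead of leaving it unchanged.

```python
import numpy as np
import pandas as pd


def set_row(df: pd.DataFrame, label, cells: dict) -> None:
    """Hypothetical helper: write one row from a {column: value} mapping."""
    missing = [c for c in df.columns if c not in cells]
    unknown = [k for k in cells if k not in df.columns]
    if missing or unknown:
        # Alignment would fill omitted columns with NaN, so refuse partial rows.
        raise KeyError(f"missing columns {missing}, unknown keys {unknown}")
    # Build the Series in column order; its index carries the label alignment.
    df.loc[label] = pd.Series([cells[c] for c in df.columns], index=df.columns)


grid = pd.DataFrame(columns=["payload", "unit"])
set_row(grid, "band", {"payload": np.zeros((74, 8)), "unit": "N/A"})
set_row(grid, "band", {"unit": "N/A", "payload": np.ones((74, 8))})

rejected = False
try:
    set_row(grid, "band", {"payload": np.ones((74, 8))})  # "unit" left out
except KeyError as exc:
    rejected = True
    print("rejected:", exc)
```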

numpy pandas python