2025, Nov 03 13:00
How to store xarray.Dataset objects per row in a pandas DataFrame without conversion errors using df.at
Learn how to store per-row xarray.Dataset objects in a pandas DataFrame without NumPy coercion: initialize an object-dtype column and assign with df.at for safe storage
When you need to keep rich objects like xarray.Dataset alongside tabular metadata in pandas, a straightforward column assignment can backfire. Pandas often tries to coerce values to NumPy arrays, which doesn’t play nicely with complex Python objects. The good news: you can store per-row xarray datasets safely, as long as you assign them the right way.
The setup
Suppose you have a DataFrame with metadata and want to attach a unique xarray.Dataset to each row. A naive assignment may trigger unwanted conversion. Here’s a minimal example reflecting that pattern:
import pandas as pd
import xarray as xr
import numpy as np
# Create a DataFrame
tbl = pd.DataFrame({"id": [1, 2, 3]})
# Initialize an empty column with dtype=object
tbl["xr_payload"] = pd.Series(dtype=object)
for ridx in tbl.index:
    # Create a unique xarray Dataset for each row
    pack = xr.Dataset({
        "temp": xr.DataArray(np.random.rand(2)),
        "press": xr.DataArray(np.random.rand(2))
    })
    # Attempt assignment using .loc
    tbl.loc[ridx, "xr_payload"] = pack
What’s going on
The core frustration is that pandas may try to convert an xarray.Dataset into a NumPy array during assignment, which fails. Even if the column uses object dtype, the assignment method matters: .loc treats list-like values as collections of elements to align with the selected labels, so it can interpret a Dataset as a sequence of values rather than as one object to place in one cell.
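The same behavior can be reproduced with any list-like value, no xarray required. A minimal sketch (the column names here are illustrative, and the exact .loc outcome may vary by pandas version):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3]})
df["payload"] = pd.Series(dtype=object)

# .at performs true scalar access: the list lands in the cell unmodified
df.at[0, "payload"] = [10, 20]
print(type(df.at[0, "payload"]))  # <class 'list'>

# .loc may instead align the list against the selected labels,
# raising rather than storing the object, depending on pandas version
try:
    df.loc[1, "payload"] = [10, 20]
except ValueError:
    # .loc treated the list as multiple values, not one object
    pass
```

The same distinction explains why the xarray.Dataset assignment above misbehaves with .loc but succeeds with .at.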
The fix
Use scalar access with df.at to place the xarray.Dataset directly into a single cell, and ensure the column is object-typed from the start. This avoids pandas attempting to coerce the value.
import pandas as pd
import xarray as xr
import numpy as np
# Create a DataFrame
tbl = pd.DataFrame({"id": [1, 2, 3]})
# Initialize an empty column with dtype=object
tbl["xr_payload"] = pd.Series(dtype=object)
for ridx in tbl.index:
    # Create a unique xarray Dataset for each row
    pack = xr.Dataset({
        "temp": xr.DataArray(np.random.rand(2)),
        "press": xr.DataArray(np.random.rand(2))
    })
    # Assign using .at to place the Dataset directly into the cell
    tbl.at[ridx, "xr_payload"] = pack
# Inspect the stored object type
print(type(tbl.loc[0, "xr_payload"]))
# <class 'xarray.core.dataset.Dataset'>
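Reading the objects back is just as direct: .at retrieves the cell contents untouched, and the stored Datasets behave like any other xarray object. A short sketch continuing the pattern above (the temp_mean column is illustrative):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the table from the fix above
tbl = pd.DataFrame({"id": [1, 2, 3]})
tbl["xr_payload"] = pd.Series(dtype=object)
for ridx in tbl.index:
    tbl.at[ridx, "xr_payload"] = xr.Dataset({
        "temp": xr.DataArray(np.random.rand(2)),
        "press": xr.DataArray(np.random.rand(2))
    })

# Scalar read: the exact Dataset comes back, no copy or conversion
ds = tbl.at[0, "xr_payload"]
print(list(ds.data_vars))  # ['temp', 'press']

# Derive a plain numeric column from the stored objects
tbl["temp_mean"] = [float(d["temp"].mean()) for d in tbl["xr_payload"]]
```

This keeps the metadata and the derived scalars in one table while the full Datasets remain available per row.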
Why this matters
Data workflows increasingly mix tabular metadata with domain objects from other libraries. Being able to store an xarray.Dataset per row can simplify indexing, batching, and later processing without juggling parallel structures. The small details—object dtype and the use of df.at—make the difference between stable storage and a confusing conversion error.
Conclusion
If you need to keep xarray.Dataset instances in a pandas DataFrame column, initialize the column with dtype=object and assign cell by cell using df.at. This keeps pandas from trying to coerce your Datasets into arrays and preserves exactly what you put in. It’s a tiny change in how you write the assignment, but it saves time and prevents subtle failures down the line.
The article is based on a question from StackOverflow by BlueScr33n and an answer by Polarimetric.