2026, Jan 05 19:00

Reshape Xarray time-lon-lat data into a 4D grid: year and month-day as coordinates via MultiIndex unstack

Reshape Xarray time-lon-lat data into a 4D DataArray: split time into year and month-day with a pandas MultiIndex and unstack, leaving the values unchanged.

Turning a time–lon–lat cube into a four-dimensional structure is a common need when downstream logic expects time split into separate coordinates. Here the target layout treats year and month-day as independent axes, so a 3D DataArray becomes year × monthly × lon × lat, where monthly is the name used below for the month-day dimension. The goal is to do that predictably, without changing any data values.

Minimal example that shows the setup

The snippet below creates daily data for 2000–2001, removes Feb 29 to keep a uniform day set, and builds a time–lon–lat DataArray. It mirrors the situation where time is a single coordinate containing full datetimes.

import xarray as xr
import numpy as np
import pandas as pd

t_range = pd.date_range("2000-01-01", "2001-12-31", freq="D")
t_range = t_range[~((t_range.month == 2) & (t_range.day == 29))]

x_coords = np.linspace(100, 110, 5)
y_coords = np.linspace(30, 35, 4)
vals = np.random.rand(len(t_range), len(x_coords), len(y_coords))

field = xr.DataArray(
    vals,
    coords={"time": t_range, "lon": x_coords, "lat": y_coords},
    dims=["time", "lon", "lat"],
    name="pr"
)

What’s actually the problem

The array is three-dimensional, but the processing target is a four-dimensional grid where time is split into two orthogonal coordinates. Specifically, the time axis needs to be decomposed into year and month-day. The result should expose year and monthly as independent dimensions alongside lon and lat, without altering the underlying values or misaligning them with their dates.

How to reshape time into year and month-day

The approach is to derive two arrays from the original datetime index, combine them into a pandas.MultiIndex, assign that MultiIndex back to the time coordinate, and then unstack the time coordinate. The unstack call replaces the single time axis with the two levels of the MultiIndex, yielding a 4D structure with the dimensions year, monthly, lon, and lat. Note that xarray places the two new dimensions after lon and lat unless the result is transposed. Below is a self-contained solution.

import xarray as xr
import numpy as np
import pandas as pd

t_range = pd.date_range("2000-01-01", "2001-12-31", freq="D")
t_range = t_range[~((t_range.month == 2) & (t_range.day == 29))]

x_coords = np.linspace(100, 110, 5)
y_coords = np.linspace(30, 35, 4)
vals = np.random.rand(len(t_range), len(x_coords), len(y_coords))

field = xr.DataArray(
    vals,
    coords={"time": t_range, "lon": x_coords, "lat": y_coords},
    dims=["time", "lon", "lat"],
    name="pr"
)

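# Split each timestamp into a calendar year and a zero-padded "mm-dd" key.
yr = field.time.dt.year.values
md = field.time.dt.strftime("%m-%d").values

# One MultiIndex entry per original time step, in the original order.
tm_mi = pd.MultiIndex.from_arrays([yr, md], names=("year", "monthly"))

cube4d = field.copy()
cube4d.coords["time"] = tm_mi
# unstack appends the new dimensions after lon and lat, so reorder explicitly.
cube4d = cube4d.unstack("time").transpose("year", "monthly", "lon", "lat")

print(cube4d)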
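
Recent xarray releases may emit a FutureWarning when a pandas.MultiIndex is assigned directly to a coordinate. As a hedged alternative, the sketch below attaches year and monthly as ordinary coordinates along time and lets set_index build the MultiIndex. The name cube4d_alt is only illustrative, and the explicit transpose is there because unstack appends the new dimensions after the existing ones.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same synthetic setup as above: daily data for 2000-2001 without Feb 29.
t_range = pd.date_range("2000-01-01", "2001-12-31", freq="D")
t_range = t_range[~((t_range.month == 2) & (t_range.day == 29))]
field = xr.DataArray(
    np.random.rand(len(t_range), 5, 4),
    coords={
        "time": t_range,
        "lon": np.linspace(100, 110, 5),
        "lat": np.linspace(30, 35, 4),
    },
    dims=["time", "lon", "lat"],
    name="pr",
)

# Attach year and month-day as extra coordinates along the time dimension.
with_parts = field.assign_coords(
    year=("time", field.time.dt.year.values),
    monthly=("time", field.time.dt.strftime("%m-%d").values),
)

# Let xarray build the MultiIndex from those coordinates, then unstack it
# and put the dimensions in the intended order.
cube4d_alt = (
    with_parts.set_index(time=["year", "monthly"])
    .unstack("time")
    .transpose("year", "monthly", "lon", "lat")
)

print(cube4d_alt.sizes)
```

The original datetime coordinate is dropped when set_index replaces the time index, which is the same outcome as overwriting time with the MultiIndex.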

Why this works

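Two parallel arrays are extracted from the datetime index: one for the calendar year and another for month-day formatted as mm-dd. Together they form the two levels of a MultiIndex that replaces the original time coordinate. When the time coordinate is unstacked, its levels are promoted to separate dimensions named year and monthly, and the DataArray becomes four-dimensional with dimensions year, monthly, lon, and lat. Every value lands in the cell labelled by its own (year, monthly) pair, so nothing is averaged or moved relative to its date. The level values are sorted, and because the mm-dd keys are zero-padded, their string order matches calendar order. Removing Feb 29 beforehand matters here: a month-day key that exists in only some years would be filled with NaN in the others.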
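
One way to confirm that the reshape leaves the values untouched is to compare the 4D array against the original, both for a single date and after stacking it back into a time axis. This is a minimal sketch on the same synthetic data. It builds the 4D array with assign_coords and set_index, which is assumed here to be equivalent to the MultiIndex assignment described above, and the names spot_ok and roundtrip_ok are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

t_range = pd.date_range("2000-01-01", "2001-12-31", freq="D")
t_range = t_range[~((t_range.month == 2) & (t_range.day == 29))]
field = xr.DataArray(
    np.random.rand(len(t_range), 5, 4),
    coords={
        "time": t_range,
        "lon": np.linspace(100, 110, 5),
        "lat": np.linspace(30, 35, 4),
    },
    dims=["time", "lon", "lat"],
    name="pr",
)

cube4d = (
    field.assign_coords(
        year=("time", field.time.dt.year.values),
        monthly=("time", field.time.dt.strftime("%m-%d").values),
    )
    .set_index(time=["year", "monthly"])
    .unstack("time")
    .transpose("year", "monthly", "lon", "lat")
)

# Spot check: the same date addressed through both layouts.
spot_ok = np.array_equal(
    cube4d.sel(year=2001, monthly="03-15").values,
    field.sel(time=pd.Timestamp("2001-03-15")).values,
)

# Round trip: stack year and monthly back into one axis and compare
# the full value array against the original.
restacked = cube4d.stack(time=("year", "monthly")).transpose("time", "lon", "lat")
roundtrip_ok = np.array_equal(restacked.values, field.values)

print(spot_ok, roundtrip_ok)
```

Both checks should report True for this data, since every year contains the same 365 month-day keys and no cell is left empty.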

About .first and .last in the original attempt

If calling .first and .last produced errors in a groupby chain, the reshaping approach above makes them unnecessary. Unstacking after assigning a two-level coordinate constructs the 4D layout directly, so nothing needs to be aggregated or reduced.

Why it matters

Splitting time into multiple coordinates clarifies indexing, improves readability, and aligns the data to workflows that expect year and month-day as separate axes. This structure is convenient for selecting across any of the axes without extra parsing of datetimes, and it keeps the dataset consistent when date subsets must be addressed explicitly by year and by a uniform day key.
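
To illustrate the kind of indexing this layout allows, the sketch below selects one calendar day across all years, one full year, and a range of month-day keys, then averages over the year axis. The variable names are illustrative, and the 4D array is rebuilt with assign_coords and set_index so the block runs on its own.

```python
import numpy as np
import pandas as pd
import xarray as xr

t_range = pd.date_range("2000-01-01", "2001-12-31", freq="D")
t_range = t_range[~((t_range.month == 2) & (t_range.day == 29))]
field = xr.DataArray(
    np.random.rand(len(t_range), 5, 4),
    coords={
        "time": t_range,
        "lon": np.linspace(100, 110, 5),
        "lat": np.linspace(30, 35, 4),
    },
    dims=["time", "lon", "lat"],
    name="pr",
)

cube4d = (
    field.assign_coords(
        year=("time", field.time.dt.year.values),
        monthly=("time", field.time.dt.strftime("%m-%d").values),
    )
    .set_index(time=["year", "monthly"])
    .unstack("time")
    .transpose("year", "monthly", "lon", "lat")
)

# The same calendar day in every year: dims (year, lon, lat).
same_day = cube4d.sel(monthly="07-04")

# One full year: dims (monthly, lon, lat).
one_year = cube4d.sel(year=2001)

# A label slice over the sorted, zero-padded month-day keys (June to August).
summer = cube4d.sel(monthly=slice("06-01", "08-31"))

# Mean over years for each month-day key: dims (monthly, lon, lat).
daily_mean = cube4d.mean("year")

print(same_day.dims, one_year.dims, summer.sizes["monthly"], daily_mean.dims)
```

The slice relies on the monthly keys being sorted strings, which holds for zero-padded mm-dd labels.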

Takeaways

Promote the parts of a datetime coordinate that you intend to index by into explicit axes. Creating a MultiIndex from year and month-day and then unstacking the coordinate is a precise way to achieve a four-dimensional layout. It avoids ad hoc grouping and sidesteps errors tied to aggregations that are not required for the reshaping task.
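
The pattern can be wrapped in a small function. The sketch below uses a hypothetical helper named split_time (not part of xarray) and applies it to data that still contains Feb 29, to show what happens when the day set is not uniform: the missing (year, monthly) cells are filled with NaN.

```python
import numpy as np
import pandas as pd
import xarray as xr


def split_time(da, time_dim="time"):
    """Hypothetical helper: unstack `time_dim` into 'year' and 'monthly' dims."""
    t = da[time_dim]
    out = (
        da.assign_coords(
            year=(time_dim, t.dt.year.values),
            monthly=(time_dim, t.dt.strftime("%m-%d").values),
        )
        .set_index({time_dim: ["year", "monthly"]})
        .unstack(time_dim)
    )
    # unstack appends the new dims last, so move them to the front.
    other_dims = [d for d in da.dims if d != time_dim]
    return out.transpose("year", "monthly", *other_dims)


# Daily data for 2000-2001 with the leap day left in (731 time steps).
t_full = pd.date_range("2000-01-01", "2001-12-31", freq="D")
leap_field = xr.DataArray(
    np.random.rand(len(t_full), 3, 2),
    coords={"time": t_full, "lon": [100.0, 105.0, 110.0], "lat": [30.0, 35.0]},
    dims=["time", "lon", "lat"],
    name="pr",
)

leap_cube = split_time(leap_field)

# 2001 has no Feb 29, so that cell is NaN for every lon/lat point.
missing = leap_cube.sel(year=2001, monthly="02-29")
print(leap_cube.sizes, bool(missing.isnull().all()))
```

Dropping Feb 29 before reshaping, as in the setup at the top, avoids these NaN cells. Keeping it is also reasonable if downstream code handles missing values.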