2025, Dec 20 11:00

Remapping xarray DataArray from 2D lat/lon coordinates to 1D lat and lon on a regular grid with NaN gaps

Learn to convert xarray NetCDF data from 2D lat/lon coordinates to 1D lat and lon. Remap a DataArray to a grid with NaN gaps using stack and unique coords.

When working with NetCDF data in xarray, a common step in harmonizing datasets is converting two-dimensional coordinate arrays into standard 1D coordinates. The goal is to place values onto a regular lat/lon grid and leave cells with no source point as NaN, so downstream code can assume consistent axes.

Problem

Given a DataArray whose coordinates are defined as two-dimensional lat/lon arrays, convert it to a DataArray with 1D lat and lon coordinates and values reassigned to the corresponding grid cells.

import xarray as xr

arr0 = xr.DataArray(
    [[0, 1], [2, 3]],
    coords={
        "lon": (["ny", "nx"], [[30, 40], [40, 50]]),
        "lat": (["ny", "nx"], [[10, 10], [20, 20]]),
    },
    dims=["ny", "nx"],
)

Target representation with 1D coordinates and NaN where no point maps:

import numpy as np

xr.DataArray(
    [[0, 1, np.nan],
     [np.nan, 2, 3]],
    coords={
        "lat": [10, 20],
        "lon": [30, 40, 50],
    },
    dims=["lat", "lon"],
)

What’s going on and why it happens

The data holds values on a grid indexed by dimensions ny and nx, while the coordinates lon and lat are also provided as 2D arrays over the same dimensions. Many tools expect 1D lat and 1D lon axes to align data on a regular grid. To standardize the structure, values need to be remapped from the 2D coordinate pairs onto the unique sets of latitude and longitude values. Positions that do not exist in the original mapping should remain NaN.
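To see which cells of the regular grid will end up as NaN, it can help to compare the (lat, lon) pairs that actually occur in the 2D coordinates with the full product of the unique values. The following is a minimal sketch using plain NumPy on the example coordinates; the names present, full and missing are illustrative only.

```python
import itertools

import numpy as np

# The 2D coordinate arrays from the example above
lon2d = np.array([[30, 40], [40, 50]])
lat2d = np.array([[10, 10], [20, 20]])

# (lat, lon) pairs that actually occur in the source data
present = set(zip(lat2d.ravel().tolist(), lon2d.ravel().tolist()))

# Every cell of the regular grid spanned by the unique values
full = set(itertools.product(np.unique(lat2d).tolist(), np.unique(lon2d).tolist()))

# Cells with no source point; these stay NaN after remapping
missing = sorted(full - present)
print(missing)  # [(10, 50), (20, 30)]
```

The two missing pairs correspond to the two NaN entries in the target representation.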

Solution

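A straightforward approach is to flatten the data into a list of points with xarray.DataArray.stack, compute the unique coordinate values, create an empty regular grid, and assign values back by coordinate lookup. This relies on the 2D coordinates repeating exact values; if two source points share the same (lat, lon) pair, the later one overwrites the earlier.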

import xarray as xr
import numpy as np

arr0 = xr.DataArray(
    [[0, 1], [2, 3]],
    coords={
        "lon": (["ny", "nx"], [[30, 40], [40, 50]]),
        "lat": (["ny", "nx"], [[10, 10], [20, 20]]),
    },
    dims=["ny", "nx"],
)

# Flatten ny/nx into a single index
stacked = arr0.stack(idx=("ny", "nx"))

# Collect unique coordinate values
lat_unique = np.unique(arr0.lat.values)
lon_unique = np.unique(arr0.lon.values)

# Prepare a regular grid filled with NaN
regular = xr.DataArray(
    np.full((len(lat_unique), len(lon_unique)), np.nan),
    coords={"lat": lat_unique, "lon": lon_unique},
    dims=["lat", "lon"]
)

# Map each original point to its 1D lat/lon position
lats = stacked.lat.values
lons = stacked.lon.values
vals = stacked.values
for y, x, v in zip(lats, lons, vals):
    regular.loc[dict(lat=y, lon=x)] = v

print(regular)

Output:

<xarray.DataArray (lat: 2, lon: 3)> Size: 48B
array([[ 0.,  1., nan],
       [nan,  2.,  3.]])
Coordinates:
  * lat      (lat) int64 16B 10 20
  * lon      (lon) int64 24B 30 40 50

Why this matters

Standard 1D lat and lon axes simplify unification of datasets that otherwise differ only in their coordinate representation. After remapping, indexing by lat/lon becomes straightforward, and missing combinations remain explicit as NaN, which makes subsequent processing more predictable.
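As a brief illustration of what the 1D axes allow, the sketch below rebuilds the remapped array by hand and uses label-based selection on it. The variable names value, row and n_missing are illustrative.

```python
import numpy as np
import xarray as xr

# The remapped result, written out directly for a self-contained example
regular = xr.DataArray(
    [[0.0, 1.0, np.nan], [np.nan, 2.0, 3.0]],
    coords={"lat": [10, 20], "lon": [30, 40, 50]},
    dims=["lat", "lon"],
)

# Select a single cell by its coordinate labels
value = regular.sel(lat=20, lon=40).item()

# Select a whole latitude row; the result is 1D over lon
row = regular.sel(lat=10)

# Count the grid cells that had no source point
n_missing = int(regular.isnull().sum())

print(value, row.values, n_missing)
```

None of these selections are available on the original array, where lat and lon are 2D coordinates without an index.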

Takeaways

When a DataArray stores coordinates as 2D arrays over its spatial dimensions, stack the source grid into point form, extract unique latitude and longitude values, allocate a regular grid, and reassign data by coordinate lookup. This keeps the data aligned on standard axes and preserves the original values where coordinates match, leaving non-existing cells as NaN.
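The same recipe can also be delegated to pandas. The sketch below is one possible variant, not the approach used above: it builds a Series indexed by a (lat, lon) MultiIndex and converts it with xr.DataArray.from_series, which expands a MultiIndex into the product of its levels and fills absent combinations with NaN. The names series and regular_pd are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

arr0 = xr.DataArray(
    [[0, 1], [2, 3]],
    coords={
        "lon": (["ny", "nx"], [[30, 40], [40, 50]]),
        "lat": (["ny", "nx"], [[10, 10], [20, 20]]),
    },
    dims=["ny", "nx"],
)

# One row per source point, indexed by its (lat, lon) pair
index = pd.MultiIndex.from_arrays(
    [arr0.lat.values.ravel(), arr0.lon.values.ravel()],
    names=["lat", "lon"],
)
series = pd.Series(arr0.values.ravel(), index=index)

# Expand the MultiIndex into a dense lat x lon grid with NaN gaps
regular_pd = xr.DataArray.from_series(series)
print(regular_pd)
```

One design difference worth noting: this route requires every (lat, lon) pair to be unique, so duplicates should surface as an error instead of being silently overwritten.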