2025, Sep 25 17:00

Fixing Datashader memory spikes with Dask-backed xarray: avoid compute() during shading of large arrays

Learn why Datashader calls compute() on Dask xarray, causing memory spikes, and how a patch (works with Datashader 0.16.1) restores chunked shading for arrays.

Rendering massive arrays with Datashader looks straightforward on paper: chunk the data with Dask, rasterize into a small canvas, and let the scheduler do the heavy lifting. In practice, you might still see memory spikes and long runtimes. A typical case is a Dask-backed xarray DataArray of roughly 150,000 × 90,000 elements with 8192 × 8192 chunks on a 100 GB, 16‑core Windows VM, where initiating a plot quickly pushes RAM toward the limit. The stack trace reveals a call to compute(), which explains the memory blow-up. Here’s what’s going on and how to address it.

Reproducing the issue

The minimal example below constructs a large dask.array, wraps it in xarray, and sends it through Datashader’s Canvas.raster followed by shade.

# imports
import numpy as np
import dask.array as dk
import datashader as dz
from datashader import transfer_functions as tfun
import xarray as xa

# create a large dask-backed array
grid = dk.random.random((100000, 100000), chunks=(1000, 1000))

# wrap as xarray DataArray
xr_view = xa.DataArray(
    grid,
    dims=["u", "v"],
    coords={"u": np.arange(100000), "v": np.arange(100000)},
    name="sample_vals",
)

# attempt to render
tfun.shade(dz.Canvas(plot_height=300, plot_width=300).raster(xr_view))

What actually happens and why memory spikes

The key detail appears in the traceback: Datashader’s shade path calls an internal function that does data = data.compute() when it detects a dask-backed array. That line forces materialization of the dask array into memory. With arrays on the order of 10¹⁰ elements, this immediately explains the steady RAM growth and high CPU usage: you are no longer operating chunk-by-chunk for the shading step, but instead computing the full array in one go.
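To see why a single compute() is fatal at this scale, a quick back-of-envelope calculation for the 100,000 × 100,000 float64 array from the repro above shows the gap between one chunk and the full array:

```python
# Memory cost of materializing the full array with compute(), assuming
# float64 (8 bytes per element), which is what dask.array.random.random produces.
n_rows, n_cols = 100_000, 100_000
bytes_per_elem = 8  # float64

full_bytes = n_rows * n_cols * bytes_per_elem
full_gib = full_bytes / 2**30

chunk_rows, chunk_cols = 1_000, 1_000
chunk_bytes = chunk_rows * chunk_cols * bytes_per_elem
chunk_mib = chunk_bytes / 2**20

print(f"full array: {full_gib:.1f} GiB")   # ~74.5 GiB, more than most machines have
print(f"one chunk:  {chunk_mib:.1f} MiB")  # ~7.6 MiB, trivially small
```

A single chunk is a few megabytes; the materialized array is tens of gibibytes. Any code path that calls compute() on the whole array therefore dwarfs the memory needed for chunked processing.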

Even though dask supports chunked execution, the specific shading code path shown by the stack trace bypasses lazy evaluation by explicitly invoking compute(). The observed behavior follows directly from that call.
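For contrast, here is a minimal, self-contained sketch of what chunk-by-chunk rasterization looks like in principle. This is plain NumPy, not Datashader's actual implementation, and the function and parameter names are illustrative only: each block is folded into the small canvas and then discarded, so peak memory stays at one chunk rather than the whole array.

```python
import numpy as np

def raster_chunked(shape, chunk, canvas_shape, get_chunk):
    """Mean-rasterize an array of `shape` into `canvas_shape`,
    pulling one `chunk`-sized block at a time via `get_chunk`."""
    canvas_sum = np.zeros(canvas_shape)
    canvas_cnt = np.zeros(canvas_shape)
    for i0 in range(0, shape[0], chunk[0]):
        for j0 in range(0, shape[1], chunk[1]):
            block = get_chunk(i0, j0, chunk)  # only this block is in memory
            # map each row/column of the block onto a canvas cell
            rows = (np.arange(i0, i0 + block.shape[0]) * canvas_shape[0]) // shape[0]
            cols = (np.arange(j0, j0 + block.shape[1]) * canvas_shape[1]) // shape[1]
            np.add.at(canvas_sum, (rows[:, None], cols[None, :]), block)
            np.add.at(canvas_cnt, (rows[:, None], cols[None, :]), 1)
    return canvas_sum / canvas_cnt

# small demo: a 400 x 400 array processed in 100 x 100 chunks onto a 4 x 4 canvas
rng = np.random.default_rng(0)
big = rng.random((400, 400))
canvas = raster_chunked(big.shape, (100, 100), (4, 4),
                        lambda i, j, c: big[i:i + c[0], j:j + c[1]])
```

The point is that the aggregation into a 300 × 300 (or here, 4 × 4) canvas never requires the full source array in memory at once; an explicit compute() throws that property away.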

Fix: try the Datashader patch

A targeted fix is available. Apply the change from the pull request at https://github.com/holoviz/datashader/pull/1448 and verify whether it resolves the issue in your environment. Reports indicate that using the latest Datashader release may lead to a different error related to the @ngjit decorator after applying the change, while using Datashader 0.16.1 together with that patch resolves the original problem.
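If you want to try the change locally, one possible route (assuming git and pip are available) uses the fact that GitHub exposes pull requests under the pull/<id>/head ref. These steps are illustrative, not an official installation procedure:

```shell
# Fetch and install the code from PR #1448 into the current environment.
git clone https://github.com/holoviz/datashader.git
cd datashader
git fetch origin pull/1448/head:pr-1448   # GitHub exposes PR heads as pull/<id>/head
git checkout pr-1448
pip install -e .
```

If you need the change on top of 0.16.1 specifically, checking out the 0.16.1 release tag and cherry-picking the PR's commits is an alternative, though it may require resolving conflicts.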

Using the same code after the fix

The usage stays the same; the change is in the library, not in the calling code. You can continue to render with Datashader as before:

# imports
import numpy as np
import dask.array as dk
import datashader as dz
from datashader import transfer_functions as tfun
import xarray as xa

# construct dask-backed array
grid = dk.random.random((100000, 100000), chunks=(1000, 1000))

# wrap for xarray/Datashader
xr_view = xa.DataArray(
    grid,
    dims=["u", "v"],
    coords={"u": np.arange(100000), "v": np.arange(100000)},
    name="sample_vals",
)

# render
tfun.shade(dz.Canvas(plot_height=300, plot_width=300).raster(xr_view))

If you encounter the @ngjit decorator error on the latest release, you are not alone: that behavior has been reported. The same change applied on Datashader 0.16.1 has been confirmed to fix the original memory issue.

Why this matters

Large-scale visualization pipelines often rely on lazy execution and chunking to keep memory usage bounded. When a library layer calls compute() on a dask-backed object, it can defeat that strategy and trigger full materialization. Understanding when that happens is critical to predict resource usage and to avoid surprises on production machines or constrained environments.

Takeaways

If you see Datashader ramp up memory while shading a dask-backed xarray and the traceback shows a compute() call, align your environment with the fix from the referenced pull request. If the latest Datashader release surfaces an @ngjit-related error after applying the change, test with Datashader 0.16.1 where the same change has been observed to resolve the problem, and follow the discussion on the pull request for updates. This way you preserve the intended benefits of using dask-backed data in your visualization workflow without unexpected memory blow-ups.

The article is based on a question from StackOverflow by Nanoputian and an answer by James A. Bednar.