2025, Dec 01 07:00

Serialize Matplotlib violin plot artists to speed up post hoc styling of multi-panel figures

Learn how to serialize Matplotlib violin plot artists with NumPy to speed up styling of multi-panel figures from large netCDF4 datasets - save, reload and tweak later.

When you build multi-panel violin plots against large netCDF4 model datasets, it’s natural to want a fast feedback loop for layout and styling without reprocessing gigabytes of data. The straightforward idea, saving the violin artists to disk and reloading them later for styling, does work, but only if you save and reload them in a way that preserves how Matplotlib attaches those artists to figures and axes.

Problem overview

The workflow parses multi-gigabyte, time–latitude–longitude fields, computes region-wise selections, and renders many violins per panel. A single run can take hours. The goal is to serialize the created violin plot objects, then apply figure tweaks later (margins, fonts, colors, labels) without recomputing the underlying statistics or reloading the big data. A naive attempt to call numpy.save on each violin container separately produces a file that, once reloaded, prints like a dictionary of Matplotlib collections but isn’t directly usable: the dictionary comes back wrapped in a zero-dimensional object array, and layout tweaks no longer apply across panels as expected.
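To make the "prints like a dictionary but isn’t immediately usable" symptom concrete, here is a minimal sketch of what numpy.save and numpy.load do to a single violin container. The Agg backend, the temporary directory, and the file name one_panel.npy are assumptions chosen so the snippet runs anywhere; they are not part of the original workflow.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
parts = ax.violinplot(np.random.normal(size=(10, 3)))

# Hypothetical scratch location for the demo file
path = os.path.join(tempfile.mkdtemp(), "one_panel.npy")
np.save(path, parts)  # NumPy wraps the dict in a 0-d object array and pickles it

raw = np.load(path, allow_pickle=True)
print(type(raw).__name__, raw.shape, raw.dtype)  # an ndarray wrapper, not a dict

restored = raw.item()  # unwrap the single element to get the dict back
print(sorted(restored))          # keys of the violin container
print(len(restored["bodies"]))   # one polygon per violin (three columns here)
```

Calling .item() on the loaded array recovers the dictionary, but that alone does not restore a shared figure across panels.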

Minimal code that mirrors the issue

The pattern below saves per-panel violin containers individually. Reloading them later won’t reattach them to a common figure, which blocks coherent, post hoc customization across panels.

import matplotlib.pyplot as plt
import numpy as np
# Create a multi-panel figure
canvas, ax_stack = plt.subplots(2, 1, figsize=(4.8, 6.4))
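for idx, axis in enumerate(ax_stack):
    block = np.random.normal(size=(10, 3))
    vpkg = axis.violinplot(block)
    # Saving a single panel's violins in isolation, one file per panel
    # (a fixed file name here would overwrite the previous panel)
    np.save(f'panel_{idx}.npy', vpkg)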
# At this point, later styling in a different process won't know
# how to treat these as parts of the same figure.

What’s really happening

Matplotlib’s violinplot returns a dictionary of artists (the 'bodies' polygons plus line collections for the extrema), and each artist belongs to a specific axes and figure. Saving with numpy.save pickles those artists, and the pickle carries their parent axes and figure along with them. If you save them panel by panel, each file reloads with its own copy of that context, so the collections end up detached from a shared figure. That’s why customizations across the full layout don’t apply coherently after reloading. There are two practical implications. First, you need to serialize all violin containers together so that you can iterate and style them consistently after loading. Second, the reloaded artists belong to a new figure instance, not the original one you drew before saving; this is expected. If you don’t want to see both versions when calling plt.show, explicitly close the original figure.
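The detachment can be checked directly. The sketch below saves each panel to its own file, reloads both, and compares the figures the reloaded artists point to. The Agg backend, the temporary directory, and the names workdir and figs are assumptions for illustration only.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 1)
workdir = tempfile.mkdtemp()  # hypothetical scratch directory
paths = []
for idx, ax in enumerate(axes):
    parts = ax.violinplot(np.random.normal(size=(10, 3)))
    path = os.path.join(workdir, f"panel_{idx}.npy")
    np.save(path, parts)  # each file holds an independent pickle
    paths.append(path)

# .item() unwraps the 0-d object array back into the violin dict
reloaded = [np.load(p, allow_pickle=True).item() for p in paths]

# Follow each panel's first body back to its axes and then its figure
figs = [parts["bodies"][0].axes.figure for parts in reloaded]

# Each file was unpickled on its own, so the panels no longer share a figure
separate_figures = figs[0] is not figs[1]
print(separate_figures)
```

Because every load rebuilds its own figure, a call such as subplots_adjust on one of them has no effect on the panel recovered from the other file.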

A working approach: save all violin artists together, then reload and style

The snippet below demonstrates a pattern that works with Matplotlib v3.10.3 and NumPy v2.3.1. It builds a figure, collects the violin containers into a single NumPy array with dtype=object, saves them once, reloads with allow_pickle=True, and then applies color and titling on the reloaded artists. The same technique scales to more panels and larger figures.

import matplotlib.pyplot as plt
import numpy as np
# Build a figure with two stacked axes
fig_box, grid_axes = plt.subplots(nrows=2, figsize=(4.8, 6.4))
# Object array to hold all violin containers from each axes
artist_bundle = np.empty((2,), dtype=object)
# Draw and collect
for idx, axh in enumerate(grid_axes):
    matrix = np.random.normal(size=(10, 3))
    vpack = axh.violinplot(matrix)
    artist_bundle[idx] = vpack
# Persist all violins together
np.save('all_violins.npy', artist_bundle)
# Optionally hide the original figure in the same session
# plt.close(fig_box)
# --- Later, in a separate styling run ---
loaded = np.load('all_violins.npy', allow_pickle=True)
# Post hoc styling: recolor and set titles on the reloaded artists
for bloc, tint, label in zip(loaded, ['tab:purple', 'tab:pink'], ['foo', 'bar']):
    for key, coll in bloc.items():
        if key == 'bodies':
            for poly in coll:
                poly.set_facecolor(tint)
        else:
            coll.set_color(tint)
    # Recover the axes via any artist's .axes attribute
    axh = bloc['bodies'][0].axes
    axh.set_title(label.title())
# Recover the new figure from one of the axes and save
final_fig = axh.figure
final_fig.savefig('violin_layout.png')
plt.show()

This achieves the desired decoupling: the heavy numerical work runs once, you serialize the artist containers, and subsequent styling sessions only load and customize those artists. Everything after the save can be executed independently. If you prefer not to display the original and reloaded figures simultaneously, close the original with plt.close(fig_box) before the second phase.
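The example above only recolors and titles the violins. As a rough sketch of how the same reloaded bundle could be used for layout tweaks such as margins and font sizes, the code below recovers the axes and figure from the loaded containers and adjusts spacing and tick labels. The Agg backend, the temporary directory, the output name restyled.png, and the specific spacing, label, and font values are all assumptions for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Phase 1 (stand-in for the expensive run): draw, bundle, save, close
fig, axes = plt.subplots(nrows=2, figsize=(4.8, 6.4))
bundle = np.empty((len(axes),), dtype=object)
for idx, ax in enumerate(axes):
    bundle[idx] = ax.violinplot(np.random.normal(size=(10, 3)))
path = os.path.join(tempfile.mkdtemp(), "all_violins.npy")
np.save(path, bundle)
plt.close(fig)  # the original figure is no longer needed

# Phase 2 (styling run): reload and recover axes and figure from the artists
loaded = np.load(path, allow_pickle=True)
new_axes = [parts["bodies"][0].axes for parts in loaded]
new_fig = new_axes[0].figure
same_figure = all(ax.figure is new_fig for ax in new_axes)

# Layout tweaks on the reloaded figure (values are illustrative)
new_fig.set_size_inches(6.0, 7.0)
new_fig.subplots_adjust(left=0.15, right=0.95, hspace=0.4)
for ax in new_axes:
    ax.set_xticks([1, 2, 3])  # default violin positions for three columns
    ax.set_xticklabels(["A", "B", "C"], fontsize=9)
    ax.set_ylabel("value", fontsize=10)
    ax.tick_params(axis="y", labelsize=8)

out_png = os.path.join(os.path.dirname(path), "restyled.png")
new_fig.savefig(out_png, dpi=150)
```

Since both containers were pickled together, they resolve to one figure after loading, which is what lets figure-level calls like subplots_adjust act on every panel at once.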

Why this matters

When every datapoint counts and you cannot truncate or resample, the rendering step is inevitably expensive. Separating plotting from styling gives you fast turnaround for layout choices while retaining fidelity to the full dataset. Capturing all violin containers together preserves cross-panel context so that legends, colors, and titles can be applied coherently after a reload.

Closing thoughts

For large, multi-panel statistical plots, treat the Matplotlib artists as the serialization boundary. Save them together in a single object array, reload with allow_pickle=True, access components through the dictionary returned by violinplot, and recover the axes via any body’s axes attribute. If you keep the artists grouped, you can iterate quickly on whitespace, fonts, and color choices without paying the cost of recomputing the distributions each time.