2025, Nov 11 13:00

Group by date and let Matplotlib create figures on demand: plot multi-site time series with pandas, without hardcoding

Learn how to plot time-aligned data from multiple sites with Python, pandas and Matplotlib: group by date, let figures be created on demand, and avoid hardcoded figure indices.

When you compare time-aligned metrics from multiple sources, plotting becomes awkward if the number of dates varies and you try to pre-create figures manually. This often happens with data read from CSV files and stored in nested dictionaries, where each site contributes one DataFrame per day. The goal is simple: for each date, render a single figure that contains the lines from all sites for that date, without hardcoding how many dates exist.

Problem setup

The data is structured as a nested mapping of sites to daily DataFrames, and a straightforward plotting attempt ends up pre-allocating figures by index. That approach works only if you already know how many dates to expect, which is brittle and requires manual edits.

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
meteo_store = {'Station_A':
               {'Mean': {'01-01': pd.DataFrame(np.random.rand(5,2)), '01-02': pd.DataFrame(np.random.rand(5,2)),
                         '01-03': pd.DataFrame(np.random.rand(5,2))},
                'Misc': {'01-01': 'dummy_data', '01-02': 'dummy_data'}},
               'Station_B':
               {'Mean': {'01-01': pd.DataFrame(np.random.rand(5,2)), '01-02': pd.DataFrame(np.random.rand(5,2)),
                         '01-03': pd.DataFrame(np.random.rand(5,2))},
                'Misc': {'01-01': 'dummy_data', '01-02': 'dummy_data'}}}
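# Figures are pre-created by hand: this only works for exactly three dates
plt.figure(1)
plt.figure(2)
plt.figure(3)
for station, payload in meteo_store.items():
    fig_idx = 1  # restart at the first figure for every station
    mean_block = payload['Mean']
    for date_key, frame in mean_block.items():
        # The figure is chosen by position, so every station must list its dates in the same order
        plt.figure(fig_idx)
        plt.plot(frame[0], frame[1], label=station)
        plt.legend()
        fig_idx += 1
plt.show()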

Why this is fragile

The number of dates is unknown in advance, so pre-creating figures by index means editing the code whenever the data changes. Capping the range at a small number of days does not help: it is still error-prone and clutters the code. Selecting figures by a running index also assumes that every site lists the same dates in the same order; a date missing at one site would shift its later lines onto the wrong figures. The key observation is that figures do not have to be created upfront: plt.figure creates a figure on demand when it is called with an identifier that does not exist yet.

You don’t need to pre-create figures. Calling plt.figure(i) creates the figure if it doesn’t already exist.
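The on-demand behavior is easy to check in isolation. The following is a minimal sketch, assuming the non-interactive Agg backend so that it runs without a display; the figure number 3 is arbitrary and chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is opened
from matplotlib import pyplot as plt

plt.close("all")  # start from a clean state

# No figures exist yet.
before = plt.get_fignums()

# Asking for figure 3 creates it; figures 1 and 2 are not created implicitly.
fig = plt.figure(3)
after_first = plt.get_fignums()

# Asking for the same number again selects the existing figure instead of creating a new one.
same_fig = plt.figure(3)
after_second = plt.get_fignums()

print(before, after_first, after_second)
print(same_fig is fig)
```

Because a repeated call only reselects the existing figure, calling plt.figure inside a loop is safe even when the same identifier comes up many times.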

Instead of guessing how many figures you need, group the input by date first, then iterate over that structure. This way, each date becomes one figure, and each figure collects lines from all sites for that date.

Solution: group by date first, then plot

The idea is to build a mapping from date to all site DataFrames for that date. After that, loop over the mapping and build one figure per date. This avoids any manual figure bookkeeping and scales to any number of sites or dates.

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from collections import defaultdict
meteo_store = {'Station_A': 
               {'Mean': {'01-01': pd.DataFrame(np.random.rand(5,2)), '01-02': pd.DataFrame(np.random.rand(5,2)),
                         '01-03': pd.DataFrame(np.random.rand(5,2))},
                'Misc': {'01-01': 'dummy_data', '01-02': 'dummy_data'}},
               'Station_B':
               {'Mean': {'01-01': pd.DataFrame(np.random.rand(5,2)), '01-02': pd.DataFrame(np.random.rand(5,2)),
                         '01-03': pd.DataFrame(np.random.rand(5,2))},
                'Misc': {'01-01': 'dummy_data', '01-02': 'dummy_data'}}}
# Group the per-site DataFrames by date: date -> [(station, DataFrame), ...]
by_day = defaultdict(list)
for station_name, content in meteo_store.items():
    for date_key, df in content['Mean'].items():
        by_day[date_key].append((station_name, df))
# One figure per date; the date string doubles as the figure identifier
for day_key, series in by_day.items():
    plt.figure(day_key)
    for st, dframe in series:
        plt.plot(dframe[0], dframe[1], label=st)
    plt.legend()
plt.show()

This produces one figure per date, each containing one line per site. The dates are discovered dynamically, and plt.figure accepts a string as the identifier (the string becomes the figure's label and window title), so you never need to count or predefine indices.
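To see how string identifiers behave, here is a small sketch, again assuming the Agg backend and using made-up date keys and dummy line data. It shows that a label selects the same figure every time it is used, so lines from different sites accumulate on the figure for their date.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is opened
from matplotlib import pyplot as plt

plt.close("all")  # start from a clean state

dates = ["01-01", "01-02", "01-03"]  # hypothetical date keys

for station in ("Station_A", "Station_B"):
    for d in dates:
        plt.figure(d)  # created on first use, reselected on later calls
        plt.plot([0, 1], [0, 1], label=station)  # dummy data for illustration

# Labels of all open figures, and the number of lines drawn on each one.
labels = plt.get_figlabels()
line_counts = {d: len(plt.figure(d).gca().lines) for d in dates}

print(labels)
print(line_counts)
```

Each of the three figures ends up with two lines, one per station. Grouping by date first is still the tidier structure, because it lets you finish each figure (legend, title, saving) in one place.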

Why this approach matters

Dynamic grouping by date makes the code resilient to changing inputs. You avoid assumptions about the number of days, and each figure compares all sites for the same date. Another practical benefit is that you can build the per-date aggregation directly while ingesting the CSV files, bypassing the intermediate nested structure if you wish.
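Building the per-date mapping at ingestion time could look like the sketch below. The source does not describe how the CSV files are named or laid out, so everything here is an assumption: the csv_sources dictionary, its (station, date) keys, and the x and y columns are hypothetical stand-ins, with in-memory strings playing the role of files on disk.

```python
import io
from collections import defaultdict

import pandas as pd

# Hypothetical stand-ins for CSV files on disk: (station, date) -> CSV text.
csv_sources = {
    ("Station_A", "01-01"): "x,y\n0,1.0\n1,2.0\n",
    ("Station_B", "01-01"): "x,y\n0,1.5\n1,2.5\n",
    ("Station_A", "01-02"): "x,y\n0,0.5\n1,0.7\n",
}

# Fill the date -> [(station, DataFrame), ...] mapping while reading,
# without building the nested site -> date structure first.
by_day = defaultdict(list)
for (station, date_key), text in csv_sources.items():
    df = pd.read_csv(io.StringIO(text))
    by_day[date_key].append((station, df))

# Which stations contributed data for each date.
summary = {day: [name for name, _ in entries] for day, entries in by_day.items()}
print(summary)
```

The resulting by_day has the same shape as the mapping used in the solution, so the plotting loop can consume it unchanged. Note that a date present at only one site simply yields a figure with a single line.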

Takeaways

When plotting multi-source time-aligned data, do not create figures upfront. Aggregate records by the plotting key first, then let Matplotlib instantiate figures on demand. Using a simple date-to-series mapping keeps the plotting loop concise, accommodates any number of sites, and eliminates manual figure management.

The article is based on a question from Stack Overflow by Regina Phalange and an answer by bruno.