2025, Sep 29 21:00

Pandas resample and agg: per-column kwargs, min_count, and safe patterns for empty windows

Learn why pandas resample().agg(dict) ignores min_count, how to pass per-column kwargs, how to return None instead of 0 for empty windows, and what the performance costs are.

Resampling time series in pandas often looks straightforward until you need per-column control over aggregation behavior. A typical case: you want sum during resample, but if a bucket has no rows, you want None/NaN instead of 0. With a single aggregation this is easy; the gotcha appears the moment you switch to a dictionary of aggregations.

Problem setup

Consider a minimal DataFrame with a single datetime index entry and one column that holds no data, so its only value is NaN. Asking for a resample with sum and min_count=1 produces the expected None for empty windows:

import pandas as pd

# a single row with no data: the only value in 'metric' is NaN
data_tbl = pd.DataFrame(index=[pd.to_datetime('2020-01-01')],
                        columns=['metric'])
# min_count=1 keeps windows without valid values as missing instead of 0
ok_out = data_tbl.resample('5min').agg('sum', min_count=1)

Result:

           metric
2020-01-01    None

But switching to a per-column dict immediately loses the min_count semantics and returns 0:

# the dict form drops min_count, so the same window now comes back as 0
bad_out = data_tbl.resample('5min').agg({'metric': 'sum'}, min_count=1)

Result:

           metric
2020-01-01       0

What actually happens and why

Passing keyword arguments like min_count alongside a string aggregator works because they are forwarded to the underlying implementation, in this case Resampler.sum. Once you hand agg a dictionary, pandas does not forward those kwargs to the per-column aggregations. In other words, you cannot combine a dict mapping with distinct kwargs for the underlying aggregation functions the way you might expect. This is currently unsupported, and a similar issue has been reported against agg.
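For a single aggregation, the string form with kwargs behaves like calling the Resampler method directly, which is a useful sanity check of the forwarding behavior. A minimal sketch, reusing the single-row frame from above (the names rs, via_agg, and via_method are just illustrative):

import pandas as pd

data_tbl = pd.DataFrame(index=[pd.to_datetime('2020-01-01')],
                        columns=['metric'])
rs = data_tbl.resample('5min')

# kwargs passed next to a string aggregator are forwarded ...
via_agg = rs.agg('sum', min_count=1)
# ... which matches calling the Resampler method directly
via_method = rs.sum(min_count=1)

# both keep the window without valid values as missing instead of 0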

Workable patterns

When you need control over kwargs while resampling, there are a few patterns that reliably preserve behavior without relying on unsupported kwargs forwarding.

If you want the same aggregation for several columns, slice the columns before calling agg. This keeps min_count=1 working as intended:

data_tbl = pd.DataFrame(index=[pd.to_datetime('2020-01-01')],
                        columns=['metric', 'metric2', 'metric3'])
same_res = data_tbl.resample('5min')[['metric', 'metric2']].agg('sum', min_count=1)

Result:

           metric metric2
2020-01-01   None    None

If you need different aggregations per column, assemble the result with concat. Build one resampler, apply column-wise aggregations with their kwargs, and concatenate along columns:

agg_plan = {'metric': 'sum', 'metric2': 'min'}
rs_obj = data_tbl.resample('5min')
mixed_res = pd.concat({col: rs_obj[col].agg([fn], min_count=1)
                       for col, fn in agg_plan.items()}, axis=1)

Result:

           metric metric2
              sum     min
2020-01-01   None     NaN

If both the functions and their kwargs vary by column, keep a per-column kwargs mapping and apply it during the concat:

agg_plan = {'metric': 'sum', 'metric2': 'min'}
kw_per_col = {'metric2': {'min_count': 1}}
rs_obj = data_tbl.resample('5min')
varkw_res = pd.concat({col: rs_obj[col].agg([fn], **kw_per_col.get(col, {}))
                       for col, fn in agg_plan.items()}, axis=1)

Result:

           metric metric2
              sum     min
2020-01-01      0     NaN

In this last case, only metric2 received min_count=1, so metric fell back to the default sum behavior that yields 0 for an empty period.
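If the intent is for both columns to treat empty windows as missing, the same pattern covers it once every column has an entry in the kwargs mapping. A small variation on the sketch above (the variable name all_kw_res is just illustrative):

agg_plan = {'metric': 'sum', 'metric2': 'min'}
# give each column its own kwargs so both preserve empty windows as missing
kw_per_col = {'metric': {'min_count': 1}, 'metric2': {'min_count': 1}}
rs_obj = data_tbl.resample('5min')
all_kw_res = pd.concat({col: rs_obj[col].agg([fn], **kw_per_col.get(col, {}))
                        for col, fn in agg_plan.items()}, axis=1)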

Why this matters

Zeros versus None is not a cosmetic detail. In time series pipelines, downstream logic—ratios, rolling stats, anomaly flags—behaves differently depending on whether an empty window is treated as no data or as an actual zero. Ensuring predictable aggregation semantics avoids silent drift in metrics.
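A tiny illustration of that divergence, with made-up numbers standing in for two versions of the same resampled output:

import pandas as pd

# the same empty window recorded as 0 versus as missing
as_zero = pd.Series([10.0, 0.0, 30.0])
as_missing = pd.Series([10.0, None, 30.0])

# the zero drags the mean down; the missing value is skipped
print(as_zero.mean())     # ~13.33
print(as_missing.mean())  # 20.0

# a "below threshold" alert fires on the zero but not on missing data
print((as_zero < 5).any())     # True
print((as_missing < 5).any())  # False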

There is also a performance angle worth being aware of. Using a dict of aggregations already introduces overhead compared to a single-aggregator call. On a representative input with 10K rows and 3 columns and with the resampler precomputed, r.agg('sum') measured around 402 μs ± 40.6 μs, the dictionary form r.agg({'value': 'sum', 'value2': 'sum', 'value3': 'sum'}) around 1.46 ms ± 112 μs, while the concat approach measured about 2.15 ms ± 60.4 μs. This gives a sense of the relative cost when you need per-column control.
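If you want to reproduce the comparison on your own workload, a sketch along these lines should do; the sizes, column names, and repetition count are illustrative, and absolute numbers will vary with pandas version and hardware:

import timeit
import numpy as np
import pandas as pd

# illustrative workload: 10K rows, 3 numeric columns on a minute-frequency index
idx = pd.date_range('2020-01-01', periods=10_000, freq='min')
df = pd.DataFrame(np.random.rand(10_000, 3),
                  index=idx, columns=['value', 'value2', 'value3'])
r = df.resample('5min')

options = {
    'single aggregator': lambda: r.agg('sum'),
    'dict of aggregations': lambda: r.agg({'value': 'sum', 'value2': 'sum', 'value3': 'sum'}),
    'concat pattern': lambda: pd.concat(
        {c: r[c].agg(['sum'], min_count=1) for c in df.columns}, axis=1),
}
for name, fn in options.items():
    # average per-call time over 100 runs
    per_call = timeit.timeit(fn, number=100) / 100
    print(f'{name}: {per_call * 1e3:.2f} ms')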

Takeaways

If you require the same aggregation and the same kwargs across several columns, group them into a single agg call over a sliced view; this keeps the code compact and avoids extra concatenation. When aggregations or kwargs differ per column, build the result column-wise and concatenate. Expect a performance impact compared to a single-aggregator call; the dict-based approach already costs more, and the concat pattern adds a bit on top. If performance becomes critical, prepare a reproducible slice of your workload and measure each option in your environment.

Until kwargs forwarding for dict-based agg is supported, these patterns provide predictable, explicit control over resample behavior without sacrificing correctness.

The article is based on a Stack Overflow question by KamiKimi 3 and an answer by mozway.