https://pytroubles.com/en/posts/id460-pandas-multiindex-assign-a-per-day-maximum-to-a-single-08-30-row-using-mask-groupby-transform

pandas MultiIndex: Assign a per-day maximum to a single 08:30 row using mask, groupby, transform

How to assign a daily max to only the 08:30 row in a pandas MultiIndex DataFrame (no loops)


Learn to assign a per-day max to the single 08:30 row in a pandas MultiIndex DataFrame using boolean masks and groupby transform, avoiding broadcasting.

2025-10-04T11:00:05+03:00

Filling a per-day aggregate into a single row is a classic pandas gotcha: you compute the value correctly, but assignment silently broadcasts across all rows for that key. Below is a concise walkthrough of targeting exactly one row per (Symbol, Date), the 08:30 record, and doing it consistently for a single contract or the entire dataset.



Repro: why the value lands in every row of the day

Consider intraday option bars indexed by a two-level MultiIndex of Symbol and Date. The goal is to store each day’s maximum high in a column named day_high, but only on the row whose hour value is 08:30:00.

import pandas as pd
rows = [['SPXW 250715C06310000', '7/14/2025', 2.74, 2.87, 2.60, 2.65, 14, '8:30:00'],
        ['SPXW 250715C06310000', '7/14/2025', 2.80, 2.80, 2.50, 2.53, 61, '8:31:00'],
        ['SPXW 250715C06310000', '7/14/2025', 2.45, 2.45, 2.45, 2.45, 2, '8:32:00'],
        ['SPXW 250715C06310000', '7/14/2025', 2.58, 2.80, 2.58, 2.60, 32, '8:33:00'],
        ['SPXW 250715C06310000', '7/14/2025', 2.50, 2.50, 2.25, 2.30, 5, '8:34:00'],
        ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.05, 0.03, 0.03, 246, '8:30:00'],
        ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.10, 0.03, 0.07, 452, '8:31:00'],
        ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.10, 0.05, 0.07, 137, '8:32:00'],
        ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.07, 0.07, 5, '8:33:00'],
        ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.05, 0.05, 225, '8:34:00'],
        ['SPXW 250715C06310000', '7/11/2025', 7.30, 7.30, 7.30, 7.30, 2, '8:30:00'],
        ['SPXW 250715C06310000', '7/11/2025', 7.20, 7.20, 7.20, 7.20, 2, '8:31:00'],
        ['SPXW 250715C06310000', '7/11/2025', 6.92, 6.92, 6.92, 6.92, 20, '8:32:00'],
        ['SPXW 250715C06310000', '7/11/2025', 6.58, 6.58, 6.58, 6.58, 1, '8:34:00'],
        ['SPXW 250715C06310000', '7/11/2025', 6.41, 6.41, 6.41, 6.41, 2, '8:35:00']]
frame = pd.DataFrame(rows, columns=['Symbol', 'Date', 'open', 'high', 'low', 'close', 'volume', 'hour'])
frame['Date'] = pd.to_datetime(frame['Date'])
frame['hour'] = pd.to_datetime(frame['hour'], format='%H:%M:%S')
frame = frame.set_index(['Symbol', 'Date'])
# Attempt: fills every row of that (Symbol, Date)
frame.loc[('SPXW 250715C06310000', '2025-07-14'), 'day_high'] = (
    frame.loc[('SPXW 250715C06310000', '2025-07-14'), 'high'].max()
)

What actually happens

When you pass a two-level key to .loc on a MultiIndex, you address the entire sub-frame for that (Symbol, Date) pair. Assigning a scalar to that slice writes the same value to all matching rows. Because the code didn’t include a row-level filter on time, every 1-minute bar of that day for the selected contract was updated.
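To see this behaviour in isolation, here is a minimal sketch on made-up data (the frame `toy`, the symbol "A" and the variable names are illustrative assumptions, not part of the original dataset). It counts how many rows the two-level key addresses and how many rows end up filled after the scalar assignment.

```python
import pandas as pd

# Toy data, made up for illustration: two bars share one (Symbol, Date) key.
toy = pd.DataFrame(
    {
        "Symbol": ["A", "A", "A"],
        "Date": pd.to_datetime(["2025-07-11", "2025-07-14", "2025-07-14"]),
        "high": [7.30, 2.87, 2.80],
    }
).set_index(["Symbol", "Date"])
toy["day_high"] = float("nan")

key = ("A", pd.Timestamp("2025-07-14"))

# The full two-level key addresses every row that carries it.
rows_for_key = toy.loc[key, "high"]
print(len(rows_for_key))

# So a scalar assigned through that key is written to all of those rows.
toy.loc[key, "day_high"] = rows_for_key.max()
print(toy["day_high"].notna().sum())
```

Both printed counts should match the number of bars in that day, which is exactly the unwanted broadcast.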

Target a single row for a single contract/day

The fix is to assign with a boolean mask that is true only for the row you want. You need two conditions: time is 08:30 and the MultiIndex equals the specific (Symbol, Date) pair.

# Choose the one exact row using a boolean mask
flag = (
    frame['hour'].dt.strftime('%H:%M').eq('08:30') &
    (frame.index == ('SPXW 250715C06310000', pd.Timestamp('2025-07-14')))
)
frame.loc[flag, 'day_high'] = (
    frame.loc[('SPXW 250715C06310000', '2025-07-14'), 'high'].max()
)

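The index comparison is an elementwise tuple equality, so the date in the key must compare equal to the values stored in the Date level. Here that level holds datetime values, which is why the key wraps the date in pd.Timestamp; a plain string such as '2025-07-14' is accepted by .loc but does not equal a Timestamp in this comparison. If your Date level stores another type (for example, strings), compare against a value of that same type and drop pd.Timestamp.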
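Another way to build the same single-row mask is to compare each index level separately with get_level_values instead of comparing whole tuples. The sketch below uses made-up data; `toy`, `in_group` and `at_open` are illustrative names, and this is an alternative formulation rather than the approach from the original answer.

```python
import pandas as pd

# Toy data, made up for illustration. The day's maximum (2.87) occurs at 08:31.
toy = pd.DataFrame(
    {
        "Symbol": ["A", "A", "A", "B"],
        "Date": pd.to_datetime(
            ["2025-07-14", "2025-07-14", "2025-07-11", "2025-07-14"]
        ),
        "hour": pd.to_datetime(
            ["08:30:00", "08:31:00", "08:30:00", "08:30:00"], format="%H:%M:%S"
        ),
        "high": [2.80, 2.87, 7.30, 0.05],
    }
).set_index(["Symbol", "Date"])
toy["day_high"] = float("nan")

symbol, day = "A", pd.Timestamp("2025-07-14")

# Condition 1: the row belongs to the chosen (Symbol, Date) group.
in_group = (toy.index.get_level_values("Symbol") == symbol) & (
    toy.index.get_level_values("Date") == day
)
# Condition 2: the row is the 08:30 bar.
at_open = toy["hour"].dt.strftime("%H:%M").eq("08:30").to_numpy()

flag = in_group & at_open

# Write the group's maximum only where both conditions hold.
toy.loc[flag, "day_high"] = toy.loc[in_group, "high"].max()
print(toy)
```

Comparing level by level makes the type of each comparison explicit, at the cost of a slightly longer expression.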

Do it for every contract/day without loops

You don’t need a for loop. Compute the per-(Symbol, Date) maximum once and let pandas align it to the right places. There are two idiomatic approaches.

The first approach computes the daily maximum and assigns it only where hour is 08:30. The GroupBy result aligns on the (Symbol, Date) index during assignment.

flag = frame['hour'].dt.strftime('%H:%M').eq('08:30')
frame.loc[flag, 'day_high'] = frame.groupby(['Symbol', 'Date'])['high'].max()
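As a quick check of the alignment, the following sketch applies the same pattern to a small made-up frame (`toy` and `daily_max` are illustrative names) where one day's maximum occurs after 08:30. The day_high column is created up front here only to keep the example explicit.

```python
import pandas as pd

# Toy data, made up for illustration: three (Symbol, Date) groups, two bars each.
toy = pd.DataFrame(
    {
        "Symbol": ["A", "A", "A", "A", "B", "B"],
        "Date": pd.to_datetime(
            ["2025-07-14", "2025-07-14", "2025-07-11",
             "2025-07-11", "2025-07-09", "2025-07-09"]
        ),
        "hour": pd.to_datetime(
            ["08:30:00", "08:31:00", "08:30:00",
             "08:31:00", "08:30:00", "08:31:00"],
            format="%H:%M:%S",
        ),
        "high": [2.87, 2.80, 7.30, 7.20, 0.05, 0.10],
    }
).set_index(["Symbol", "Date"])
toy["day_high"] = float("nan")

# One value per (Symbol, Date), indexed by that pair.
daily_max = toy.groupby(["Symbol", "Date"])["high"].max()

# Only the 08:30 rows are targeted; pandas matches each one to its group's value.
flag = toy["hour"].dt.strftime("%H:%M").eq("08:30")
toy.loc[flag, "day_high"] = daily_max

print(toy[["hour", "high", "day_high"]])
```

A day without an 08:30 bar receives no value under this pattern, because no row in that group is selected by the mask.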

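The second approach uses transform, which returns the group maximum repeated on every row of the original index. Calling where(flag) then keeps that value on the 08:30 rows and sets all other rows to NaN. This variant is often the most straightforward to reason about because the result is already aligned row-for-row; note that it overwrites the whole day_high column rather than updating selected rows.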

flag = frame['hour'].dt.strftime('%H:%M').eq('08:30')
frame['day_high'] = frame.groupby(['Symbol', 'Date'])['high'].transform('max').where(flag)
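One way to confirm that each day received exactly one value is to count the non-null day_high entries per group. The sketch below is an assumption-laden illustration on made-up data (`toy`, `filled_per_day` and `missing` are hypothetical names); one group deliberately has no 08:30 bar to show what the check reports.

```python
import pandas as pd

# Toy data, made up for illustration. Group ("B", 2025-07-09) has no 08:30 bar.
toy = pd.DataFrame(
    {
        "Symbol": ["A", "A", "A", "A", "B", "B"],
        "Date": pd.to_datetime(
            ["2025-07-14", "2025-07-14", "2025-07-11",
             "2025-07-11", "2025-07-09", "2025-07-09"]
        ),
        "hour": pd.to_datetime(
            ["08:30:00", "08:31:00", "08:30:00",
             "08:31:00", "08:31:00", "08:32:00"],
            format="%H:%M:%S",
        ),
        "high": [2.87, 2.80, 7.30, 7.20, 0.05, 0.10],
    }
).set_index(["Symbol", "Date"])

# Broadcast the per-group maximum to every row, then keep it only at 08:30.
flag = toy["hour"].dt.strftime("%H:%M").eq("08:30")
toy["day_high"] = (
    toy.groupby(["Symbol", "Date"])["high"].transform("max").where(flag)
)

# count() ignores NaN, so this is the number of filled rows per (Symbol, Date).
filled_per_day = toy.groupby(["Symbol", "Date"])["day_high"].count()
missing = filled_per_day[filled_per_day == 0]

print(filled_per_day)
print(missing)
```

A count of 0 flags a day with no 08:30 bar, and a count above 1 would indicate duplicate 08:30 rows.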

Why this matters

On large intraday datasets, explicit boolean masks and vectorized groupby operations keep the code predictable and efficient. They also make the intent obvious: compute a daily aggregate, then write it exactly once per day into the canonical “opening” row at 08:30.

Takeaways

When assigning into a MultiIndex slice, remember that a two-level key selects the entire group. Add a row-level mask to pinpoint the single record you care about. For bulk updates across all symbols and days, prefer GroupBy.max with index alignment or GroupBy.transform combined with where to avoid manual loops. And when comparing the index to a tuple, give the date element the same type as the Date level (pd.Timestamp for a datetime level) so the equality check can match.

The article is based on a question from StackOverflow by Dan and an answer by jezrael.

dataframe pandas python