2025, Nov 28 11:00

Normalize a mixed Delivery Period column in pandas: turn 'start to end' ranges into one row per month (Month Year)

Pandas how-to: normalize a mixed Delivery Period column. Detect date ranges, generate monthly periods, explode to rows, and format as Month Year. Fast.

Normalize a mixed “Delivery Period” column: expand a single string range like “2025-01-01 to 2025-12-31” into one row per month and keep existing month-year values intact. Below is a compact approach in pandas that detects a range, generates monthly periods, explodes them into rows, and formats them to the expected “Month Year”.

Problem

Some rows store the delivery window as a date range string instead of the required “Month Year”. The goal is to find those rows and turn each into multiple rows, one for every month in the range, while preserving the other columns as-is.

Reproducible example

The input data contains one yearly range and several already normalized month-year values.

import pandas as pd
seed_cols = {
    'Price Curve': [
        'TEST',
        'some curve',
        'some curve',
        'some curve',
        'some curve',
        'some curve',
        'some curve',
        'some curve'
    ],
    'Text': [
        'TEST',
        'some text',
        'some text',
        'some text',
        'some text',
        'some text',
        'some text',
        'some text'
    ],
    'Delivery Period': [
        '2025-01-01 to 2025-12-31',
        'June 2025',
        'July 2025',
        'August 2025',
        'September 2025',
        'October 2025',
        'November 2025',
        'December 2025'
    ]
}
frame = pd.DataFrame(seed_cols)

What’s going on and why

The column mixes two representations of time: a single “start to end” string spanning multiple months, and individual “Month Year” strings. To align the data model, the range needs to be turned into a monthly sequence and expanded into separate rows. The simplest reliable signal is the substring separating the range boundaries.

Solution

The strategy is to detect rows that contain a range, split the boundaries, build a monthly sequence, explode the sequence to get one row per month, format the result to “Month Year”, and then combine it back into the original column.

# Identify rows that contain a date span
has_span = frame['Delivery Period'].str.contains(' to ')
# Split the textual span into start/end and generate monthly periods
frame.loc[has_span, 'span_seq'] = frame.loc[has_span, 'Delivery Period'].str.split(' to ')
frame.loc[has_span, 'span_seq'] = frame.loc[has_span, 'span_seq'].transform(
    lambda pair: pd.period_range(pair[0], pair[1], freq='M')
)
# Expand one row per period and format to "Month Year"
frame = frame.explode('span_seq')
frame['span_seq'] = frame['span_seq'].dt.strftime('%B %Y')
# Prefer the expanded value where present, otherwise keep the original
frame['Delivery Period'] = frame['span_seq'].combine_first(frame['Delivery Period'])
# Clean up the temporary helper column
frame = frame.drop(columns='span_seq')

Why this matters

Consistent time semantics in a “Delivery Period” field make downstream work straightforward. When each month has its own row, further processing aligns with the expected format and mirrors the sample output, avoiding ambiguity from a single wide range string.

Takeaways

Detect the range with a lightweight check, convert it to a monthly sequence, and expand rows. The combination of period generation, explode, and final formatting yields a clean, uniform “Month Year” column without disturbing rows that were already correctly formatted.