2025, Dec 08 03:00
Compute Time Differences Across Midnight in Polars: Parse 24:00 Safely and Normalize Overnight Durations
Learn a clean Polars approach to time arithmetic across midnight: parse 24:00 safely, compute time deltas, and normalize negative durations to overnight spans.
Calculating a time delta that crosses midnight in Polars looks simple until a value like “24:00” appears in the data. Polars does not parse “24:00” as a valid time, so a straightforward conversion to time breaks. If you replace “24:00” with “00:00” to make parsing work, subtracting start from end may produce a negative duration, which you then have to handle explicitly. Below is a practical walkthrough of the problem and a cleaner, more general solution.
Reproducing the issue
The dataset contains start and end times as strings. The first approach converts them to time, special-cases “24:00”, then computes an hour delta by manually working with integer representations.
import polars as pl
data_frame = pl.DataFrame(
    {
        "begin": ["23:00", "00:00"],
        "finish": ["24:00", "01:00"],
    }
)
(
    data_frame
    .with_columns(
        begin=pl.col("begin").str.to_time("%H:%M"),
        # "24:00" cannot be parsed, so map it to "00:00" first
        finish=pl.col("finish").replace("24:00", "00:00").str.to_time("%H:%M"),
    )
    .with_columns(
        span=(
            # Treat a finish of 00:00 as a full day: 86_400_000_000_000 ns = 24 h
            pl.when(pl.col("finish") == pl.time(0, 0, 0))
            .then(86_400_000_000_000)
            .otherwise(pl.col("finish").cast(pl.Int64))
            - pl.col("begin").cast(pl.Int64)
        ) / 3_600_000_000_000  # nanoseconds per hour
    )
)
What’s really going on
There are two separate hurdles. The first is parsing: Polars can’t parse “24:00” with the “%H:%M” format, so it must be replaced by “00:00” before conversion. The second is arithmetic: once everything is parsed, subtracting a late-night start time from an end time that wrapped to “00:00” yields a negative duration. That negative number does represent crossing midnight, but it needs to be normalized to a positive span within a 24-hour day. The initial approach handles both, but it does so by converting time to integers and juggling manual constants, which makes the intent harder to read and reason about.
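Both hurdles can be reproduced outside Polars. The following is a minimal standard-library sketch, not Polars code; the helper names parse_hhmm and overnight_delta are hypothetical and exist only for illustration. It maps “24:00” to “00:00” before parsing, then adds 24 hours when the raw difference is negative.

```python
from datetime import datetime, timedelta


def parse_hhmm(text: str) -> datetime:
    # Hurdle 1: strptime rejects "24:00" because %H only accepts 00-23,
    # so map it to "00:00" before parsing.
    return datetime.strptime("00:00" if text == "24:00" else text, "%H:%M")


def overnight_delta(begin: str, finish: str) -> timedelta:
    # Hurdle 2: the raw difference is negative when finish wrapped past midnight.
    raw = parse_hhmm(finish) - parse_hhmm(begin)
    # Roll a negative difference forward by one day.
    return raw + timedelta(hours=24) if raw < timedelta(0) else raw


print(overnight_delta("23:00", "24:00"))  # 1:00:00
print(overnight_delta("00:00", "01:00"))  # 1:00:00
print(overnight_delta("23:30", "00:35"))  # 1:05:00
```

The same two steps, a parsing fix followed by a conditional 24-hour shift, are what the Polars expressions need to express.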
A cleaner, more general approach with expressions
A more readable pattern is to compute the raw difference as an expression, fix the parsing for “24:00” by mapping it to “00:00”, and then add 24 hours when the result is negative. This keeps the logic close to how you’d describe the problem: compute the delta, and if it’s negative, roll it forward by a day.
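import polars as pl
tbl = pl.DataFrame(
    {
        "begin": ["23:00", "00:00", "23:30"],
        "finish": ["24:00", "01:00", "00:35"],
    }
)
# Raw difference; evaluated only after both columns have been parsed to Time
gap = pl.col("finish") - pl.col("begin")
result = (
    tbl.with_columns(
        begin=pl.col("begin").str.to_time("%H:%M"),
        # "24:00" cannot be parsed, so map it to "00:00" first
        finish=pl.col("finish").replace("24:00", "00:00").str.to_time("%H:%M"),
    )
    .with_columns(
        # A negative gap means the span crossed midnight: roll it forward one day
        span=pl.when(gap < 0)
        .then(pl.duration(hours=24) + gap)
        .otherwise(gap)
    )
)
print(result)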
The output shows that spans crossing midnight are properly normalized, including the 23:30 to 00:35 row, which trips up the more manual approach because its finish time is not exactly 00:00.
shape: (3, 3)
┌──────────┬──────────┬──────────────┐
│ begin    ┆ finish   ┆ span         │
│ ---      ┆ ---      ┆ ---          │
│ time     ┆ time     ┆ duration[μs] │
╞══════════╪══════════╪══════════════╡
│ 23:00:00 ┆ 00:00:00 ┆ 1h           │
│ 00:00:00 ┆ 01:00:00 ┆ 1h           │
│ 23:30:00 ┆ 00:35:00 ┆ 1h 5m        │
└──────────┴──────────┴──────────────┘
Why this matters
Time arithmetic across midnight is a common pitfall. Building the transformation as a clear Polars expression makes the intent explicit, keeps the query declarative, and avoids coupling your logic to low-level integer casts or hardcoded constants. The replacement of “24:00” with “00:00” stays as a simple parsing step, while the normalization rule for negative deltas captures every overnight span shorter than 24 hours.
Takeaways
When dealing with durations over midnight in Polars, parse times consistently, compute the raw difference as an expression, and normalize negative spans by adding 24 hours. This approach reads naturally, scales to more rows without additional branching, and keeps the transformation focused on the actual business logic rather than manual unit conversions.