https://pytroubles.com/en/posts/id2280-pandas-guide-extract-days-online-before-outages-using-1-to-0-transitions-and-handle-trailing-runs

Pandas guide: extract days online before outages using 1-to-0 transitions and handle trailing runs

Extract the final online day before outages in Pandas (1-to-0 edge) and capture trailing runs with a simple, robust pattern

Pandas guide: extract days online before outages using 1-to-0 transitions and handle trailing runs

Learn to extract the last online day before outages in Pandas by detecting 1-to-0 transitions and keeping the trailing run, with code for uptime counters.

2025-12-05T05:00:10+03:00

Counting how long equipment stays online between outages is straightforward once you have a binary status stream where 1 means running and 0 means offline. The tricky part is extracting the exact moment right before each outage together with the cumulative number of days online at that point. Below is a compact approach that does exactly that and handles the trailing run at the end of the dataset.Problem setupAssume you have a DataFrame with a binary status column. A common way to build a cumulative counter of days online between outages is to reuse groups broken by zeros and count positions within each group.import pandas as pd # Example of building a cumulative counter between outages (binary status) # 1 = online, 0 = outage run_log = pd.DataFrame({"status_bit": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1]}) # Cumulative counter of online stretches grouped by zeros run_log["uptime_ctr"] = ( run_log["status_bit"] .groupby(run_log["status_bit"].eq(0).cumsum()) .cumcount() ) For the extraction task, we will use a small dataset that mirrors the scenario and already contains the precomputed cumulative totals aligned with the dates. Values of 0 in the counter correspond to outage days.import pandas as pd sample = { "date": [ "5/29/2025", "5/31/2025", "6/1/2025", "6/2/2025", "6/3/2025", "6/4/2025", "6/5/2025", "6/6/2025", "6/7/2025", "6/8/2025", "6/9/2025", "6/10/2025" ], "status_bit": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1], "uptime_ctr": [1, 2, 3, 4, 0, 0, 0, 1, 2, 3, 4, 5] } events_df = pd.DataFrame(sample) events_df["date"] = pd.to_datetime(events_df["date"]) What exactly needs to be extracted and why this is non-trivialThe goal is to capture the record immediately before every outage, meaning the final day in each run segment of ones, and take the associated cumulative total from the counter column. In other words, we want the row where a 1 is followed by a 0. This is an edge-based selection problem: we are not interested in the outage rows themselves, only the preceding online rows that close each run. There is one more edge case to account for. If the dataset ends with a run of ones and never drops to zero, that trailing run should also be included because it represents the latest completed stretch of online days in the data snapshot.SolutionIdentify indices where the current row is 1 and the next row is 0, then select those rows and keep the date plus the cumulative counter. Finally, if the last row is still 1, append it to include the trailing run.# Find indices where a run of 1s transitions to 0 (the day before an outage) cut_idx = events_df[ (events_df["status_bit"] == 1) & (events_df["status_bit"].shift(-1) == 0) ].index # Select the date and cumulative counter at these boundaries summary_df = ( events_df.loc[cut_idx, ["date", "uptime_ctr"]] .rename(columns={"uptime_ctr": "DaysOnline"}) ) # Include the last row if the series ends with 1 (a trailing online run) if events_df["status_bit"].iloc[-1] == 1: tail = ( events_df.iloc[[-1]][["date", "uptime_ctr"]] .rename(columns={"uptime_ctr": "DaysOnline"}) ) summary_df = pd.concat([summary_df, tail]) print(summary_df) Why this worksThe extraction hinges on detecting the exact transition from 1 to 0. By filtering rows where the current status is 1 and the next status is 0, you capture the boundary that closes a continuous online segment. Selecting the cumulative counter at those rows gives the length of each just-finished online run. Checking the last row ensures a run at the end of the dataset is not silently dropped when there is no following outage to mark its boundary.Why it mattersEdge-based selections like this are a frequent source of off-by-one errors. Capturing the row before an outage instead of the outage itself preserves the intended definition of “days online between outages.” Including a trailing run matters when datasets are snapshots in time and may end while equipment is still running.TakeawaysUse a clean 1-to-0 transition filter to pinpoint the last online day before each outage and read the cumulative counter right there. Always handle the tail case explicitly so you do not lose the final continuous run when the series ends with 1. Keeping these two details in mind helps avoid boundary mistakes and yields a precise summary of online stretches.

pandas, python, uptime between outages, cumulative counter, binary status, 1-to-0 transition, trailing run, edge-based selection, DataFrame, shift(-1), groupby, cumsum, cumcount, online days

2025

2025, Dec 05 05:00

Extract the final online day before outages in Pandas (1-to-0 edge) and capture trailing runs with a simple, robust pattern

Learn to extract the last online day before outages in Pandas by detecting 1-to-0 transitions and keeping the trailing run, with code for uptime counters.

Problem setup

Assume you have a DataFrame with a binary status column. A common way to build a cumulative counter of days online between outages is to reuse groups broken by zeros and count positions within each group.

import pandas as pd
# Example of building a cumulative counter between outages (binary status)
# 1 = online, 0 = outage
run_log = pd.DataFrame({"status_bit": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1]})
# Cumulative counter of online stretches grouped by zeros
run_log["uptime_ctr"] = (
    run_log["status_bit"]
        .groupby(run_log["status_bit"].eq(0).cumsum())
        .cumcount()
)

For the extraction task, we will use a small dataset that mirrors the scenario and already contains the precomputed cumulative totals aligned with the dates. Values of 0 in the counter correspond to outage days.

import pandas as pd
sample = {
    "date": [
        "5/29/2025", "5/31/2025", "6/1/2025", "6/2/2025",
        "6/3/2025", "6/4/2025", "6/5/2025", "6/6/2025",
        "6/7/2025", "6/8/2025", "6/9/2025", "6/10/2025"
    ],
    "status_bit": [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
    "uptime_ctr": [1, 2, 3, 4, 0, 0, 0, 1, 2, 3, 4, 5]
}
events_df = pd.DataFrame(sample)
events_df["date"] = pd.to_datetime(events_df["date"])

What exactly needs to be extracted and why this is non-trivial

The goal is to capture the record immediately before every outage, meaning the final day in each run segment of ones, and take the associated cumulative total from the counter column. In other words, we want the row where a 1 is followed by a 0. This is an edge-based selection problem: we are not interested in the outage rows themselves, only the preceding online rows that close each run. There is one more edge case to account for. If the dataset ends with a run of ones and never drops to zero, that trailing run should also be included because it represents the latest completed stretch of online days in the data snapshot.

Solution

Identify indices where the current row is 1 and the next row is 0, then select those rows and keep the date plus the cumulative counter. Finally, if the last row is still 1, append it to include the trailing run.

# Find indices where a run of 1s transitions to 0 (the day before an outage)
cut_idx = events_df[
    (events_df["status_bit"] == 1) & (events_df["status_bit"].shift(-1) == 0)
].index
# Select the date and cumulative counter at these boundaries
summary_df = (
    events_df.loc[cut_idx, ["date", "uptime_ctr"]]
             .rename(columns={"uptime_ctr": "DaysOnline"})
)
# Include the last row if the series ends with 1 (a trailing online run)
if events_df["status_bit"].iloc[-1] == 1:
    tail = (
        events_df.iloc[[-1]][["date", "uptime_ctr"]]
                 .rename(columns={"uptime_ctr": "DaysOnline"})
    )
    summary_df = pd.concat([summary_df, tail])
print(summary_df)

Why this works

The extraction hinges on detecting the exact transition from 1 to 0. By filtering rows where the current status is 1 and the next status is 0, you capture the boundary that closes a continuous online segment. Selecting the cumulative counter at those rows gives the length of each just-finished online run. Checking the last row ensures a run at the end of the dataset is not silently dropped when there is no following outage to mark its boundary.

Why it matters

Edge-based selections like this are a frequent source of off-by-one errors. Capturing the row before an outage instead of the outage itself preserves the intended definition of “days online between outages.” Including a trailing run matters when datasets are snapshots in time and may end while equipment is still running.

Takeaways

Use a clean 1-to-0 transition filter to pinpoint the last online day before each outage and read the cumulative counter right there. Always handle the tail case explicitly so you do not lose the final continuous run when the series ends with 1. Keeping these two details in mind helps avoid boundary mistakes and yields a precise summary of online stretches.

pandas python