2025, Dec 07 17:00
Per-group min and max from numbered start/end columns in pandas using wide_to_long or row-wise aggregation
Learn how to get per-name min start and max end from numbered columns in pandas. Compare wide_to_long + groupby with a row-wise min/max approach, with code.
When the same logical field is split across numbered columns, aggregations by group get tricky. A common case looks like two ranges per row, for example start_1/end_1 and start_2/end_2, and the goal is to compute, per name, the overall minimum start and maximum end across all those columns.
Problem setup
Suppose you have a table where each name has multiple ranges in a wide format, and you need the overall minimum start and maximum end per name:
start_1  end_1  start_2  end_2  name
    100    200      300    400   ABC
    100    200      300    400   ABC
    150    250      300    400   ABC
    300    200      300    900   DEF
     50    200      300   1000   DEF
The expected result keeps one row per name with its lowest start and highest end:
start   end  name
  100   400   ABC
   50  1000   DEF
Code example that mirrors the input
The following snippet builds the same structure so you can reproduce the task end-to-end:
import pandas as pd

# Wide layout: two start/end pairs per row, with a repeated name column
data_map = {
    'start_1': [100, 100, 150, 300, 50],
    'end_1': [200, 200, 250, 200, 200],
    'start_2': [300, 300, 300, 300, 300],
    'end_2': [400, 400, 400, 900, 1000],
    'name': ['ABC', 'ABC', 'ABC', 'DEF', 'DEF']
}
frame = pd.DataFrame(data_map)
What’s really going on
The data is in a wide layout: two pairs of columns represent two ranges per row. To aggregate consistently per name, you either reshape to long form so that all starts and ends align in the same columns, or compute the row-wise min for all start_* columns and the row-wise max for all end_* columns, then aggregate by name.
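To make that concrete, a quick sketch using the frame built above shows which column groups a substring filter selects for the row-wise route:
# Substring filtering picks up every numbered column in each group.
print(frame.filter(like='start').columns.tolist())  # ['start_1', 'start_2']
print(frame.filter(like='end').columns.tolist())    # ['end_1', 'end_2']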
Solution 1: reshape with wide_to_long, then groupby-agg
Turning the numbered columns into a long layout makes the aggregation straightforward. Once the data is long, grouping by name and taking the min of start and max of end yields the target result. Note that groupby('name') works even though name ends up in the row index after reshaping: pandas resolves the string against index level names as well as columns.
result_set = (
    pd.wide_to_long(
        frame.reset_index(),         # the index makes i unique per row
        stubnames=['start', 'end'],  # column prefixes to gather
        i=['index', 'name'],         # identifier columns
        j='num',                     # name for the extracted suffix level (1, 2)
        sep='_'
    )
    .groupby('name')
    .agg(start=('start', 'min'), end=('end', 'max'))
    .reset_index()
)
print(result_set)
This prints:
  name  start   end
0  ABC    100   400
1  DEF     50  1000
Why this works
After reshaping, each original row contributes two records, one for the “_1” pair and one for the “_2” pair; the num index level records which pair each record came from. That leaves a tidy table of starts and ends, aligned by name:
print(
    pd.wide_to_long(
        frame.reset_index(),
        stubnames=['start', 'end'],
        i=['index', 'name'],
        j='num',
        sep='_'
    )
)
                start   end
index name num
0     ABC  1      100   200
           2      300   400
1     ABC  1      100   200
           2      300   400
2     ABC  1      150   250
           2      300   400
3     DEF  1      300   200
           2      300   900
4     DEF  1       50   200
           2      300  1000
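The same pipeline extends unchanged to more numbered pairs. As a sketch with a hypothetical third range (the start_3/end_3 values below are invented for illustration), wide_to_long collects the extra columns automatically because its default suffix pattern matches any run of digits; for letter suffixes such as start_a you would pass suffix=r'\w+':
# Hypothetical third range per row; values are made up for the sketch.
frame3 = frame.assign(
    start_3=[120, 110, 160, 310, 60],
    end_3=[210, 220, 260, 950, 980]
)

result3 = (
    pd.wide_to_long(
        frame3.reset_index(),
        stubnames=['start', 'end'],
        i=['index', 'name'],
        j='num',
        sep='_'
    )
    .groupby('name')
    .agg(start=('start', 'min'), end=('end', 'max'))
    .reset_index()
)
print(result3)  # same two-row answer, now across three ranges per row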
Solution 2: compute row-wise min/max, then aggregate
Another approach is to derive two helper columns per row: the smallest among all start_* columns and the largest among all end_* columns. Once these are present, aggregating by name with min for start and max for end leads to the same result.
final_view = (
    frame.assign(
        # row-wise smallest start and largest end across the numbered columns
        start=frame.filter(like='start').min(axis=1),
        end=frame.filter(like='end').max(axis=1)
    )
    .groupby('name')
    .agg(start=('start', 'min'), end=('end', 'max'))
    .reset_index()
)
print(final_view)
Output matches the goal:
  name  start   end
0  ABC    100   400
1  DEF     50  1000
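One caveat: filter(like='start') keeps every column whose name merely contains the substring, so a hypothetical column named, say, restart_flag would silently leak into the minimum. If that is a risk, a regex pins the selection to the numbered pattern:
# Stricter selection: only columns named start_<digits> or end_<digits>.
row_start = frame.filter(regex=r'^start_\d+$').min(axis=1)
row_end = frame.filter(regex=r'^end_\d+$').max(axis=1)

strict_view = (
    frame.assign(start=row_start, end=row_end)
    .groupby('name')
    .agg(start=('start', 'min'), end=('end', 'max'))
    .reset_index()
)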
To see the per-row helper values before grouping:
print(
    frame.assign(
        start=frame.filter(like='start').min(axis=1),
        end=frame.filter(like='end').max(axis=1)
    )
)
   start_1  end_1  start_2  end_2  name  start   end
0      100    200      300    400   ABC    100   400
1      100    200      300    400   ABC    100   400
2      150    250      300    400   ABC    150   400
3      300    200      300    900   DEF    300   900
4       50    200      300   1000   DEF     50  1000
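Missing ranges do not break this route either: min and max skip NaN by default (skipna=True), so a row with only its first pair filled still contributes usable values. A small sketch with an invented gap:
import numpy as np

# Invented gap: cast to float so the column can hold NaN, then blank one value.
frame_nan = frame.astype({'start_2': 'float64'})
frame_nan.loc[0, 'start_2'] = np.nan

print(frame_nan.filter(like='start').min(axis=1))  # row 0 falls back to start_1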
Why this is worth knowing
Real datasets often arrive in a wide format with numbered fields. Being able to normalize such structures quickly unlocks straightforward groupby operations. In pandas, the choice between reshaping to long or computing row-wise statistics depends on whether you need to keep or reuse the long form for further analysis. Both patterns are concise and scale well to similar cases with multiple suffixes.
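As a quick sanity check, assuming both snippets above ran in the same session, the two routes should agree exactly on this data:
# Both pipelines sort by the grouping key and share column order and dtypes.
assert result_set.equals(final_view)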
Conclusion
If you need per-group min and max across numbered start/end columns, reshaping with wide_to_long followed by a groupby aggregation gives a clean, explicit pipeline. When you only need the final per-group result and the columns follow a consistent naming convention, computing row-wise min and max with filter(like=...) and then aggregating is just as effective. Pick the route that matches the rest of your workflow and keep the column suffixes consistent to make these transformations predictable.