2025, Dec 07 17:00

Per-group min and max from numbered start/end columns in pandas using wide_to_long or row-wise aggregation

Learn how to get per-name min start and max end from numbered columns in pandas. Compare wide_to_long + groupby with a row-wise min/max approach, with code.

When the same logical field is split across numbered columns, per-group aggregation gets tricky. A common case is two ranges per row, for example start_1/end_1 and start_2/end_2, where the goal is to compute, per name, the overall minimum start and maximum end across all of those columns.

Problem setup

Suppose you have a table where each name has multiple ranges in a wide format. You need to extract the overall minimum start and maximum end per name:

start_1  end_1  start_2  end_2  name
100      200    300      400    ABC
100      200    300      400    ABC
150      250    300      400    ABC
300      200    300      900    DEF
50       200    300      1000   DEF

The expected result keeps one row per name with its lowest start and highest end:

start  end   name
100    400   ABC
50     1000  DEF

Code example that mirrors the input

The following snippet builds the same structure so you can reproduce the task end-to-end:

import pandas as pd

# Rebuild the wide table shown above: two numbered ranges per row.
data_map = {
    'start_1': [100, 100, 150, 300, 50],
    'end_1':   [200, 200, 250, 200, 200],
    'start_2': [300, 300, 300, 300, 300],
    'end_2':   [400, 400, 400, 900, 1000],
    'name':    ['ABC', 'ABC', 'ABC', 'DEF', 'DEF']
}
frame = pd.DataFrame(data_map)

What’s really going on

The data is in a wide layout: two pairs of columns represent two ranges per row. To aggregate consistently per name, you either reshape to long form so that all starts and ends align in the same columns, or compute the row-wise min for all start_* columns and the row-wise max for all end_* columns, then aggregate by name.
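
One quick way to see which columns fall into each group is filter(like=...), which selects columns by substring match; this is the same selection Solution 2 relies on:

print(frame.filter(like='start').columns.tolist())  # ['start_1', 'start_2']
print(frame.filter(like='end').columns.tolist())    # ['end_1', 'end_2']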

Solution 1: reshape with wide_to_long, then groupby-agg

Turning the numbered columns into a long layout makes the aggregation straightforward. Once the data is long, grouping by name and taking the min of start and max of end yields the target result.

result_set = (
    pd.wide_to_long(
        frame.reset_index(),         # the i keys must uniquely identify rows
        stubnames=['start', 'end'],  # column prefixes to unpivot
        i=['index', 'name'],         # identifier columns, kept in the index
        j=' ',                       # name for the suffix level (1, 2, ...)
        sep='_'                      # separator between stub and suffix
    )
    .groupby('name')                 # 'name' is grouped as an index level
    .agg(start=('start', 'min'), end=('end', 'max'))
    .reset_index()
)
print(result_set)

This prints:

  name  start   end
0  ABC    100   400
1  DEF     50  1000

Why this works

After reshaping, each original row contributes two records, one for the “_1” columns and one for the “_2” columns, leaving a tidy table of starts and ends aligned by name. Note that name lands in the row MultiIndex rather than in a regular column; groupby('name') still works because pandas can group by an index level:

print(
    pd.wide_to_long(
        frame.reset_index(),
        stubnames=['start', 'end'],
        i=['index', 'name'],
        j=' ',
        sep='_'
    )
)
              start   end
index name               
0     ABC  1    100   200
           2    300   400
1     ABC  1    100   200
           2    300   400
2     ABC  1    150   250
           2    300   400
3     DEF  1    300   200
           2    300   900
4     DEF  1     50   200
           2    300  1000
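
One detail worth flagging: wide_to_long requires the i identifiers to uniquely identify each row, which is why the snippet calls reset_index() and includes the index in i. name alone repeats, so a call like the following raises a ValueError (the exact wording may vary across pandas versions):

try:
    # 'name' repeats, so it cannot serve as the sole identifier.
    pd.wide_to_long(frame, stubnames=['start', 'end'], i='name', j=' ', sep='_')
except ValueError as exc:
    print(exc)  # e.g. "the id variables need to uniquely identify each row"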

Solution 2: compute row-wise min/max, then aggregate

Another approach is to derive two helper columns per row: the smallest among all start_* columns and the largest among all end_* columns. Once these are present, aggregating by name with min for start and max for end leads to the same result.

final_view = (
    frame.assign(
        # axis=1 computes the min/max across each row, not down each column
        start=frame.filter(like='start').min(axis=1),
        end=frame.filter(like='end').max(axis=1)
    )
    .groupby('name')
    .agg(start=('start', 'min'), end=('end', 'max'))
    .reset_index()
)
print(final_view)

Output matches the goal:

  name  start   end
0  ABC    100   400
1  DEF     50  1000

To see the per-row helper values before grouping:

print(
    frame.assign(
        start=frame.filter(like='start').min(axis=1),
        end=frame.filter(like='end').max(axis=1)
    )
)
   start_1  end_1  start_2  end_2 name  start   end
0      100    200      300    400  ABC    100   400
1      100    200      300    400  ABC    100   400
2      150    250      300    400  ABC    150   400
3      300    200      300    900  DEF    300   900
4       50    200      300   1000  DEF     50  1000
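
One caveat with this route: filter(like='start') matches any column whose label merely contains the substring, so an unrelated column named, say, restart_flag (hypothetical here) would be swept in by accident. When the column names are not fully under your control, an anchored regex is the safer selection:

# Match only columns that literally begin with 'start_' or 'end_'.
print(frame.filter(regex=r'^start_').min(axis=1))
print(frame.filter(regex=r'^end_').max(axis=1))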

Why this is worth knowing

Real datasets often arrive in a wide format with numbered fields. Being able to normalize such structures quickly unlocks straightforward groupby operations. In pandas, the choice between reshaping to long or computing row-wise statistics depends on whether you need to keep or reuse the long form for further analysis. Both patterns are concise and scale well to similar cases with multiple suffixes.
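
As a quick illustration of that scaling claim, here is the same data with a hypothetical third range bolted on (start_3/end_3 and their values are invented for this sketch). Solution 2 runs unchanged because it keys off the column prefixes rather than a fixed count:

# Hypothetical extra range; the names follow the same suffix convention.
wider = frame.assign(start_3=[10, 120, 160, 20, 500],
                     end_3=[110, 220, 260, 2000, 600])
print(
    wider.assign(start=wider.filter(like='start').min(axis=1),
                 end=wider.filter(like='end').max(axis=1))
         .groupby('name')
         .agg(start=('start', 'min'), end=('end', 'max'))
         .reset_index()
)

Solution 1 also works as-is: wide_to_long collects every numeric suffix that follows the stub names, so start_3 and end_3 are picked up automatically.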

Conclusion

If you need per-group min and max across numbered start/end columns, reshaping with wide_to_long followed by a groupby aggregation gives a clean, explicit pipeline. When you only need the final per-group result and the columns follow a consistent naming convention, computing row-wise min and max with filter(like=...) and then aggregating is just as effective. Pick the route that matches the rest of your workflow and keep the column suffixes consistent to make these transformations predictable.