2025, Dec 07 17:00

Per-group min and max from numbered start/end columns in pandas using wide_to_long or row-wise aggregation

Learn how to get per-name min start and max end from numbered columns in pandas. Compare wide_to_long + groupby with a row-wise min/max approach, with code.

When the same logical field is split across numbered columns, per-group aggregation gets tricky. A common case is two ranges per row, for example start_1/end_1 and start_2/end_2, where the goal is to compute, per name, the overall minimum start and maximum end across all of those columns.

Problem setup

Suppose you have a table where each name has multiple ranges in a wide format. You need to extract the overall minimum start and maximum end per name:

start_1  end_1  start_2  end_2  name
100      200    300      400    ABC
100      200    300      400    ABC
150      250    300      400    ABC
300      200    300      900    DEF
50       200    300      1000   DEF

The expected result keeps one row per name with its lowest start and highest end:

start  end   name
100    400   ABC
50     1000  DEF

Code example that mirrors the input

The following snippet builds the same structure so you can reproduce the task end-to-end:

import pandas as pd

# Rebuild the wide table shown above: two numbered ranges per row.
data_map = {
    'start_1': [100, 100, 150, 300, 50],
    'end_1':   [200, 200, 250, 200, 200],
    'start_2': [300, 300, 300, 300, 300],
    'end_2':   [400, 400, 400, 900, 1000],
    'name':    ['ABC', 'ABC', 'ABC', 'DEF', 'DEF']
}
frame = pd.DataFrame(data_map)

What’s really going on

The data is in a wide layout: two pairs of columns represent two ranges per row. To aggregate consistently per name, you either reshape to long form so that all starts and ends align in the same columns, or compute the row-wise min for all start_* columns and the row-wise max for all end_* columns, then aggregate by name.
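
One quick way to see which columns fall into each group is filter(like=...), which selects columns by substring match; this is the same selection Solution 2 relies on:

print(frame.filter(like='start').columns.tolist())  # ['start_1', 'start_2']
print(frame.filter(like='end').columns.tolist())    # ['end_1', 'end_2']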

Solution 1: reshape with wide_to_long, then groupby-agg

Turning the numbered columns into a long layout makes the aggregation straightforward. Once the data is long, grouping by name and taking the min of start and max of end yields the target result.

result_set = (
    pd.wide_to_long(
        frame.reset_index(),         # the i keys must uniquely identify rows
        stubnames=['start', 'end'],  # column prefixes to unpivot
        i=['index', 'name'],         # identifier columns, kept in the index
        j=' ',                       # name for the suffix level (1, 2, ...)
        sep='_'                      # separator between stub and suffix
    )
    .groupby('name')                 # 'name' is grouped as an index level
    .agg(start=('start', 'min'), end=('end', 'max'))
    .reset_index()
)
print(result_set)

This prints:

  name  start   end
0  ABC    100   400
1  DEF     50  1000

Why this works

After reshaping, each original row contributes two records, one for the “_1” columns and one for the “_2” columns, leaving a tidy table of starts and ends aligned by name. Note that name lands in the row MultiIndex rather than in a regular column; groupby('name') still works because pandas can group by an index level:

print(
    pd.wide_to_long(
        frame.reset_index(),
        stubnames=['start', 'end'],
        i=['index', 'name'],
        j=' ',
        sep='_'
    )
)
              start   end
index name               
0     ABC  1    100   200
           2    300   400
1     ABC  1    100   200
           2    300   400
2     ABC  1    150   250
           2    300   400
3     DEF  1    300   200
           2    300   900
4     DEF  1     50   200
           2    300  1000
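
One detail worth flagging: wide_to_long requires the i identifiers to uniquely identify each row, which is why the snippet calls reset_index() and includes the index in i. name alone repeats, so a call like the following raises a ValueError (the exact wording may vary across pandas versions):

try:
    # 'name' repeats, so it cannot serve as the sole identifier.
    pd.wide_to_long(frame, stubnames=['start', 'end'], i='name', j=' ', sep='_')
except ValueError as exc:
    print(exc)  # e.g. "the id variables need to uniquely identify each row"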

Solution 2: compute row-wise min/max, then aggregate

Another approach is to derive two helper columns per row: the smallest among all start_* columns and the largest among all end_* columns. Once these are present, aggregating by name with min for start and max for end leads to the same result.

final_view = (
    frame.assign(
        # axis=1 computes the min/max across each row, not down each column
        start=frame.filter(like='start').min(axis=1),
        end=frame.filter(like='end').max(axis=1)
    )
    .groupby('name')
    .agg(start=('start', 'min'), end=('end', 'max'))
    .reset_index()
)
print(final_view)

Output matches the goal:

  name  start   end
0  ABC    100   400
1  DEF     50  1000

To see the per-row helper values before grouping:

print(
    frame.assign(
        start=frame.filter(like='start').min(axis=1),
        end=frame.filter(like='end').max(axis=1)
    )
)
   start_1  end_1  start_2  end_2 name  start   end
0      100    200      300    400  ABC    100   400
1      100    200      300    400  ABC    100   400
2      150    250      300    400  ABC    150   400
3      300    200      300    900  DEF    300   900
4       50    200      300   1000  DEF     50  1000
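
One caveat with this route: filter(like='start') matches any column whose label merely contains the substring, so an unrelated column named, say, restart_flag (hypothetical here) would be swept in by accident. When the column names are not fully under your control, an anchored regex is the safer selection:

# Match only columns that literally begin with 'start_' or 'end_'.
print(frame.filter(regex=r'^start_').min(axis=1))
print(frame.filter(regex=r'^end_').max(axis=1))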

Why this is worth knowing

Real datasets often arrive in a wide format with numbered fields. Being able to normalize such structures quickly unlocks straightforward groupby operations. In pandas, the choice between reshaping to long or computing row-wise statistics depends on whether you need to keep or reuse the long form for further analysis. Both patterns are concise and scale well to similar cases with multiple suffixes.
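
As a quick illustration of that scaling claim, here is the same data with a hypothetical third range bolted on (start_3/end_3 and their values are invented for this sketch). Solution 2 runs unchanged because it keys off the column prefixes rather than a fixed count:

# Hypothetical extra range; the names follow the same suffix convention.
wider = frame.assign(start_3=[10, 120, 160, 20, 500],
                     end_3=[110, 220, 260, 2000, 600])
print(
    wider.assign(start=wider.filter(like='start').min(axis=1),
                 end=wider.filter(like='end').max(axis=1))
         .groupby('name')
         .agg(start=('start', 'min'), end=('end', 'max'))
         .reset_index()
)

Solution 1 also works as-is: wide_to_long collects every numeric suffix that follows the stub names, so start_3 and end_3 are picked up automatically.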

Conclusion

If you need per-group min and max across numbered start/end columns, reshaping with wide_to_long followed by a groupby aggregation gives a clean, explicit pipeline. When you only need the final per-group result and the columns follow a consistent naming convention, computing row-wise min and max with filter(like=...) and then aggregating is just as effective. Pick the route that matches the rest of your workflow and keep the column suffixes consistent to make these transformations predictable.