https://pytroubles.com/en/posts/id43-pandas-groupby-collapse-sparse-weekly-hours-by-employee-using-name-and-emp-with-max-aggregation

pandas groupby: Collapse sparse weekly hours by employee using Name and Emp# with max aggregation

Collapse sparse timesheet rows in pandas: group by Name and Emp# and use max to combine weekly hours

pandas groupby: Collapse sparse weekly hours by employee using Name and Emp# with max aggregation

Learn how to use pandas groupby on Name and Emp# plus max to collapse sparse weekly hours into one row per employee. Ideal for timesheets and reporting.

2025-09-22T11:00:05+03:00

When time data is captured across multiple rows per employee, downstream reporting becomes awkward. A classic case is weekly hour totals where each row has a single non-zero week and the rest are zeros. The goal is to collapse those sparse rows into one record per employee, preserving the weekly values without writing custom pivot logic.Problem setupAssume you have a pandas DataFrame where each employee appears on several rows, and only one of Week1–Week4 is non-zero per row. The task is to merge all rows for the same person into a single row, keeping the non-zero values per week.import pandas as pd # 'records' is the incoming DataFrame with columns: # 'Emp#', 'Name', 'Week1', 'Week2', 'Week3', 'Week4' # Multiple rows per employee; weeks are mostly zeros except one non-zero per row. A common first attempt is to aggregate by name only, like this:# Incorrect grouping if employee identity relies on both Name and Emp# compact_wrong = records.groupby(['Name']).max() This does not reliably produce the expected shape when you need to retain both the textual name and the numeric identifier.What actually causes the issueThe aggregation key is incomplete. Grouping only by Name will combine all rows that share the same name, ignoring the employee number. If Name is not a guaranteed unique identifier, or if you simply want to keep the employee number in the result, grouping by Name alone is insufficient. You must include both Name and Emp# in the grouping key.There is a second nuance around the aggregator. Because zeros act as placeholders and each week has at most one non-zero entry per employee, using max across the grouped rows surfaces that non-zero value for each week without custom logic.SolutionGroup by both Name and Emp# and use max to collapse the sparse rows into a single record per employee-week.# Correct: group by both Name and Emp# and take max to surface non-zero weekly values collapsed = records.groupby(['Name', 'Emp#']).max() This produces a compact table with one row per employee. As an illustration, the aggregated result looks like this for the provided data:Name Emp# Week1 Week2 Week3 Week4 abc 6 0 45 0 45 anup 4 45 45 63 45 john 2 0 45 0 0 kumar 5 45 0 63 0 linda 3 45 0 63 0 mary 1 45 45 63 45 If your original “expected” table shows values that cannot be derived from the input rows, reconcile the source data first. Inconsistencies such as unexpected blanks or numbers not present in the input will not appear after a faithful groupby-based aggregation.Why this mattersRow-collapsing tasks like this show up in time tracking, billing, and operational analytics. Choosing the correct grouping keys is essential; missing one identifier silently merges distinct entities. Picking the right aggregator is just as important. In sparse-row patterns where zeros are placeholders and a single non-zero carries the signal, max is a concise and correct way to extract that signal for each column.TakeawaysUse a complete grouping key that reflects how entities are uniquely identified, in this case both Name and Emp#. Prefer simple, column-wise aggregations that match the data pattern; for sparse weekly data, max cleanly lifts the non-zero entries. If the aggregated output seems off, verify the source rows and expected values align, then re-run the groupby with the correct keys.

pandas groupby, collapse rows, sparse weekly hours, timesheet aggregation, Python DataFrame, group by Name and Emp#, max aggregation, merge rows per employee, weekly totals, reporting, billing

2025

2025, Sep 22 11:00

Collapse sparse timesheet rows in pandas: group by Name and Emp# and use max to combine weekly hours

Learn how to use pandas groupby on Name and Emp# plus max to collapse sparse weekly hours into one row per employee. Ideal for timesheets and reporting.

Problem setup

Assume you have a pandas DataFrame where each employee appears on several rows, and only one of Week1–Week4 is non-zero per row. The task is to merge all rows for the same person into a single row, keeping the non-zero values per week.

import pandas as pd
# 'records' is the incoming DataFrame with columns:
# 'Emp#', 'Name', 'Week1', 'Week2', 'Week3', 'Week4'
# Multiple rows per employee; weeks are mostly zeros except one non-zero per row.

A common first attempt is to aggregate by name only, like this:

# Incorrect grouping if employee identity relies on both Name and Emp#
compact_wrong = records.groupby(['Name']).max()

This does not reliably produce the expected shape when you need to retain both the textual name and the numeric identifier.

What actually causes the issue

The aggregation key is incomplete. Grouping only by Name will combine all rows that share the same name, ignoring the employee number. If Name is not a guaranteed unique identifier, or if you simply want to keep the employee number in the result, grouping by Name alone is insufficient. You must include both Name and Emp# in the grouping key.

There is a second nuance around the aggregator. Because zeros act as placeholders and each week has at most one non-zero entry per employee, using max across the grouped rows surfaces that non-zero value for each week without custom logic.

Solution

Group by both Name and Emp# and use max to collapse the sparse rows into a single record per employee-week.

# Correct: group by both Name and Emp# and take max to surface non-zero weekly values
collapsed = records.groupby(['Name', 'Emp#']).max()

This produces a compact table with one row per employee. As an illustration, the aggregated result looks like this for the provided data:

Name        Emp#    Week1   Week2   Week3   Week4
abc         6       0       45      0       45
anup        4       45      45      63      45
john        2       0       45      0       0
kumar       5       45      0       63      0
linda       3       45      0       63      0
mary        1       45      45      63      45

If your original “expected” table shows values that cannot be derived from the input rows, reconcile the source data first. Inconsistencies such as unexpected blanks or numbers not present in the input will not appear after a faithful groupby-based aggregation.

Why this matters

Row-collapsing tasks like this show up in time tracking, billing, and operational analytics. Choosing the correct grouping keys is essential; missing one identifier silently merges distinct entities. Picking the right aggregator is just as important. In sparse-row patterns where zeros are placeholders and a single non-zero carries the signal, max is a concise and correct way to extract that signal for each column.

Takeaways

Use a complete grouping key that reflects how entities are uniquely identified, in this case both Name and Emp#. Prefer simple, column-wise aggregations that match the data pattern; for sparse weekly data, max cleanly lifts the non-zero entries. If the aggregated output seems off, verify the source rows and expected values align, then re-run the groupby with the correct keys.

The article is based on a question from StackOverflow by Anupkumar Kasi and an answer by Bending Rodriguez.

pandas python