2025, Sep 25 05:00

Find names with a unique class and their counts in pandas using groupby, agg, and nunique

Learn a concise pandas pattern to return names with exactly one class and the count per name-class pair using groupby, agg, nunique, and count. Fast and clear.

When you need to find all names that map to exactly one value in a categorical column and also show how many times each name–class pair occurs, the most direct approach in pandas is to aggregate and filter in one pass. Below is a compact pattern that avoids manual post-processing and delivers both the representative value and its frequency.

Problem setup

We have a dataset with repeated names and their classes. The task is to return only those names for which the class is unique across all their rows, and to include a column with the count of occurrences for that name–class. A single appearance of a name also qualifies as unique.

Reproducible example

The following code builds the DataFrame and shows a first attempt that enumerates unique classes per name:

import pandas as pd
records = {
    'name':  ['Nick', 'Jane', 'Jacon', 'Jack', 'Cooze', 'Nick', 'Jane', 'Jacon', 'Jack', 'Cooze', 'John'],
    'class': ['a',    'b',     'a',    'b',    'a',     'b',    'b',    'c',     'a',    'a',     'a']
}
frame = pd.DataFrame(records)
by_name_unique = frame.groupby('name')['class'].unique()
print(by_name_unique)

This yields the unique class values for each name, but it still leaves you to compute lengths and filter manually, which is error-prone and unnecessary.

What’s really going on

The call to .unique() returns arrays of distinct values per group. To decide whether a name has exactly one class, the intent is to check how many distinct classes each name has. That’s precisely what .nunique() does. Additionally, you want the frequency for that name–class pair, which is simply .count() within the same group. Combining these aggregations avoids multiple passes and awkward array-length checks.

Solution

Aggregate, filter on the distinct-class count, and keep the relevant columns. The snippet below returns only names with a single class along with that class value and its frequency. Names that appear once are included because their distinct-class count is 1.

result = (
    frame.groupby('name')['class']
         .agg([('class', 'first'), 'nunique', 'count'])
         .query('nunique == 1')
         .drop(columns='nunique')
)
print(result)

Expected output with the data above shows entries where each name has exactly one class, plus how many times it appears:

       class  count
name                
Cooze      a      2
Jane       b      2
John       a      1

Why this matters

Relying on .nunique() and .agg() is both clearer and faster than assembling arrays and then computing their lengths. It keeps the logic declarative: group once, compute exactly what you need, filter, and present the result. This pattern scales better to larger datasets because it avoids Python-level loops or list handling and lets pandas perform vectorized operations under the hood.

Takeaway

Whenever you need “unique within group” checks alongside counts in pandas, skip intermediate .unique() arrays. Go straight to .groupby(...).agg(...) with nunique for distinctness and count for frequency, filter on nunique == 1, and keep the columns you care about. It is concise, explicit, and robust for datasets where names may repeat with different class values.

The article is based on a question from StackOverflow by user6781 and an answer by wjandrea.