2025, Dec 22 17:00
How to deduplicate pandas DataFrame rows when a column contains dicts (fix 'unhashable type: dict')
Learn how to use pandas drop_duplicates when a DataFrame column holds dicts. Fix 'unhashable type: dict' by stringifying, deduping rows, then restoring dicts
When a pandas DataFrame column contains dictionaries, built-in deduplication routines can stumble. A common symptom is the unhashable type: 'dict' error when calling drop_duplicates(), because dictionaries are mutable and therefore unhashable. If your goal is to keep only unique rows based on a column of dicts, you need a small detour that makes those values comparable.
Repro: a compact example that shows the issue
The data below yields a DataFrame where scalar columns repeat while the list of dicts becomes multiple rows. Notice how the dictionaries in the last column recur and need to be deduplicated.
import pandas as pd

payload = {
    "some_id": "xxx",
    "some_email": "abc.xyz@somedomain.com",
    "This is Sample": [
        {"a": "22", "b": "Y", "c": "33", "d": "x"},
        {"a": "44", "b": "N", "c": "55", "d": "Y"},
        {"a": "22", "b": "Y", "c": "33", "d": "x"},
        {"a": "44", "b": "N", "c": "55", "d": "Y"},
        {"a": "22", "b": "Y", "c": "33", "d": "x"},
        {"a": "44", "b": "N", "c": "55", "d": "Y"}
    ]
}

# The scalar fields are broadcast to match the list: one row per dict
table = pd.DataFrame(payload)
print(table)
This produces repeated rows for the dict-valued column. Calling drop_duplicates() directly on such a DataFrame will fail because pandas cannot hash dictionaries to compare them.
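You can confirm the failure directly. Continuing from the repro above, the snippet below catches the TypeError that pandas raises (the surrounding traceback text varies by version, but the message is consistent):
try:
    table.drop_duplicates()
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'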
What actually goes wrong and why
Deduplication relies on hashing to compare row values efficiently. Dictionaries in Python are mutable and therefore unhashable, so they cannot be used as-is in drop_duplicates(). That’s why the error unhashable type: 'dict' appears when a column contains dict objects.
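A quick interpreter check makes the distinction concrete: a dict refuses to hash, while its string form hashes fine.
d = {"a": "22", "b": "Y", "c": "33", "d": "x"}
try:
    hash(d)        # dicts are mutable, so they define no hash
except TypeError as exc:
    print(exc)     # unhashable type: 'dict'
print(hash(str(d)))  # an int: strings are immutable and hashable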
The fix: make dicts comparable, dedupe, and restore
The straightforward approach is to transform the dictionaries into a hashable representation, perform the deduplication, and then, if you need dictionaries downstream, convert them back. One simple technique is to stringify the dicts, call drop_duplicates(), and parse the strings back with ast.literal_eval. One caveat: str() reflects insertion order, so two dicts with the same keys in a different order stringify differently and will not be treated as duplicates.
import pandas as pd
import ast

payload = {
    "some_id": "xxx",
    "some_email": "abc.xyz@somedomain.com",
    "This is Sample": [
        {"a": "22", "b": "Y", "c": "33", "d": "x"},
        {"a": "44", "b": "N", "c": "55", "d": "Y"},
        {"a": "22", "b": "Y", "c": "33", "d": "x"},
        {"a": "44", "b": "N", "c": "55", "d": "Y"},
        {"a": "22", "b": "Y", "c": "33", "d": "x"},
        {"a": "44", "b": "N", "c": "55", "d": "Y"}
    ]
}

table = pd.DataFrame(payload)

work = table.copy()
# 1. Make the dicts hashable by converting them to their string repr
work["This is Sample"] = work["This is Sample"].astype(str)
# 2. Every column is now hashable, so drop_duplicates() works;
#    .copy() avoids SettingWithCopyWarning on the next assignment
deduped = work.drop_duplicates().copy()
# 3. Parse the string reprs back into real dicts
deduped["This is Sample"] = deduped["This is Sample"].apply(ast.literal_eval)
print(deduped)
The resulting DataFrame contains the two unique rows you expect, with the dicts restored in their original form.
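If key order can vary between otherwise equal dicts, str() is not a stable key. One way around that, sketched below under the assumption that the values are JSON-serializable, is to serialize with json.dumps(sort_keys=True) so the string form is canonical:
import json

work = table.copy()
# sort_keys=True yields a canonical string, so key order no longer matters
work["This is Sample"] = work["This is Sample"].apply(
    lambda d: json.dumps(d, sort_keys=True)
)
deduped = work.drop_duplicates().copy()
deduped["This is Sample"] = deduped["This is Sample"].apply(json.loads)
print(deduped)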
Why this matters for day-to-day data work
Nested or semi-structured data is common in analytics pipelines, logs, and API payloads. Knowing that pandas cannot hash mutable objects turns a confusing failure into a predictable one with a short, mechanical fix. The pattern also scales to larger datasets: the heavy lifting is still done by drop_duplicates(); the only extra cost is converting values to a hashable representation first.
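If the string round-trip feels heavy, any hashable stand-in works. Here is a minimal sketch under the assumption that the dicts are flat with hashable values; the helper column name _key is made up for the example. The dicts themselves are never converted, so no restore step is needed:
work = table.copy()
# A tuple of sorted items is hashable, so it can act as a dedup key
work["_key"] = work["This is Sample"].apply(lambda d: tuple(sorted(d.items())))
# Dedupe on every column except the dict column, then drop the helper
cols = [c for c in work.columns if c != "This is Sample"]
deduped = work.drop_duplicates(subset=cols).drop(columns="_key")
print(deduped)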
Takeaways
If a DataFrame column holds dictionaries, don’t attempt to deduplicate directly. Convert the dicts into a stable, comparable form such as strings, run drop_duplicates(), and convert back only if you need dictionary objects downstream. This keeps the logic simple, avoids unhashable errors, and yields clean, unique rows ready for further processing.