2025, Dec 28 17:00

Flatten nested JSON in a pandas DataFrame column without explode: keep one row per record

Flatten nested JSON in a pandas DataFrame column without duplicating rows. Use json_normalize, record_path and stack to keep one row per record safely.

Flattening nested JSON in a pandas column without duplicating rows is a common need when the payload is a list of dicts per row. The usual approach, explode followed by json_normalize, turns each element of the list into additional rows, which is not always desirable. Below is a compact way to keep one row per original record while expanding the JSON into columns.

Problem setup

Consider a list of measurements where each item carries a value, units, a parameter object wrapped in a single-element list, and a name. This list sits inside a DataFrame column. The goal is to expand each row’s JSON horizontally, not vertically.

import pandas as pd
readings_blob = [
    {
        "scalar": "43",
        "units": "m",
        "parameter": [{"no_1": "45", "no_2": "1038", "no_3": "356"}],
        "name": "Foo",
    },
    {
        "scalar": "54.1",
        "units": "s",
        "parameter": [{"no_1": "78", "no_2": "103", "no_3": "356"}],
        "name": "Yoo",
    },
    {
        "scalar": "1123.1",
        "units": "Hz",
        "parameter": [{"no_1": "21", "no_2": "43", "no_3": "3577"}],
        "name": "Baz",
    },
]
frame = pd.DataFrame(
    {
        "alpha": [11, 9, 23, 1],
        "beta": [7, 3, 1, 12],
        "data_json": [readings_blob, readings_blob, readings_blob, readings_blob],
    },
    index=[0, 1, 2, 3],
)

A straightforward attempt to normalize after explode duplicates rows because each row’s list has three elements. That produces three rows per original record, twelve in total for the four-row frame:

flat_fail = pd.json_normalize(frame["data_json"].explode())
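To see the multiplication concretely, here is a minimal, self-contained reproduction of the setup above (readings_blob shortened to blob) that counts the rows before and after:

```python
import pandas as pd

# Shorthand for the readings_blob / frame setup above.
blob = [
    {"scalar": "43", "units": "m",
     "parameter": [{"no_1": "45", "no_2": "1038", "no_3": "356"}], "name": "Foo"},
    {"scalar": "54.1", "units": "s",
     "parameter": [{"no_1": "78", "no_2": "103", "no_3": "356"}], "name": "Yoo"},
    {"scalar": "1123.1", "units": "Hz",
     "parameter": [{"no_1": "21", "no_2": "43", "no_3": "3577"}], "name": "Baz"},
]
frame = pd.DataFrame({"alpha": [11, 9, 23, 1], "beta": [7, 3, 1, 12],
                      "data_json": [blob] * 4})

# explode emits one row per list element, so 4 rows become 4 * 3 = 12.
flat_fail = pd.json_normalize(frame["data_json"].explode())
print(len(frame), len(flat_fail))
```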

Why it happens

json_normalize operates on sequences of records. After explode, each element of the list is treated as a separate record, so you get one row per element. In other words, it has no notion of the original row boundary unless you explicitly keep it. The inner parameter field is also a list holding a single dictionary, which requires explicit handling if you want to expose those nested keys.
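If you do take the explode route, the original row labels are the boundary marker you need to keep: explode repeats the source index once per list element, and you can carry it over onto the normalized frame. A sketch under the same setup as above (blob standing in for readings_blob):

```python
import pandas as pd

# Shorthand for the readings_blob / frame setup above.
blob = [
    {"scalar": "43", "units": "m",
     "parameter": [{"no_1": "45", "no_2": "1038", "no_3": "356"}], "name": "Foo"},
    {"scalar": "54.1", "units": "s",
     "parameter": [{"no_1": "78", "no_2": "103", "no_3": "356"}], "name": "Yoo"},
    {"scalar": "1123.1", "units": "Hz",
     "parameter": [{"no_1": "21", "no_2": "43", "no_3": "3577"}], "name": "Baz"},
]
frame = pd.DataFrame({"alpha": [11, 9, 23, 1], "beta": [7, 3, 1, 12],
                      "data_json": [blob] * 4})

# explode keeps the source index, repeated once per list element:
# 0, 0, 0, 1, 1, 1, ...
exploded = frame["data_json"].explode()

# Re-attach that index so each normalized row still names its source row.
flat = pd.json_normalize(exploded).set_index(exploded.index)
print(flat.index.tolist())
```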

A compact fix that keeps one row per record

Apply json_normalize to each cell of the JSON column and pivot the result into a Series with a two-level index: the first level is name, and the second level holds the other fields. This keeps the transformation per row and avoids row multiplication.

series_wide = frame["data_json"].apply(
    lambda items: pd.json_normalize(items).set_index("name").stack()
)

This produces, for each DataFrame row, a single wide Series with a MultiIndex like (name, field), e.g. ("Foo", "scalar"), ("Foo", "units"), ("Foo", "parameter"), and so on.
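Because apply gets one Series back per row, pandas assembles the results into a DataFrame whose columns carry that (name, field) MultiIndex. A quick check, reusing the setup from above (blob standing in for readings_blob):

```python
import pandas as pd

# Shorthand for the readings_blob / frame setup above.
blob = [
    {"scalar": "43", "units": "m",
     "parameter": [{"no_1": "45", "no_2": "1038", "no_3": "356"}], "name": "Foo"},
    {"scalar": "54.1", "units": "s",
     "parameter": [{"no_1": "78", "no_2": "103", "no_3": "356"}], "name": "Yoo"},
    {"scalar": "1123.1", "units": "Hz",
     "parameter": [{"no_1": "21", "no_2": "43", "no_3": "3577"}], "name": "Baz"},
]
frame = pd.DataFrame({"alpha": [11, 9, 23, 1], "beta": [7, 3, 1, 12],
                      "data_json": [blob] * 4})

series_wide = frame["data_json"].apply(
    lambda items: pd.json_normalize(items).set_index("name").stack()
)

# 3 names x 3 remaining fields (scalar, units, parameter) = 9 columns,
# still one row per original record.
print(series_wide.shape)
print(series_wide.loc[0, ("Foo", "scalar")])
```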

Deeper normalization of the inner parameter

If you want to pull keys out of the single-element parameter list, specify record_path and meta in json_normalize. This exposes no_1, no_2, no_3 alongside scalar, units and name.

series_wide_params = frame["data_json"].apply(
    lambda items: pd.json_normalize(
        data=items,
        record_path="parameter",
        meta=["scalar", "units", "name"],
    ).set_index("name").stack()
)

The result is still a per-row Series with a two-level index, but now the parameter keys are flattened into the structure.
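As a sanity check under the same setup (blob standing in for readings_blob), the inner keys now sit as second-level labels next to scalar and units, five fields per name:

```python
import pandas as pd

# Shorthand for the readings_blob / frame setup above.
blob = [
    {"scalar": "43", "units": "m",
     "parameter": [{"no_1": "45", "no_2": "1038", "no_3": "356"}], "name": "Foo"},
    {"scalar": "54.1", "units": "s",
     "parameter": [{"no_1": "78", "no_2": "103", "no_3": "356"}], "name": "Yoo"},
    {"scalar": "1123.1", "units": "Hz",
     "parameter": [{"no_1": "21", "no_2": "43", "no_3": "3577"}], "name": "Baz"},
]
frame = pd.DataFrame({"alpha": [11, 9, 23, 1], "beta": [7, 3, 1, 12],
                      "data_json": [blob] * 4})

series_wide_params = frame["data_json"].apply(
    lambda items: pd.json_normalize(
        items, record_path="parameter", meta=["scalar", "units", "name"]
    ).set_index("name").stack()
)

# 3 names x 5 fields (no_1, no_2, no_3, scalar, units) = 15 columns.
print(series_wide_params.shape)
print(series_wide_params.loc[0, ("Foo", "no_1")])
```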

Turn the Series into actual columns

To merge the expanded JSON back into the original DataFrame as new columns, concatenate along columns. This keeps one row per original record and appends the normalized values to the right.

out_wide = pd.concat(
    [
        frame.drop(columns="data_json"),
        series_wide,
    ],
    axis="columns",
)
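One caveat worth noting: the non-JSON part has flat columns while the expanded part has two levels, so the combined column index mixes plain labels with tuples. The row count, though, stays intact, which is the invariant we care about. A quick check under the same setup as above:

```python
import pandas as pd

# Shorthand for the readings_blob / frame setup above.
blob = [
    {"scalar": "43", "units": "m",
     "parameter": [{"no_1": "45", "no_2": "1038", "no_3": "356"}], "name": "Foo"},
    {"scalar": "54.1", "units": "s",
     "parameter": [{"no_1": "78", "no_2": "103", "no_3": "356"}], "name": "Yoo"},
    {"scalar": "1123.1", "units": "Hz",
     "parameter": [{"no_1": "21", "no_2": "43", "no_3": "3577"}], "name": "Baz"},
]
frame = pd.DataFrame({"alpha": [11, 9, 23, 1], "beta": [7, 3, 1, 12],
                      "data_json": [blob] * 4})

series_wide = frame["data_json"].apply(
    lambda items: pd.json_normalize(items).set_index("name").stack()
)

# alpha and beta (flat labels) meet nine two-level columns; the result
# still has exactly one row per original record.
out_wide = pd.concat([frame.drop(columns="data_json"), series_wide],
                     axis="columns")
print(out_wide.shape)
```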

If you prefer to keep an explicit top-level column grouping, first add a new top column level to the non-JSON part and then concatenate. This preserves a MultiIndex across the entire DataFrame.

def normalize_fn(items):
    return (
        pd.json_normalize(items, "parameter", ["scalar", "units", "name"])
        .set_index("name")
        .stack()
    )

def lift_cols(df_obj, level_name):
    return pd.concat([df_obj], keys=[level_name], axis=1)
out_multi = pd.concat(
    [
        frame.drop(columns="data_json").pipe(lift_cols, "Main"),
        frame["data_json"].apply(normalize_fn),
    ],
    axis="columns",
)
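With two column levels on both sides, the combined frame supports whole-group selection by the top level. A usage sketch, again assuming the setup from above (blob standing in for readings_blob):

```python
import pandas as pd

# Shorthand for the readings_blob / frame setup above.
blob = [
    {"scalar": "43", "units": "m",
     "parameter": [{"no_1": "45", "no_2": "1038", "no_3": "356"}], "name": "Foo"},
    {"scalar": "54.1", "units": "s",
     "parameter": [{"no_1": "78", "no_2": "103", "no_3": "356"}], "name": "Yoo"},
    {"scalar": "1123.1", "units": "Hz",
     "parameter": [{"no_1": "21", "no_2": "43", "no_3": "3577"}], "name": "Baz"},
]
frame = pd.DataFrame({"alpha": [11, 9, 23, 1], "beta": [7, 3, 1, 12],
                      "data_json": [blob] * 4})

def normalize_fn(items):
    return (
        pd.json_normalize(items, "parameter", ["scalar", "units", "name"])
        .set_index("name")
        .stack()
    )

out_multi = pd.concat(
    [
        # Lift alpha/beta under a top-level "Main" group via concat keys.
        pd.concat([frame.drop(columns="data_json")], keys=["Main"], axis=1),
        frame["data_json"].apply(normalize_fn),
    ],
    axis="columns",
)

# Select whole groups by the top column level.
print(out_multi["Main"].columns.tolist())
print(out_multi.loc[0, ("Baz", "no_3")])
```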

In both cases the JSON for each row is expanded horizontally instead of vertically. The latter variant preserves a MultiIndex for clear grouping by name and field.

Why this matters

Working with nested JSON in DataFrames often leads to accidental reshaping. When a column stores a list of dicts, explode plus json_normalize is convenient but destroys the one-record-per-row invariant. Applying normalization per row and stacking under a name-based index preserves row boundaries and lets you choose whether to keep a MultiIndex or project everything into flat columns later.

Takeaways

Normalize the JSON at the row level instead of exploding globally. Use set_index("name").stack() to pivot fields into a tidy, column-friendly structure. When the JSON embeds a single dict inside a list, reach for record_path and meta to expose those inner keys. Finally, concatenate the per-row Series back to your table to keep one row per record while still gaining fully accessible columns.