2026, Jan 01 19:00

Build Correct Nested JSON from Pandas GroupBy: Keep All Receipt_No Values per customer_id Without Overwrites

Convert grouped pandas data to nested JSON without losing values. Avoid overwriting repeated keys by building a policy details list for each customer_id.

Turning flat, row-wise data into a clean, nested structure is a common need when you export data from pandas to JSON. A frequent pitfall is overwriting repeated keys during iteration, which silently drops earlier values. Below is a concise walkthrough of how this happens and how to reshape the data so each customer retains all their Policy_No entries with their respective Receipt_No values.

Problem overview

The source data contains multiple rows per customer_id, where each row represents a policy receipt. After grouping by customer_id, the next step is to build a JSON-like structure. The straightforward approach assigns fields on every loop iteration, which leads to the last Receipt_No overwriting the earlier ones for the same Policy_No. The goal is to collect all policy rows per customer inside a nested list instead of losing them to overwrites.

Buggy pattern in code

The following example shows how the overwrite arises. The logic is kept intact, but the variable names are different for clarity.

import pandas as pd

frame = pd.DataFrame({
    'type': ['customer','customer','customer','customer'],
    'customer_id': ['1-0000001','1-0000001','1-0000002','1-0000002'],
    'u_fn': ['TestUser1_FirstName','TestUser1_FirstName','TestUser2_FirstName','TestUser2_FirstName'],
    'u_ln': ['TestUser1_LastName','TestUser1_LastName','TestUser2_LastName','TestUser2_LastName'],
    'Customer_Type_ID': ['ΦΥΣΙΚΟ','ΦΥΣΙΚΟ','ΦΥΣΙΚΟ','ΦΥΣΙΚΟ'],
    'u_em': ['customer1@otenet.gr','customer1@otenet.gr','customer2@gmail.com','customer2@gmail.com'],
    'u_mb': ['6900000001','6900000001','6900000002','6900000002'],
    '# of policies':[2,2,2,2],
    'Company_ID': [2,2,2,2],
    'Branch_ID': [5,5,5,5],
    'Policy_No': ['000000001','000000001','000000002','000000002'],
    'Receipt_No': ['420000001','420000002','430000001','430000002']
})

by_client = frame.groupby('customer_id').apply(
    lambda grp: grp[['u_fn','u_ln','Customer_Type_ID','u_em','u_mb',
                     'Company_ID','Branch_ID','Policy_No','Receipt_No','# of policies']]
    .to_dict('records'),
    include_groups=False
).reset_index()

by_client.columns = ['customer_id', 'attributes']

payload = by_client.to_dict('records')

assembled = []

for rec in payload:
    attrs = {}
    for entry in rec["attributes"]:
        attrs["u_fn"] = entry["u_fn"]
        attrs["u_ln"] = entry["u_ln"]
        attrs["Customer_Type_ID"] = entry["Customer_Type_ID"]
        attrs["u_em"] = entry["u_em"]
        attrs["u_mb"] = entry["u_mb"]
        attrs["# of policies"] = entry["# of policies"]
        attrs["Company_ID"] = entry["Company_ID"]
        attrs["Branch_ID"] = entry["Branch_ID"]
        attrs["Policy_No"] = entry["Policy_No"]
        attrs["Receipt_No"] = entry["Receipt_No"]

    elem = {
        "Id": rec["customer_id"],
        "attributes": attrs
    }
    assembled.append(elem)

Because attrs["Receipt_No"] and related fields are reassigned on each pass, the final object only contains the last values for each customer.

What actually goes wrong

Each row under a customer describes a distinct policy receipt event. Reassigning scalar keys like Receipt_No under a single attributes dictionary collapses all policy receipts into one, keeping only the last one iterated. The correct structure needs a list that accumulates per-policy details rather than a one-to-one overwrite.

Working approach: build a base and a nested policy list

Starting from a grouped structure, it is cleaner to separate stable customer fields from per-policy fields, then attach the latter as a nested list. The following code begins with a contacts-like list and constructs the desired result by explicitly choosing which keys belong to the top-level attributes and which belong to policy entries.

import json

roster = [
  {
    "customer_id": "1-0000001",
    "attributes": [
      {"u_fn": "TestUser1_FirstName", "u_ln": "TestUser1_LastName", "Customer_Type_ID": "ΦΥΣΙΚΟ", "u_em": "customer1@otenet.gr", "u_mb": "6900000001", "Company_ID": 2, "Branch_ID": 5, "Policy_No": "000000001", "Receipt_No": "420000001", "# of policies": 2},
      {"u_fn": "TestUser1_FirstName", "u_ln": "TestUser1_LastName", "Customer_Type_ID": "ΦΥΣΙΚΟ", "u_em": "customer1@otenet.gr", "u_mb": "6900000001", "Company_ID": 2, "Branch_ID": 5, "Policy_No": "000000001", "Receipt_No": "420000002", "# of policies": 2}
    ]
  },
  {
    "customer_id": "1-0000002",
    "attributes": [
      {"u_fn": "TestUser2_FirstName", "u_ln": "TestUser2_LastName", "Customer_Type_ID": "ΦΥΣΙΚΟ", "u_em": "customer2@gmail.com", "u_mb": "6900000002", "Company_ID": 2, "Branch_ID": 5, "Policy_No": "000000002", "Receipt_No": "430000001", "# of policies": 2},
      {"u_fn": "TestUser2_FirstName", "u_ln": "TestUser2_LastName", "Customer_Type_ID": "ΦΥΣΙΚΟ", "u_em": "customer2@gmail.com", "u_mb": "6900000002", "Company_ID": 2, "Branch_ID": 5, "Policy_No": "000000002", "Receipt_No": "430000002", "# of policies": 2}
    ]
  }
]

base_keys = ["u_fn", "u_ln", "Customer_Type_ID", "u_em", "u_mb", "# of policies"]
detail_keys = ["Company_ID", "Branch_ID", "Policy_No", "Receipt_No"]

assembled = []
for entry in roster:
    envelope = {"customer_id": entry["customer_id"]}
    head_map = {k: entry["attributes"][0][k] for k in base_keys}
    detail_rows = [
        {k: row[k] for k in detail_keys}
        for row in entry["attributes"]
    ]
    head_map = head_map | {"policy_details": detail_rows}
    envelope = envelope | {"attributes": head_map}
    assembled.append(envelope)

print(json.dumps(assembled, indent=2, ensure_ascii=False))

This produces a structure where attributes holds the stable customer fields plus a policy_details list, and each element of that list captures Company_ID, Branch_ID, Policy_No, and the corresponding Receipt_No without any loss from overwriting.

Why this matters

APIs and downstream systems often expect nested JSON where one entity aggregates multiple related records. A flat-to-nested reshape is trivial to get almost right, but incorrect handling of repeated keys leads to subtle data loss. Preserving every Receipt_No tied to a Policy_No per customer is essential for accurate analytics and integrations.

Practical takeaways

When transforming grouped data to JSON, split stable fields from repeating ones and materialize the repeating data as a list. Avoid assigning the same key repeatedly inside a single dictionary when those values represent multiple records. Build a base dictionary once, construct a list for the per-row pieces, and merge them in a final object.

This pattern keeps the code readable, matches the intended hierarchical shape, and prevents accidental overwrites of critical values like Receipt_No.

nested-for-loop python