2026, Jan 10 11:00

How to serialize nested Python dicts with NumPy arrays to JSON using a custom JSONEncoder (no pre-walking, no TypeError)

Learn to serialize deeply nested Python structures with NumPy arrays to JSON by subclassing json.JSONEncoder. Avoid TypeError, handle NaN, and keep data intact.

Saving a deeply nested structure that mixes standard Python types with NumPy arrays to JSON looks trivial until json.dump hits an ndarray. The default encoder has no idea what to do with it and fails early, long before any custom conversion logic can help. Here is how to wire the serialization correctly and avoid the TypeError without flattening your data model or pre-walking the tree manually.

Problem setup

The data container combines defaultdict layers and standard types with NumPy arrays and special numeric values. A custom recursive converter is available to normalize NumPy types and convert np.nan to JSON-compatible values, but it never runs because the naive json.dumps call crashes first.

import numpy as np
import json
from collections import defaultdict
# Converter for NumPy types and NaNs.
# NumPy checks must come before the plain-Python ones: np.float64
# subclasses float, so testing float first would swallow it and skip
# the NaN handling below.
def convert_for_json(obj):
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, dict):  # covers defaultdict too (a dict subclass)
        return {k: convert_for_json(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [convert_for_json(i) for i in obj]
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, np.floating):
        return None if np.isnan(obj) else float(obj)
    if isinstance(obj, (complex, np.complexfloating)):
        # complex is not JSON serializable either, so split it explicitly
        return {"real": float(obj.real), "imag": float(obj.imag)}
    if isinstance(obj, np.bool_):
        return bool(obj)
    if isinstance(obj, np.void):
        return None
    # int, float, bool, str and None pass through unchanged
    return obj
# Example nested structure (simplified)
results_blob = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(dict))))
results_blob['category_A']['size_X']['config_1']['setting_alpha']['parameter_grid'] = np.linspace(0, 1, 5)
results_blob['category_A']['size_X']['config_1']['setting_alpha']['metric_array_1'] = np.array([0.1, 0.2, 0.5, 0.8, 1.0])
results_blob['category_A']['size_X']['config_1']['setting_alpha']['num_valid_entries'] = 5
results_blob['category_A']['size_X']['config_1']['setting_alpha']['distribution_data'] = {
    '0.5': {
        'size_values': [1, 2, 3],
        'normalized_counts': np.array([0.5, 0.3, 0.2]),
        'source_count': 5
    }
}
# Failing attempt
output_path = "output_data.json"
try:
    # This tries to serialize the raw structure, hitting NumPy arrays immediately
    mirror = json.loads(json.dumps(results_blob))  # TypeError: ndarray is not JSON serializable
    normalized = convert_for_json(mirror)
    with open(output_path, "w") as fh:
        json.dump(normalized, fh, indent=2, allow_nan=False)
    print(f"Data saved to: {output_path}")
except TypeError as err:
    print(f"ERROR during JSON processing: {err}")
except Exception as ex:
    print(f"ERROR saving file: {ex}")

Why it breaks

The failure happens at json.dumps(results_blob). The default JSON encoder encounters a NumPy ndarray inside the nested defaultdict before any normalization occurs. The intermediate json.loads(json.dumps(...)) step never completes, so the converter is never applied. In short, attempting to use the stock encoder on a structure that still contains arrays is what triggers the TypeError.
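A minimal reproduction shows the failure in isolation; the stock encoder rejects the array outright, regardless of how deeply it is nested:

import json
import numpy as np
json.dumps({"xs": np.array([1, 2, 3])})
# TypeError: Object of type ndarray is not JSON serializable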

The fix: use a custom JSONEncoder

json.dump and json.dumps accept a cls argument that points to a subclass of json.JSONEncoder. Overriding its default method tells the encoder how to serialize otherwise unsupported objects. In this case, mapping ndarray to list is enough to unblock the traversal across the entire nested mapping. The encoder will walk the structure and call default whenever it meets a non-standard type, so there is no need to pre-convert the whole object graph.

import numpy as np
import json
from collections import defaultdict
class NumpyAwareEncoder(json.JSONEncoder):
    def default(self, value):
        # Called by the encoder only for objects it cannot serialize itself
        if isinstance(value, np.ndarray):
            return value.tolist()
        # Defer to the base class so unsupported types still raise TypeError
        return super().default(value)
nested_store = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(dict))))
nested_store['category_A']['size_X']['config_1']['setting_alpha']['parameter_grid'] = np.linspace(0, 1, 5)
nested_store['category_A']['size_X']['config_1']['setting_alpha']['metric_array_1'] = np.array([0.1, 0.2, 0.5, 0.8, 1.0])
nested_store['category_A']['size_X']['config_1']['setting_alpha']['num_valid_entries'] = 5
nested_store['category_A']['size_X']['config_1']['setting_alpha']['distribution_data'] = {
    '0.5': {
        'size_values': [1, 2, 3],
        'normalized_counts': np.array([0.5, 0.3, 0.2]),
        'source_count': 5
    }
}
with open('output.json', 'w', encoding='utf8') as fh:
    json.dump(nested_store, fh, indent=2, cls=NumpyAwareEncoder)
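
A quick round-trip check, reusing nested_store from above, confirms that arrays come back as plain lists and the defaultdict layers as ordinary JSON objects:

decoded = json.loads(json.dumps(nested_store, cls=NumpyAwareEncoder))
leaf = decoded['category_A']['size_X']['config_1']['setting_alpha']
print(type(leaf['parameter_grid']))  # <class 'list'>
print(leaf['parameter_grid'])        # [0.0, 0.25, 0.5, 0.75, 1.0]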

This approach cleanly serializes nested defaultdicts as regular JSON objects and converts arrays into lists inline. If NaN handling matters to a downstream consumer, note that json.dump has an allow_nan parameter that defaults to True, which emits the non-standard NaN literal. If the consumer requires strict JSON, set allow_nan to False and sanitize non-finite values inside the encoder. One subtlety: default is never called for plain Python floats, so NaN can only be intercepted there while converting arrays, as in the sketch below.
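
A minimal sketch of that strict variant follows. The class name StrictNumpyEncoder and the helper _finite_or_none are illustrative, not part of the original code. NaN values inside an ndarray do reach default as part of the array, so they can be replaced with None there; a bare float('nan') leaf would still need up-front conversion.

import json
import math
import numpy as np
def _finite_or_none(x):
    # Recursively replace NaN/inf in the nested lists from ndarray.tolist()
    if isinstance(x, list):
        return [_finite_or_none(i) for i in x]
    if isinstance(x, float) and not math.isfinite(x):
        return None
    return x
class StrictNumpyEncoder(json.JSONEncoder):
    def default(self, value):
        if isinstance(value, np.ndarray):
            return _finite_or_none(value.tolist())
        return super().default(value)
payload = {'metric': np.array([0.1, np.nan, 0.5])}
print(json.dumps(payload, cls=StrictNumpyEncoder, allow_nan=False))
# {"metric": [0.1, null, 0.5]}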

Why this is worth knowing

Deeply nested structures are common in simulation pipelines and experiment tracking. Trying to pre-walk such trees to coerce every leaf is error-prone and expensive. Delegating the conversion to a json.JSONEncoder centralizes the special cases, keeps your data structures intact, and makes the serialization path predictable. It also avoids redundant transformations like a premature json.dumps/json.loads round trip that fails as soon as it encounters an unsupported type.
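
To illustrate that centralization, one encoder can absorb every special case the recursive converter handled. This is a sketch (the class name FullNumpyEncoder is made up here), and it relies on which scalar types actually reach default: np.int64 and np.bool_ do, because they are not subclasses of int or bool, while np.float64 subclasses float and serializes natively without ever reaching default.

import json
import numpy as np
class FullNumpyEncoder(json.JSONEncoder):
    def default(self, value):
        if isinstance(value, np.ndarray):
            return value.tolist()
        if isinstance(value, np.integer):   # np.int64 is not a Python int
            return int(value)
        if isinstance(value, np.floating):  # reached by np.float32/np.float16
            return float(value)
        if isinstance(value, np.bool_):     # np.True_ is not a Python bool
            return bool(value)
        if isinstance(value, np.complexfloating):
            return {"real": float(value.real), "imag": float(value.imag)}
        return super().default(value)
print(json.dumps({"n": np.int64(5), "flag": np.bool_(True)}, cls=FullNumpyEncoder))
# {"n": 5, "flag": true}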

Conclusion

When JSON serialization meets NumPy arrays, always teach the encoder how to handle them instead of massaging the container first. A small subclass of json.JSONEncoder that returns obj.tolist() for ndarray is enough to let json.dump traverse the entire defaultdict hierarchy safely. If you need strict JSON around NaN, control it with allow_nan and extend the encoder accordingly. This keeps both the code and the data model clean while preventing TypeError surprises during export.