2025, Nov 22 07:00

Understanding Python’s Key-Sharing Dictionaries, sys.getsizeof, and the 3.12 Memory Increase

Learn how Python’s key-sharing dictionaries affect sys.getsizeof, why the size of __dict__ ignores value lengths, and what changed between Python 3.10 and 3.12 with PyDictValues.

Python’s key-sharing dictionaries keep the memory usage of many similar objects in check by separating the shared attribute names (keys) from the per-instance values. But when you measure such objects with sys.getsizeof, the numbers can look counterintuitive. Let’s unpack what exactly is being measured, why changing values does not affect the reported size, and why Python 3.12 shows a larger baseline than 3.10.

Reproducing the question

The setup is straightforward: an instance with several attributes and a size measurement on its __dict__:

import sys


class Record:
    def __init__(self, x0, x1, x2, x3, x4):
        self.f0 = x0
        self.f1 = x1
        self.f2 = x2
        self.f3 = x3
        self.f4 = x4


scheme = Record('blue', 'orange', 'green', 'yellow', 'red')
# Size of the instance's attribute dictionary only, not of the values in it.
print(sys.getsizeof(vars(scheme)))  # e.g., 296 on Python 3.12.5

On Python 3.12.5 this returns 296 bytes for the object’s attribute dictionary, while Python 3.10.11 returns 104. Changing values to very long strings or replacing an attribute name with a very long identifier still leaves the reported size unchanged.
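The observation about values can be checked on a single instance. The following is a minimal sketch that redefines the Record class so the snippet runs on its own; the exact byte counts depend on the interpreter version, so none are shown here.

```python
import sys


class Record:
    def __init__(self, x0, x1, x2, x3, x4):
        self.f0 = x0
        self.f1 = x1
        self.f2 = x2
        self.f3 = x3
        self.f4 = x4


scheme = Record('blue', 'orange', 'green', 'yellow', 'red')
size_before = sys.getsizeof(vars(scheme))

# Rebind an existing attribute to a much larger string.
scheme.f0 = 'b' * 10_000
size_after = sys.getsizeof(vars(scheme))

print(size_before, size_after)
print(size_before == size_after)
```

Both measurements are taken on the same instance, with no other Record objects created in between, so the comparison isolates the effect of the value alone.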

What Key-Sharing dicts actually store

A key-sharing dictionary consists of two parts: a shared keys table held at the class level, and a per-instance value array. All instances of the same class share the same keys table, and each instance stores its actual values in a value array whose slots correspond to those shared keys.

In CPython this is implemented inside PyDictObject. The internal field ma_values is either NULL (a combined table) or points to the values array of a split table. The important detail for Python users is that the attribute mapping you see as obj.__dict__ is still a dict in terms of Python’s type system. The key-sharing machinery is an internal layout optimization, not a different Python-visible type.
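That the instance mapping is an ordinary dict at the Python level can be confirmed directly. This is a small sketch; the Point class is a hypothetical example, not something from CPython.

```python
class Point:  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
attrs = vars(p)

print(type(attrs) is dict)   # the mapping is a plain dict, not a special type
print(attrs is p.__dict__)   # vars() hands back the same dictionary object
print(list(attrs.items()))   # regular dict methods work as usual
```

Nothing in the Python-visible interface reveals whether the table behind the dict is split or combined.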

What sys.getsizeof really measures here

sys.getsizeof reports the memory footprint of the object you pass to it. When you call it on vars(obj) or obj.__dict__, you are measuring the dictionary object itself, not the objects it references. The dictionary holds pointers to the attribute values, and the size of those pointers is constant regardless of the size of the actual values.

This explains two observations at once. Making a value extremely large does not change the reported size because only the pointer to that value lives in the dictionary. Making an attribute name extremely long does not change the reported size either, because the attribute names are part of the shared keys table, not the per-instance dictionary you are measuring.
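The point about attribute names can be explored with two otherwise similar classes. This is a sketch under stated assumptions: ShortNames and LongName are hypothetical classes, and the 5,000-character name is built with setattr only because it would be unreadable as a literal.

```python
import sys


class ShortNames:  # hypothetical class with ordinary attribute names
    def __init__(self):
        self.f0 = 'blue'
        self.f1 = 'orange'


class LongName:  # hypothetical class with one extremely long attribute name
    def __init__(self):
        # Same number of attributes, but one name is 5,000 characters long.
        setattr(self, 'f' * 5_000, 'blue')
        self.f1 = 'orange'


short_obj = ShortNames()
long_obj = LongName()
short_size = sys.getsizeof(vars(short_obj))
long_size = sys.getsizeof(vars(long_obj))
print(short_size, long_size)
```

If the name were stored inside the measured dictionary, the second number would be thousands of bytes larger than the first.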

Integers behave the same way in this context. In Python, integers are objects, and the dictionary stores pointers to them. The pointer size is constant regardless of what it points to.
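The same check works for integers. In this sketch the Holder class is a hypothetical example; the integer objects themselves differ in size, while the dictionary that points to them does not.

```python
import sys


class Holder:  # hypothetical example class
    def __init__(self, n):
        self.n = n


h = Holder(1)
dict_with_small_int = sys.getsizeof(vars(h))

h.n = 10 ** 100  # a much larger integer object
dict_with_big_int = sys.getsizeof(vars(h))

# The int objects have different sizes...
print(sys.getsizeof(1), sys.getsizeof(10 ** 100))
# ...but the dictionary holding the reference reports the same size.
print(dict_with_small_int == dict_with_big_int)
```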

Why Python 3.12 reports a larger size than 3.10

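The size difference is caused by changes in CPython’s internal representation of dictionaries. Between Python 3.10 and 3.12, the type of the ma_values member used by key-sharing (split-table) dictionaries changed: it was a PyObject** and is now a PyDictValues*. A pointer has the same width whatever it points to, so this swap does not by itself enlarge the PyDictObject struct; the larger number reflects how the values storage behind that pointer is laid out and sized in the newer version. The change is documented in CPython’s sources and commit history.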

typedef struct _dictkeysobject PyDictKeysObject;
+++ typedef struct _dictvalues PyDictValues;
/* The ma_values pointer is NULL for a combined table
 * or points to an array of PyObject* for a split table
 */
typedef struct {
    PyObject_HEAD
    /* Number of items in the dictionary */
    Py_ssize_t ma_used;
    /* Dictionary version: globally unique, value change each time
       the dictionary is modified */
    uint64_t ma_version_tag;
    PyDictKeysObject *ma_keys;
    /* If ma_values is NULL, the table is "combined": keys and values
       are stored in ma_keys.
       If ma_values is not NULL, the table is splitted:
       keys are stored in ma_keys and values are stored in ma_values */
---    PyObject **ma_values;
+++    PyDictValues *ma_values;
} PyDictObject;

The structure that ma_values now points to is defined as:

struct _dictvalues {
    uint64_t mv_order;
    PyObject *values[1];
};

Because obj.__dict__ is still a PyDictObject under the hood, sys.getsizeof sees and measures a dict. The key sharing and the split layout are internal details of that dict.
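To compare interpreters yourself, it helps to print the version next to the measurement. The script below is a minimal sketch meant to be run unchanged under each Python version you want to compare; it makes no claim about the numbers you will see.

```python
import sys


class Record:
    def __init__(self, x0, x1, x2, x3, x4):
        self.f0, self.f1, self.f2, self.f3, self.f4 = x0, x1, x2, x3, x4


scheme = Record('blue', 'orange', 'green', 'yellow', 'red')

# Format the running interpreter's version as major.minor.micro.
version = '.'.join(str(part) for part in sys.version_info[:3])
dict_size = sys.getsizeof(vars(scheme))
print(f'Python {version}: sys.getsizeof(vars(scheme)) = {dict_size}')
```

Running it once per interpreter gives directly comparable lines for the same class and the same attribute values.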

Putting the understanding to use

There’s nothing to “fix” in code when sys.getsizeof returns a number that doesn’t reflect the total memory of values referenced by the dictionary. The function does exactly what it promises: it reports the footprint of the object itself. For an instance dictionary, that means the container plus its internal fields, not the transitive closure of all objects it points to.
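If you do want a number that includes the attribute values, you have to add them up yourself. The helper below is a rough sketch, not a complete memory profiler: shallow_and_values_size is a hypothetical name, it only goes one level deep (the dictionary plus its direct values), and it does not account for the shared keys table.

```python
import sys


def shallow_and_values_size(obj):
    """Return (dict size, dict size plus the sizes of its direct values)."""
    attrs = vars(obj)
    container = sys.getsizeof(attrs)
    # Count each distinct value object once, identified by id().
    distinct = {id(value): value for value in attrs.values()}
    values = sum(sys.getsizeof(value) for value in distinct.values())
    return container, container + values


class Record:
    def __init__(self, x0, x1):
        self.f0 = x0
        self.f1 = x1


scheme = Record('b' * 10_000, 'orange')
container, total = shallow_and_values_size(scheme)
print(container, total)
```

The gap between the two numbers is the memory held by the value objects, which sys.getsizeof on the dictionary alone never reports.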

If you replace a value like 'blue' with 'b' * 10000 and get the same size, it’s because the dictionary holds a pointer, not the string. If you rename an attribute to an extremely long identifier and get the same size, it’s because the keys live in the shared keys table, which is not part of the instance’s dictionary object.

Why this matters

When you inspect the memory use of Python objects in production or during performance work, taking sys.getsizeof at face value can mislead you if you expect it to include the sizes of referenced objects. For key-sharing dicts in particular, remember that the shared keys are stored at the class level, while the instance dictionary you measure contains only pointers to values. Differences between versions can also come from changes in the internal representation, such as the PyDictValues change between Python 3.10 and 3.12.

Conclusion

Key-sharing dictionaries split shared keys from per-instance values, and sys.getsizeof reflects only the memory used by the dictionary object representing that per-instance mapping. The size won’t change with larger strings or different value types because pointers have constant size, and the keys are not part of the instance dictionary. The larger baseline in Python 3.12 compared to 3.10 stems from internal changes to the dictionary layout, specifically the move to PyDictValues for split tables. Keep these constraints in mind when interpreting memory measurements and comparing results across Python versions.