2025, Dec 08 13:00

Pandas HDF5 table mode crash: IndexError from a DataFrame column named 'index' and how to fix it

Why pandas HDF5 table mode writes fail with IndexError when a DataFrame has a column named 'index', and how renaming it (e.g., to Index) fixes serialization.

When pandas meets HDF5 in table mode and stumbles on a column named “index”

Serializing a pandas DataFrame to HDF5 with table format is a routine task, until it suddenly isn’t. A seemingly ordinary write with HDFStore.put can fail with an unexpected IndexError, even when the frame is clean and itemsize hints are in place. The trigger is subtle: a single column literally named index.

Repro: writing a DataFrame in table mode

The write path looks straightforward. A DataFrame is persisted via put with format set to table, data_columns enabled, and min_itemsize tuned for string storage.

# HDFStore relies on the optional PyTables dependency (the "tables" package)
import pandas as pd

# Example DataFrame that includes a column literally named 'index'
meta_frame = pd.DataFrame({
    "index": [0, 1, 2],
    "xPosition": [1.0, 2.0, 3.0],
    "yPosition": [4.0, 5.0, 6.0],
    "approachID": [10, 11, 12]
})
item_key = "some_node"
with pd.HDFStore("example.h5") as h5store:
    # With the 'index' column present, this call is where the IndexError surfaces
    h5store.put(item_key, meta_frame, format="table", data_columns=True, min_itemsize={"values": 100})

In this setup, the write can crash with a tuple index out of range error. The traceback points into pandas’ HDF5 writer internals.

IndexError: tuple index out of range

File ".../site-packages/pandas/io/pytables.py", line 4473, in write_data
new_shape = (nrows,) + self.dtype[names[nindexes + i]].shape

What actually breaks

The failure is triggered by a column whose name is exactly index. That one label is enough to reproduce the error. The match is case-sensitive: renaming the column to anything else, such as a numeric string or a capitalized variant, avoids the issue.
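One plausible reading of the traceback, offered as an assumption rather than a confirmed description of pandas internals: table format stores the row index under a field that is itself named index, so a data column with the same name collides with it. The table then ends up with one field fewer than the writer expects, and names[nindexes + i] runs off the end of the tuple. The plain-Python sketch below only imitates that situation. field_types, index_fields, and data_columns are illustrative stand-ins, not pandas objects.

```python
# Illustration only: plain Python standing in for the table description.
# Assumption: the row index is stored under the field name "index".
index_fields = ["index"]
data_columns = ["index", "xPosition", "yPosition", "approachID"]

# A mapping keyed by name silently merges the two "index" entries.
field_types = {name: "placeholder" for name in index_fields + data_columns}
names = tuple(field_types)  # 4 names, although 1 + 4 = 5 fields were expected

nindexes = len(index_fields)
error_message = None
try:
    for i, _ in enumerate(data_columns):
        _field = names[nindexes + i]  # the last iteration asks for names[4]
except IndexError as exc:
    error_message = str(exc)

print(len(names), error_message)  # prints: 4 tuple index out of range
```

If this reading is right, it also explains why the capitalized name works: Index and index are distinct keys, so nothing is merged.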

Fix: rename the problematic column

Renaming index to a different label resolves the write reliably. The smallest change is to capitalize it. Keep in mind that anything reading the stored table will then see the column as Index rather than index.

import pandas as pd
meta_frame = pd.DataFrame({
    "index": [0, 1, 2],
    "xPosition": [1.0, 2.0, 3.0],
    "yPosition": [4.0, 5.0, 6.0],
    "approachID": [10, 11, 12]
})
# Rename only the conflicting column
safe_frame = meta_frame.rename(columns={"index": "Index"})
item_key = "some_node"
with pd.HDFStore("example.h5") as h5store:
    h5store.put(item_key, safe_frame, format="table", data_columns=True, min_itemsize={"values": 100})

Any other name works as well, as long as it is not the exact string index. A numbered label or a more descriptive one is fine.
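When frames come from several sources, the replacement label may already be taken, so a hard-coded rename can create a duplicate column. A minimal sketch of one way to pick a free name follows. build_safe_rename_map is a hypothetical helper, not a pandas function, and the reserved tuple contains only the one name this article confirms as problematic.

```python
def build_safe_rename_map(column_names, reserved=("index",)):
    """Map each reserved column label to a replacement that is not in use."""
    column_names = list(column_names)
    taken = set(column_names)
    rename_map = {}
    for name in column_names:
        if name not in reserved:
            continue
        # Try the capitalized variant first ("index" -> "Index").
        candidate = name.capitalize()
        suffix = 1
        # Fall back to a numbered suffix if the candidate is already in use.
        while candidate in taken or candidate in reserved:
            candidate = f"{name}_{suffix}"
            suffix += 1
        taken.add(candidate)
        rename_map[name] = candidate
    return rename_map


print(build_safe_rename_map(["index", "xPosition", "yPosition", "approachID"]))
print(build_safe_rename_map(["index", "Index", "xPosition"]))
```

The helper works on plain labels, so it can be fed a frame's columns directly, for example meta_frame.rename(columns=build_safe_rename_map(meta_frame.columns)). A frame without a conflicting column yields an empty map, which makes the rename a no-op.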

Why this matters for data pipelines

HDF5 table mode is often used in long-running or automated jobs. Hitting an IndexError deep in the writer can derail a pipeline, especially when the dataset shape and dtypes look normal. Knowing that a single column name can be the culprit saves time during triage and keeps storage code stable across datasets that come from different sources.
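For automated jobs, one option is to validate column labels before the write, so the failure names the offending column instead of surfacing as an IndexError deep in the writer. The sketch below is one possible guard. ReservedColumnNameError and check_table_columns are hypothetical names introduced for illustration, and the reserved tuple again assumes index is the only label that needs checking.

```python
class ReservedColumnNameError(ValueError):
    """Raised when a frame carries a column label that table mode cannot store."""


def check_table_columns(column_names, node_key, reserved=("index",)):
    """Fail fast, with a readable message, before the HDF5 write is attempted."""
    offending = [name for name in column_names if name in reserved]
    if offending:
        raise ReservedColumnNameError(
            f"Cannot write node {node_key!r} in table format: "
            f"rename column(s) {offending} first"
        )


try:
    check_table_columns(["index", "xPosition"], "some_node")
except ReservedColumnNameError as exc:
    print(exc)
```

In a pipeline, the call would sit just before h5store.put, with the frame's columns passed as column_names. Whether to raise or to rename automatically is a design choice: raising keeps stored column names predictable, while renaming keeps the job running.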

Takeaways

If a pandas HDF5 write in table mode fails with “IndexError: tuple index out of range” and the frame otherwise looks valid, check for a column literally named index. Renaming it, for instance to Index, prevents the crash while leaving the data and write options intact, including format="table", data_columns=True, and min_itemsize settings.