2025, Dec 02 13:00

Fix pandas DataFrame pickle errors across machines: avoid to_pickle/HIGHEST_PROTOCOL, use pickle.dump default protocol

ModuleNotFoundError: numpy._core.numeric when unpickling a pandas DataFrame across machines? See how pickle.dump with the default protocol resolves the error.

Sharing a pandas DataFrame between machines through pickle sounds trivial until an unexpected import error stops the workflow. A DataFrame pickled on one system loads fine locally, yet on another machine it can fail with a ModuleNotFoundError pointing at numpy._core.numeric, even when pandas and numpy versions match. The root of the issue lies not in the loader but in how the object was serialized in the first place.

Reproducing the failure

The DataFrame was serialized on machine A with pandas' built-in method, then deserialized on machine B via both a plain pickle.load and pandas' read_pickle convenience wrapper. The error appears consistently with both loaders.

import pickle
import pandas as pd

# Machine A: serialize with pandas' built-in method
frame.to_pickle('bundle.pkl')
# Machine B (after copying the file over): both loading paths fail
with open('path/to/bundle.pkl', 'rb') as fh:
    restored_obj = pickle.load(fh)
restored_obj = pd.read_pickle('path/to/bundle.pkl')
# ModuleNotFoundError: No module named 'numpy._core.numeric'

Both environments ran pandas 2.2.2 and numpy 1.26.4. Aligning versions helped some users elsewhere, but not in this scenario.
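A quick way to confirm that parity on each machine is to print the library versions directly:

import numpy
import pandas
# Run on both machines and compare the output
print('pandas', pandas.__version__, '| numpy', numpy.__version__)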

What’s actually going wrong

The behavior depends on the serialization method and the chosen pickle protocol. Objects saved via pandas’ built-in serialization path or dumped with the highest available pickle protocol later produced the same import error when read back, regardless of whether they were loaded with pickle.load or pd.read_pickle. In contrast, using a straightforward pickle.dump with its default protocol avoided the issue.
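One way to see what a given artifact will try to import is to disassemble it with the standard library's pickletools module; the GLOBAL/STACK_GLOBAL opcodes list the module paths that pickle.load must resolve. A minimal sketch, assuming the failing artifact sits at bundle.pkl:

import pickletools

# Disassemble the pickle stream; GLOBAL/STACK_GLOBAL opcodes reveal the
# module paths (e.g. numpy._core.numeric) the loader will try to import.
with open('bundle.pkl', 'rb') as fh:
    pickletools.dis(fh.read())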

A working approach

Serializing the DataFrame with pickle.dump and the default protocol makes the artifact portable across machines. Deserialization then works through either the standard pickle loader or pandas’ read_pickle.

# Machine A: plain pickle.dump with the default protocol
with open('safe_dump.pkl', 'wb') as fh:
    pickle.dump(frame, fh)
# Machine B: either loader now restores the DataFrame
with open('path/to/safe_dump.pkl', 'rb') as fh:
    revived = pickle.load(fh)
revived = pd.read_pickle('path/to/safe_dump.pkl')

However, forcing the highest protocol during dumping reproduces the failure during load, mirroring the earlier error.

# Forcing the highest protocol reintroduces the import error on load
# (written to a separate file so the working artifact is not overwritten)
with open('unsafe_dump.pkl', 'wb') as fh:
    pickle.dump(frame, fh, protocol=pickle.HIGHEST_PROTOCOL)

Why this nuance matters

Data artifacts often move between notebooks, CI jobs, and different machines. When a simple persistence step becomes environment-sensitive, it introduces silent fragility into pipelines. Here, the loader choice wasn’t the differentiator; the serialization path and protocol were. Remembering that detail prevents hard-to-diagnose runtime breaks in downstream tasks that expect drop-in compatibility.

Practical takeaways

If you encounter ModuleNotFoundError: No module named 'numpy._core.numeric' when unpickling a pandas DataFrame across machines, create the artifact with pickle.dump using the default protocol, and avoid forcing pickle.HIGHEST_PROTOCOL for this use case. If you previously serialized the object with DataFrame.to_pickle and see the failure, re-serialize it with pickle.dump and retest. Version parity alone might not resolve the issue; the serialization path and protocol are the levers that matter here. A quick round-trip test on a second machine before integrating the artifact into a workflow can save time and confusion.
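As a concrete example, here is a minimal re-serialization sketch, assuming the original artifact still loads on the machine that produced it (filenames are illustrative):

import pickle
import pandas as pd

# On the origin machine, where the original artifact still loads
frame = pd.read_pickle('bundle.pkl')
# Re-serialize with the default pickle protocol for portability
with open('bundle_portable.pkl', 'wb') as fh:
    pickle.dump(frame, fh)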