2025, Oct 18 15:00

Creating a NumPy object array from a Python list without data duplication: copy=False pitfalls and safe options

Learn how to build a NumPy dtype=object array from a Python list, why np.asarray(copy=False) raises ValueError, and the safe zero-duplication approach to use instead.

Creating a NumPy ndarray from a Python list without duplicating data sounds straightforward until dtype=object enters the picture. Arrays with dtype=object hold references to Python objects rather than raw, contiguous values. That complicates expectations around zero-copy conversion, especially when reaching for np.asarray(..., copy=False) and getting a hard failure. Let’s unpack what actually happens and what you can rely on.

Problem statement

The goal is to turn a Python list into a NumPy array of dtype=object without creating a copy on construction. Attempting to force that via copy=False raises an error:

import numpy as np

items = ['spam', 'eggs']
obj_view = np.asarray(items, dtype=object, copy=False)  # ValueError in NumPy >= 2.0

This happens even though dtype=object arrays store references, and tobytes() on such arrays yields pointer bytes rather than the objects' contents. The question is whether ndarray can share the list's existing storage directly.
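
The pointer-bytes claim is easy to verify. A minimal sketch (variable names are illustrative): serializing an object array with tobytes() yields pointer-sized machine words, not the strings' characters:

```python
import numpy as np

items = ['spam', 'eggs']
obj_arr = np.asarray(items, dtype=object)

# For dtype=object, the array's raw buffer holds PyObject* pointers,
# so tobytes() returns pointer-sized words, not string contents.
raw = obj_arr.tobytes()
print(len(raw) == 2 * obj_arr.itemsize)  # True: one pointer per element
print(b'spam' in raw)                    # False: the characters live elsewhere
```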

How NumPy treats lists with and without object dtype

Start with a simple sequence and convert it twice: once letting NumPy pick the dtype, and once forcing dtype=object. Without an explicit dtype, NumPy produces a fixed-width string array, copying string data into a compact representation:

import numpy as np

words = ['one', 'two', 'three']
fixed_str = np.asarray(words)  # dtype becomes something like '<U5'

That array stores the strings as fixed-length Unicode, not as Python objects, so it is a copy of the string data. In this example, that fits in 3 elements × 5 characters × 4 bytes per character = 60 bytes.
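
That arithmetic can be checked directly on the array (a quick sketch):

```python
import numpy as np

words = ['one', 'two', 'three']
fixed_str = np.asarray(words)

print(fixed_str.dtype)     # <U5: little-endian Unicode, 5 code points per slot
print(fixed_str.itemsize)  # 20: 5 characters * 4 bytes each
print(fixed_str.nbytes)    # 60: 3 elements * 20 bytes
```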

With dtype=object, NumPy stores references to the original Python objects. That results in a shallow copy of the container (the array owns its own pointer block), but the elements themselves are the very same Python objects:

obj_arr = np.asarray(words, dtype=object)

# Same object identity for the third element
obj_arr[2] is words[2]          # True
id(words[2]) == id(obj_arr[2])  # True: identical addresses

The identities match, demonstrating that the array holds references to the original strings.

Introduce a mutable element and the behavior becomes more visible. Mutating the shared object through either container is reflected in both:

mixed = ['one', 'two', 'three', ['a', 'b']]
obj_box = np.array(mixed, object)

# Mutate the nested list through the array reference
obj_box[3].append('c')

# The change is visible in both containers because they refer to the same nested list
mixed    # last element is now ['a', 'b', 'c']
obj_box  # the array shows the same mutated list

However, replacing a top-level element in the original list does not change what’s stored in the object array, because the array's pointer block is separate:

mixed[1] = 12.3
mixed    # second element is now 12.3
obj_box  # slot 1 still holds 'two'

Why copy=False fails here

Trying to force NumPy to avoid even the shallow copy of the pointer block produces a clear error. Since NumPy 2.0, copy=False is a guarantee that no copy happens at all; when NumPy cannot honor that guarantee, it refuses:

ValueError: Unable to avoid copy while creating an array as requested. If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x). For more details, see the migration guide.

The practical upshot is that copy=None (the default) is just as useful here. It copies only if needed. Using copy=True forces a copy of the container, but for dtype=object that is still shallow; the Python objects themselves are not duplicated. A true deep copy of the referenced Python objects would require something like copy.deepcopy.
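
A short sketch contrasting the two (the nested lists here are illustrative): copy=True duplicates only the pointer block, while copy.deepcopy duplicates the referenced objects as well:

```python
import copy
import numpy as np

nested = [['a', 'b'], ['c']]
shallow = np.array(nested, dtype=object, copy=True)  # new pointer block, same elements
deep = copy.deepcopy(shallow)                        # recursively copies the elements too

print(shallow[0] is nested[0])  # True: copy=True is shallow for dtype=object
print(deep[0] is nested[0])     # False: deepcopy duplicated the inner list
print(deep[0] == nested[0])     # True: equal contents, distinct objects
```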

Can ndarray share the list’s internal storage?

Directly reusing a Python list's internal pointer buffer for a NumPy object array would mean relying on interpreter internals. Although both a Python list and a NumPy object array back their elements with contiguous blocks of pointers in practice, creating an ndarray on top of the list's buffer is implementation-dependent and unsafe. The list type does not expose its internal buffer, so accomplishing this would require hacky workarounds that depend on details like pointer size, endianness, and the memory layout of CPython objects, and it would break under list resizes or runtime changes.
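
One concrete signal that this path is closed: Python lists do not implement the buffer protocol, so there is no sanctioned view of their storage (a minimal check):

```python
seq = ['spam', 'eggs']

# list does not implement the buffer protocol, so memoryview rejects it
try:
    memoryview(seq)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)  # memoryview: a bytes-like object is required, not 'list'

print(raised)  # True
```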

Working solution

The supported approach is to create an object array that references the original Python objects, accepting that the array’s pointer block is its own. That achieves the no-duplicate-object behavior most people want, even if the container itself is separate.

import numpy as np

data_src = ['one', 'two', 'three', ['a', 'b']]
ref_array = np.asarray(data_src, dtype=object)  # copy=None by default; shallow copy of references

# Prove referential sharing via mutation of the nested list
ref_array[3].append('c')
# data_src and ref_array now reflect the same mutated nested list

# Replacing a top-level element in the list does not affect the array's slot
data_src[1] = 12.3
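
The sharing guarantees above can be pinned down with assertions (a sketch using the same data):

```python
import numpy as np

data_src = ['one', 'two', 'three', ['a', 'b']]
ref_array = np.asarray(data_src, dtype=object)

assert ref_array[3] is data_src[3]     # element references are shared
ref_array[3].append('c')
assert data_src[3] == ['a', 'b', 'c']  # mutation is visible through the list

data_src[1] = 12.3
assert ref_array[1] == 'two'           # separate pointer block: slot unaffected
```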

Why this matters

Object arrays and Python lists look similar because both hold references to Python objects, but they behave differently. Lists can append; arrays can reshape. With dtype=object you rarely gain speed; at most, some operations become syntactically convenient. When you need vectorized performance, prefer native numeric/string dtypes. When you need Python-level mutability and heterogeneity, a list is often a better fit. Knowing that object arrays are shallow with respect to the elements helps prevent accidental shared-state bugs and clarifies what copy flags can and cannot guarantee.
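
As a small illustration of that trade-off, reshaping an object array is a metadata-only view, and the elements stay shared (names here are illustrative):

```python
import numpy as np

flat = np.array(['a', 'b', 'c', 'd'], dtype=object)
grid = flat.reshape(2, 2)      # a view: new shape, same pointer block
print(grid.base is flat)       # True: no data was copied
print(grid[0, 1] is flat[1])   # True: same Python object in both
```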

Takeaways

If you need a NumPy array that references existing Python objects, construct it with dtype=object and rely on the default copy behavior. Expect a shallow copy of the container and shared references to the elements. Forcing copy=False will raise an error when NumPy cannot avoid allocating its own pointer block. For deep copies of the actual Python objects, use a tool designed for that, such as copy.deepcopy. And if you were hoping for a true zero-copy view directly over a Python list’s storage, that’s not a supported or robust path.

The article is based on a question from StackOverflow by Dryden and an answer by hpaulj.