2025, Oct 15 16:00

How to resolve CuPy 'unhashable ndarray' errors when storing tuples in sets (convert scalars or use tolist())

Learn why CuPy, unlike NumPy, raises 'unhashable ndarray' when hashing nested tuples for sets, and how to fix it with tolist() or explicit float casting, with working code.

Converting NumPy code to CuPy is usually straightforward, but sometimes small type differences surface in surprising places. One common pitfall appears when you try to hash data derived from CuPy arrays. If you're feeding a set with nested tuples built from CuPy ndarrays, you may hit a TypeError complaining about unhashable types.

Problem overview

The goal is to reorder columns of a CuPy matrix and store the result as a nested tuple inside a Python set. The straightforward approach that works with NumPy triggers an error with CuPy:

import cupy as cp

def reorder_cols(arr: cp.ndarray):
    col_idx = cp.lexsort(arr[:, 1:])
    arr[:, 1:] = arr[:, 1:][:, col_idx]
    return arr

gpu_mat = cp.array([
    [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
    [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.]
])

seen_patterns = set()
seen_patterns.add(tuple(tuple(row) for row in reorder_cols(gpu_mat)))

This raises:

TypeError: unhashable type: 'ndarray'

What's really going on

A nested tuple is hashable only if all of its elements are hashable. When you iterate over a NumPy row, the elements come out as NumPy scalar types such as numpy.float64, which implement __hash__ and behave like Python floats. Iterating over a CuPy row instead yields zero-dimensional cupy.ndarray objects, and cupy.ndarray is unhashable, so the inner tuples end up holding unhashable entries. That mismatch is enough to break set insertion.
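You can see the hashability gap on the CPU side without a GPU. The snippet below uses NumPy as a stand-in: its scalar types are hashable, while full ndarrays (the kind of object a CuPy row element behaves like) are not:

```python
import numpy as np

# Iterating a NumPy row yields np.float64 scalars, which implement
# __hash__ and hash consistently with the equivalent Python float.
row = np.array([0.0, 1.0, -1.0])
element = next(iter(row))                      # np.float64 scalar
print(type(element).__name__)                  # float64
print(hash(element) == hash(float(element)))   # True

# A full ndarray, by contrast, defines __hash__ as None, so any tuple
# containing one cannot be inserted into a set or used as a dict key.
print(np.ndarray.__hash__ is None)             # True
```

CuPy's ndarray is unhashable in the same way, but unlike NumPy it also hands you array objects (not host scalars) during element-wise iteration, which is why the tuple comprehension fails there.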

Fix and working example

The minimal fix is to convert the CuPy elements to Python floats before assembling the tuples, or to materialize each row as a Python list first. Both approaches produce hashable content. Here is a compact version that uses tolist() on each row, replacing per-element conversion with a single device-to-host transfer per row, which is both faster and more direct:

import cupy as cp

def reorder_cols(arr: cp.ndarray):
    col_idx = cp.lexsort(arr[:, 1:])
    arr[:, 1:] = arr[:, 1:][:, col_idx]
    return arr

gpu_mat = cp.array([
    [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
    [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.]
])

seen_patterns = set()
seen_patterns.add(tuple(tuple(r.tolist()) for r in reorder_cols(gpu_mat)))

An equivalent variant is to cast each element explicitly with float(x) inside the nested tuple construction.
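For reference, here is that float-cast variant next to the tolist() version. It is written with NumPy so it runs without a GPU; with CuPy the same expressions apply, but note that each float(x) triggers its own device-to-host copy, so the per-row tolist() form is the cheaper of the two there:

```python
import numpy as np  # CPU stand-in; the CuPy expressions are analogous

mat = np.array([[0., 0., 1.],
                [0., 1., -1.]])

# Variant 1: cast each element explicitly -- one host conversion per scalar.
as_floats = tuple(tuple(float(x) for x in row) for row in mat)

# Variant 2: convert each row with tolist() -- one conversion per row.
as_lists = tuple(tuple(row.tolist()) for row in mat)

assert as_floats == as_lists   # identical hashable nested tuples
seen = {as_floats}             # set insertion now succeeds
print(as_floats in seen)       # True
```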

Why this matters

Porting from NumPy to CuPy isn't always a search-and-replace exercise. Code that relies on implicit conversion to Python scalars can behave differently when the underlying library yields objects that aren't hashable. This becomes critical whenever you place derived values into sets or use them as dictionary keys. Being explicit about scalar conversion keeps your hashing logic predictable and avoids runtime surprises.
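As a concrete (hypothetical) instance of the dictionary-key case, counting how often a row pattern occurs only works once each row is reduced to plain Python scalars:

```python
import numpy as np  # same pattern applies to a CuPy array on the GPU

# Count duplicate row patterns in a batch, keyed by their hashable form.
batch = np.array([[0., 1.],
                  [0., 1.],
                  [1., 0.]])

counts = {}
for row in batch:
    key = tuple(row.tolist())            # tuple of Python floats: hashable
    counts[key] = counts.get(key, 0) + 1

print(counts)  # {(0.0, 1.0): 2, (1.0, 0.0): 1}
```

Using tuple(row) directly would work with NumPy (scalars are hashable) but fail with CuPy, which is exactly the kind of silent behavioral divergence the paragraph above warns about.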

Takeaways

If you're building hashable structures from CuPy arrays, make sure the elements are Python scalars or other hashable types before inserting them into sets. Converting rows via tolist() or casting elements with float() resolves the TypeError without changing your algorithm. Keeping type hints aligned with CuPy, as in def reorder_cols(arr: cp.ndarray), makes the intent clear and prevents confusion as you migrate array operations to the GPU.

The article is based on a question from StackOverflow by EzTheBoss 2 and an answer by EzTheBoss 2.