2025, Dec 25 13:00

np.genfromtxt usecols vs names in NumPy: why reordering two columns mislabels data and how to fix it

Learn why NumPy's np.genfromtxt mislabels fields when usecols reorders two columns, how dtype names fall out of sync with the data, and reliable ways to load or reorder columns.

Loading delimited text into NumPy with np.genfromtxt feels straightforward until you try to reorder columns by name. In a two-column file, simply flipping the order in usecols produces values mapped to the wrong field names, even though the dtype looks identical. The trap is subtle because it disappears as soon as you load a subset from a wider table.

Reproducing the mismatch

The data has two columns, A and B. The first call loads them in file order and behaves as expected. The second call flips the order in usecols and the values no longer align with the field names you expect to see.

from io import StringIO
import numpy as np
src_txt = "A B\na b\na b"
arr_ok = np.genfromtxt(StringIO(src_txt), usecols=["A", "B"], names=True, dtype=None)
print(arr_ok["A"], arr_ok["B"])  # ['a' 'a'] ['b' 'b']
arr_flip = np.genfromtxt(StringIO(src_txt), usecols=["B", "A"], names=True, dtype=None)
print(arr_flip["A"], arr_flip["B"])  # ['b' 'b'] ['a' 'a']  (unexpected)
print(arr_ok.dtype)   # [('A', '<U1'), ('B', '<U1')]
print(arr_flip.dtype) # [('A', '<U1'), ('B', '<U1')]

Both dtypes report the same field names and formats, but the second result maps those names to the swapped data. Field A carries values from the original B column.

Why this happens

The behavior comes from how np.genfromtxt combines usecols with names. Data columns are always selected in the order you request, but the field names read from the header are reordered only when usecols selects fewer columns than the header contains. The relevant logic looks like this:

nbcols = len(usecols or first_values)
...
        if usecols:
            for (i, current) in enumerate(usecols):
                # if usecols is a list of names, convert to a list of indices
                if _is_string_like(current):
                    usecols[i] = names.index(current)
                elif current < 0:
                    usecols[i] = current + len(first_values)
            # If the dtype is not None, make sure we update it
            if (dtype is not None) and (len(dtype) > nbcols):
                descr = dtype.descr
                dtype = np.dtype([descr[_] for _ in usecols])
                names = list(dtype.names)
            # If `names` is not None, update the names
            elif (names is not None) and (len(names) > nbcols):
                names = [names[_] for _ in usecols]

With dtype=None the first branch is skipped, so the elif alone decides whether the names are reordered. When you reorder without reducing the number of columns, len(names) equals nbcols rather than exceeding it, so the names are left as-is. The data follows your usecols order, while the dtype field names keep the header order. That is the mismatch you see in the two-column example.

Why a third column hides the problem

As soon as the file has more columns and you load a subset, the logic above updates the names to match usecols. The same reordering now yields correct results because len(names) is greater than the number of used columns.

src_wide = "A B C\na b c\na b c"
sub_ok = np.genfromtxt(StringIO(src_wide), usecols=["A", "B"], names=True, dtype=None)
print(sub_ok["A"], sub_ok["B"])  # ['a' 'a'] ['b' 'b']
sub_flip = np.genfromtxt(StringIO(src_wide), usecols=["B", "A"], names=True, dtype=None)
print(sub_flip["A"], sub_flip["B"])  # ['a' 'a'] ['b' 'b']  (now correct)

In the subset scenario, genfromtxt reindexes the header names by usecols before building the dtype, so names and data stay aligned.
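The two cases can be contrasted with a small pure-Python model of the branch quoted earlier. This is only a sketch, not NumPy's implementation: resolve_names and load_row are hypothetical helpers, the model assumes dtype=None, and it leaves out the handling of negative indices.

```python
def resolve_names(header, usecols):
    """Simplified model of how names and usecols are paired when dtype=None."""
    names = list(header)
    # Column names in usecols become positional indices, as in the quoted loop.
    indices = [names.index(c) if isinstance(c, str) else c for c in usecols]
    nbcols = len(indices)
    # Names are reindexed only when the header has MORE names than usecols selects.
    if len(names) > nbcols:
        names = [names[i] for i in indices]
    return names, indices


def load_row(header, usecols, row):
    """Pair one data row with the resolved names (hypothetical helper)."""
    names, indices = resolve_names(header, usecols)
    # Data values always follow the usecols order.
    values = [row[i] for i in indices]
    return dict(zip(names, values))


two_cols = load_row(["A", "B"], ["B", "A"], ["a", "b"])
three_cols = load_row(["A", "B", "C"], ["B", "A"], ["a", "b", "c"])
print(two_cols)    # {'A': 'b', 'B': 'a'}  names kept in header order, data swapped
print(three_cols)  # {'B': 'b', 'A': 'a'}  names reindexed, data aligned
```

The only difference between the two calls is whether len(names) exceeds the number of requested columns, which is exactly the condition in the elif.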

Working solutions

If the goal is to avoid surprises when columns are reordered, do not rely on names being remapped for the full set of columns. Either load and reorder afterwards, or make the dtype explicit.

The first approach is to load by header once and then select fields in the order you need. With structured arrays, you can index by a list of field names and NumPy will return a view with fields reordered.

from io import StringIO
import numpy as np
src_txt = "A B\na b\na b"
doc = np.genfromtxt(StringIO(src_txt), names=True, dtype=None)
print(doc['B'], doc['A'])  # access fields by name without relying on column order
rearranged = doc[["B", "A"]]
print(rearranged)  # dtype now has names in ['B', 'A'] order, values aligned
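Because the reordered result is a view, its dtype still describes the original memory layout through field offsets. If a compact, independent array is needed, one option is numpy.lib.recfunctions.repack_fields. The sketch below assumes a recent NumPy version, and passes encoding="utf-8" (an addition not in the example above) so that string fields load as str regardless of the version's default.

```python
from io import StringIO

import numpy as np
from numpy.lib import recfunctions as rfn

src_txt = "A B\na b\na b"
doc = np.genfromtxt(StringIO(src_txt), names=True, dtype=None, encoding="utf-8")

# Multi-field indexing: fields listed as B, A, still backed by doc's memory.
view = doc[["B", "A"]]

# repack_fields builds a dtype without the view's offsets and converts to it.
packed = rfn.repack_fields(view)

print(view.dtype)          # offsets-based dtype describing the original layout
print(packed.dtype.names)  # ('B', 'A')
print(packed["B"], packed["A"])  # ['b' 'b'] ['a' 'a']
```

Whether the extra copy is worthwhile depends on the use case. For plain access by field name, the view is usually enough.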

The second approach is to pass an explicit dtype that encodes the names and their desired order, and select columns by index. This aligns the field names with the selected data consistently.

from io import StringIO
import numpy as np
src_txt = "A B\na b\na b"
schema = [("B", "U1"), ("A", "U1")]
locked = np.genfromtxt(StringIO(src_txt), dtype=schema, usecols=[1, 0], skip_header=1)
print(locked["A"], locked["B"])  # ['a' 'a'] ['b' 'b']

In both patterns the values line up with the field names you intend to use.
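The explicit-dtype pattern can be generalized so that the indices and the schema are derived from the same list of names and cannot drift apart. The helper below is a hypothetical sketch, not part of NumPy: load_in_order, its fmt parameter, and the assumptions of a whitespace-delimited file with a one-line header and a single string format for every column are all illustrative.

```python
from io import StringIO

import numpy as np


def load_in_order(text, wanted, fmt="U8"):
    """Hypothetical helper: load the `wanted` header columns in exactly that order."""
    header = text.splitlines()[0].split()
    # Resolve names to positions ourselves instead of passing names to usecols.
    indices = [header.index(name) for name in wanted]
    # The explicit dtype lists the fields in the same order as the indices.
    schema = [(name, fmt) for name in wanted]
    return np.genfromtxt(
        StringIO(text),
        dtype=schema,
        usecols=indices,
        skip_header=1,
        encoding="utf-8",  # assumption: ensures str fields across NumPy versions
    )


table = load_in_order("A B\na b\na b", ["B", "A"])
print(table.dtype.names)       # ('B', 'A')
print(table["A"], table["B"])  # ['a' 'a'] ['b' 'b']
```

Since both the dtype and usecols come from wanted, a name can only be attached to the column it was looked up from.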

Why it’s worth knowing

This is an easy edge case to trip over because the dtype prints as expected while the data silently maps to the wrong field names when you reorder without subsetting. The intended role of usecols is often to avoid loading unneeded columns, and field order in structured arrays usually isn’t crucial, which may be why this corner case persists. If the mismatch matters to downstream processing, it can be a source of subtle, long-lived bugs.

Summary and advice

When using np.genfromtxt with names=True, reordering all columns with usecols does not remap the dtype names, so the names may no longer describe the data you loaded. If you need a specific order, either load the table and then reorder fields by name, or pass an explicit dtype together with numeric column indices. If you hit unexpected behavior beyond this, reduce it to a minimal example like the one above and consider filing an issue report. Once names and data are aligned, accessing fields by name rather than by position will generally keep your code robust.