2025, Sep 23 23:00

Vectorized way to map a 1-based year index to the column header in pandas (no lambda)

Learn how to map a 1-based column position to its header in pandas without apply/lambda. Vectorized indexing, safe handling of NaN/out-of-range, and pitfalls.

Mapping a numeric indicator in one column to a header name is a common pandas task that can look deceptively simple. In a wide dataframe where columns represent years and a separate column stores a 1-based position (year), the goal is to write back the corresponding column label into a new column (year_name). Doing it row by row with a lambda is a typical first attempt, but it produces cryptic errors and unnecessary complexity. The good news: you don’t need a lambda at all.

Problem setup

Assume a dataframe where years are columns and year is a 1-based index pointing to the target column. The itemName column is the index.

          2020  2021  2022  2023  2024  year
itemName                                      
item1        5    20    10    10    50     3
item2       10    10    50    20    40     2
item3       12    35    73    10    54     4

The expected result is to append year_name with the header corresponding to the position stored in year.

          2020  2021  2022  2023  2024  year year_name
itemName                                               
item1        5    20    10    10    50     3      2022
item2       10    10    50    20    40     2      2021
item3       12    35    73    10    54     4      2023

Attempt that triggers errors

A row-wise apply with a lambda often looks like a straightforward solution, but it leads to type and indexing issues. The following snippets preserve the original idea while exposing why they fail.

col_labels = repo[key].columns.tolist()
frame_out[["last_year_name"]] = frame_out[["_last_year"]].apply(
    lambda s: col_labels[s]
)

This fails with a TypeError like “list indices must be integers or slices, not Series.” The lambda receives an entire row slice (a Series), and using a Series as a list index isn’t valid.

col_labels = repo[key].columns.tolist()
frame_out[["last_year_name"]] = frame_out[["_last_year"]].apply(
    lambda s: col_labels[s.iloc[0].astype(int)]
)

This can then run into “IndexError: list index out of range,” and if you assign as a two-dimensional target with double brackets while producing a one-dimensional result, you can also get “ValueError: Columns must be same length as key.”

What actually goes wrong

.apply on a dataframe with axis=0 yields a Series per row, not a scalar suitable for direct list indexing. That’s why col_labels[s] fails: s is a Series. Attempting to coerce by grabbing s.iloc[0] papers over the real issue and risks out-of-range indexing. Additionally, assigning to a single new column using a two-bracket target like frame_out[["last_year_name"]] but providing a 1D Series on the right-hand side can trigger a length mismatch error.

Vectorized fix (no lambda needed)

The columns index is itself indexable. Because year is 1-based, subtract one to align with zero-based positional indexing and pick directly from the columns. This avoids per-row Python calls and all the shape/type pitfalls.

# Direct mapping from positional indicator to column label
grid["year_name"] = grid.columns[grid["year"] - 1]  # .astype("Int64")

This expression selects, for every row, the column label at position year - 1 and writes it into year_name. The optional cast to Int64 can be used if you need a specific nullable integer dtype for the result.

Handling invalid or missing values

If year can contain NaN or out-of-range positions, use a Series with reindex to align safely and get NaN when the position is invalid instead of raising.

grid["year_name"] = pd.Series(grid.columns).reindex(grid["year"] - 1).values

This approach gracefully handles cases like NaN in year or positions beyond the last column, returning NaN in year_name rather than throwing.

Reproducible inputs

Valid values:

grid = pd.DataFrame.from_dict({
    "index": ["item1", "item2", "item3"],
    "columns": [2020, 2021, 2022, 2023, 2024, "year"],
    "data": [
        [5, 20, 10, 10, 50, 3],
        [10, 10, 50, 20, 40, 2],
        [12, 35, 73, 10, 54, 4],
    ],
    "index_names": ["itemName"],
    "column_names": [None],
}, "tight")

Invalid values:

grid = pd.DataFrame.from_dict({
    "index": ["item1", "item2", "item3"],
    "columns": [2020, 2021, 2022, 2023, 2024, "year"],
    "data": [
        [5, 20, 10, 10, 50, 3],
        [10, 10, 50, 20, 40, 20],
        [12, 35, 73, 10, 54, None],
    ],
    "index_names": ["itemName"],
    "column_names": [None],
}, "tight")

Why this matters

Relying on vectorized indexing keeps the operation concise, avoids ambiguous shape and dtype conversions, and prevents errors that come from treating Series as scalars. It also sidesteps common assignment pitfalls, such as mismatched shapes caused by double brackets when targeting a single column. For dataframes with thousands of rows, skipping per-row Python lambdas is cleaner and less error-prone.

Takeaways

Use the columns index directly when you need to map a 1-based position to a column label: grid.columns[grid["year"] - 1]. If your data may contain NaN or positions out of bounds, switch to the reindex pattern to safely produce NaN instead of exceptions. When assigning a single new column, target it with single brackets to avoid “Columns must be same length as key.” With these patterns, the transformation is both robust and readable.

The article is based on a question from StackOverflow by DGMS89 and an answer by mozway.