2025, Nov 29 21:00

How to use pandas DataFrame.apply with axis=1 for row-wise operations: avoid KeyError, IndexingError, and NaN, compute per-row max correctly

Learn how pandas apply works row-wise with axis=1 to compute per-row max across columns. Fix KeyError, IndexingError, and NaN issues with examples and tips.

Working row-wise in pandas looks deceptively simple until you run into the default mechanics of apply(). A common case is computing a per-row maximum across several date columns. If the lambda receives the wrong object or you index it incorrectly, you get KeyError, IndexingError, or a column of NaN that hides what really happened.

Reproducing the issue

The following snippet sets up a minimal DataFrame with date columns and shows several apply() patterns that fail for different reasons, even though taking the maximum for a single row works.

import pandas as pds
import datetime as dtime
tbl = pds.DataFrame(
   [ [
      dtime.date(2025, 6, 5), dtime.date(2025, 6, 6) ],[
      dtime.date(2025, 6, 7), dtime.date(2025, 6, 8) ]
   ],
   columns=["A", "B"], index=["Row1", "Row2"]
)
# Explicitly find maximum of row 0 (WORKS)
max(tbl.loc[tbl.index[0], ["A", "B"]])
# None of the following 3 code patterns work for "apply"
if False:
   tbl["MaxDate"] = tbl.apply(lambda rec: max(
      rec.loc[rec.index[0], ["A", "B"]]
   ))
   # IndexingError: "Too many indexers"
elif False:
   tbl["MaxDate"] = tbl.apply(lambda rec: max(
      rec["A", "B"]
   ))
   # KeyError:
   # "key of type tuple not found and not a MultiIndex"
elif False:
   tbl["MaxDate"] = tbl.apply(lambda rec: max(
      rec["A"], rec["B"]
   ))
   # KeyError: 'A'
# Querying class of the object provided to the lambda yields a column of NaN
if False:
   tbl["MaxDate"] = tbl.apply(lambda rec: type(rec))

What actually goes wrong

The root cause is how apply() builds the object it passes to your function and how its result is aligned when you assign it back.

By default, axis=0, so apply() iterates over columns. The object passed into the lambda in that mode is a Series that represents a full column, with its index equal to the DataFrame row labels. That explains the KeyError when trying rec["A"] in the default mode: there is no such key in a column-Series indexed by row labels. Likewise, rec.loc[rec.index[0], ["A", "B"]] is invalid because rec is a Series, and you are attempting two-dimensional indexing, which triggers the IndexingError about too many indexers. And rec["A", "B"] uses a tuple key that does not exist because the Series index is not a MultiIndex.
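One way to confirm this is to record what apply() hands to the function in its default mode. The following is a minimal sketch; the helper record_input and the dictionary seen are illustrative names, not part of the original example.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    [
        [dt.date(2025, 6, 5), dt.date(2025, 6, 6)],
        [dt.date(2025, 6, 7), dt.date(2025, 6, 8)],
    ],
    columns=["A", "B"],
    index=["Row1", "Row2"],
)

seen = {}  # maps the name of each received Series to its index labels


def record_input(rec):
    # Hypothetical helper: remember what apply() passed in, return a dummy value.
    seen[rec.name] = list(rec.index)
    return 0


df.apply(record_input)  # default axis=0: the function receives whole columns
print(seen)

# Looking up a column label inside a column-Series fails,
# because that Series is indexed by the row labels.
try:
    df.apply(lambda rec: rec["A"])
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)
```

Each received Series is named after a column and indexed by Row1 and Row2, which is why a lookup such as rec["A"] has nothing to find.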

The NaN from assigning type(rec) comes from alignment. With axis=0, apply() calls the lambda once per column and collects the results into a Series indexed by the column names (A and B here). When you assign that Series to a new column of the original DataFrame, pandas aligns it by index labels. The returned Series has index ["A", "B"]; the DataFrame rows are ["Row1", "Row2"]. No labels match, so every row of the new column is filled with NaN.
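The alignment step can be made visible by inspecting the result of apply() before assigning it. This sketch uses a small variant of the probe, returning type(rec).__name__ (a string) instead of the class itself; the column name Probe is an arbitrary choice for illustration.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    [
        [dt.date(2025, 6, 5), dt.date(2025, 6, 6)],
        [dt.date(2025, 6, 7), dt.date(2025, 6, 8)],
    ],
    columns=["A", "B"],
    index=["Row1", "Row2"],
)

# Default axis=0: one result per column, collected into a Series.
result = df.apply(lambda rec: type(rec).__name__)

print(list(result.index))  # labels of the result: the column names
print(list(df.index))      # labels of the assignment target: the row names

# Assignment aligns on labels; the two label sets share nothing.
df["Probe"] = result
all_missing = bool(df["Probe"].isna().all())
print(all_missing)
```

The values themselves are fine; they are simply dropped during assignment because none of their labels exist among the row labels.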

To work row-wise you must flip the axis. With axis=1, apply() feeds each row into the lambda as a Series whose index is the DataFrame's column names. In that context, selecting multiple fields must use a list of labels, not a tuple. That is why rec[["A", "B"]] is valid, and rec["A", "B"] is not. The outer square brackets perform Series indexing; the inner square brackets define the list of labels to fetch together. If you already have the list in a variable, you simply pass that variable, for example rec[my_cols].
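As a rough illustration of the row-wise mode, the sketch below records the index of each row that apply() passes in and selects the inputs with a list of labels. The names wanted, received, and row_max are assumptions made for this example.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    [
        [dt.date(2025, 6, 5), dt.date(2025, 6, 6)],
        [dt.date(2025, 6, 7), dt.date(2025, 6, 8)],
    ],
    columns=["A", "B"],
    index=["Row1", "Row2"],
)

wanted = ["A", "B"]  # list of labels, kept in a variable
received = {}        # maps each row label to the index of the row Series


def row_max(rec):
    # With axis=1, rec is one row: a Series indexed by column names.
    received[rec.name] = list(rec.index)
    return max(rec[wanted])  # list key selects several fields at once


# A tuple key is treated as a single label and is not found.
try:
    df.apply(lambda rec: max(rec["A", "B"]), axis=1)
    tuple_failed = False
except KeyError:
    tuple_failed = True

df["MaxDate"] = df.apply(row_max, axis=1)
print(received)
print(tuple_failed)
print(df)
```

Because the result of apply() with axis=1 is indexed by the row labels, the assignment to MaxDate aligns cleanly and no NaN appears.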

Fixing the code

Once axis is correct, both a generic row-wise max and a direct DataFrame reduction work cleanly.

import pandas as pds
import datetime as dtime
tbl = pds.DataFrame(
   [ [
      dtime.date(2025, 6, 5), dtime.date(2025, 6, 6) ],[
      dtime.date(2025, 6, 7), dtime.date(2025, 6, 8) ]
   ],
   columns=["A", "B"], index=["Row1", "Row2"]
)
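# Correct way to use apply row-wise: with axis=1, rec is one row
# indexed by column names, so a list of labels selects the inputs
tbl["MaxDate"] = tbl.apply(lambda rec: max(rec[["A", "B"]]), axis=1)
# Or more simply, reduce the selected columns directly
tbl["MaxDate"] = tbl[["A", "B"]].max(axis=1)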

Why this detail matters

Understanding apply() prevents subtle bugs that surface as opaque errors or silent NaN columns. The lambda input is a Series whose index depends on the axis you choose: row labels with axis=0, column names with axis=1. The Series that apply() returns is aligned by index labels to the DataFrame you assign it into. If those labels do not match, pandas fills with NaN. This behavior shows up well beyond a max-over-dates example: any row-wise logic that mixes fields, inspects types, or aggregates a partial selection is sensitive to axis selection and to how you index the Series inside the lambda.

Takeaways

Be explicit about axis in apply() when you need row-wise operations. When working inside the lambda, remember that the Series index equals the column names for each row. To select multiple fields, use double brackets to pass a list of labels. And when you need a straightforward reduction across columns, prefer the vectorized form on the DataFrame, such as tbl[["A", "B"]].max(axis=1), which is concise and maps directly to the operation you want.
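To compare the two approaches side by side, here is a minimal sketch. It adds one step that is not in the original example: the date columns are first converted with pd.to_datetime so that the reduction runs on a native datetime dtype. The names cols, converted, vectorized, and row_wise are illustrative.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    [
        [dt.date(2025, 6, 5), dt.date(2025, 6, 6)],
        [dt.date(2025, 6, 7), dt.date(2025, 6, 8)],
    ],
    columns=["A", "B"],
    index=["Row1", "Row2"],
)

cols = ["A", "B"]

# Assumption for this sketch: convert the date objects column by column
# to pandas' native datetime dtype before reducing.
converted = df[cols].apply(pd.to_datetime)

# Vectorized reduction across the selected columns.
vectorized = converted.max(axis=1)

# The same computation written with a row-wise apply.
row_wise = converted.apply(lambda rec: rec.max(), axis=1)

print(vectorized)
print(list(vectorized) == list(row_wise))
```

Both results are indexed by the row labels, so either one can be assigned back to the DataFrame as a new column without alignment surprises.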