2025, Oct 28 23:00

Fixing Pandas All-True Comparisons: Compare DataFrame Series, Not df.columns Labels

Learn why comparing df.columns returns the same boolean in pandas and how to correctly compare DataFrame columns row‑wise using Series, iloc and explicit names.

Comparing two pandas columns to build a boolean flag sounds trivial, yet a tiny indexing mistake can turn every row into the same value. Here’s a concise breakdown of why comparing lag2Open to MGC=F unexpectedly returns True for all rows and how to do it correctly.

Problem

You want to compare two DataFrame columns, lag2Open and MGC=F, and store whether the first is greater than or equal to the second in a new column Higher than 0. The attempt below always yields True, even though a separate validation column (diff) suggests that’s incorrect.

df_prices["Higher than 0"] = [df_prices.columns[1]] >= [df_prices.columns[0]]

What’s going on

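df_prices.columns[1] and df_prices.columns[0] are column names, not the column data, and wrapping each one in square brackets only produces a one-element list holding that name. In other words, you’re comparing labels instead of Series. Python compares the two lists lexicographically, which here comes down to comparing the name strings 'lag2Open' and 'MGC=F', and returns a single bool. Lowercase 'l' sorts after uppercase 'M', so that bool is True. pandas then broadcasts the one value to every row, which is why the result does not reflect row-wise comparisons and ends up being the same across the entire column.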
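To see the mechanism in isolation, here is a minimal sketch. The prices are invented for illustration, and the column order (MGC=F at position 0, lag2Open at position 1) is assumed from the way the article's own snippets index the frame.

```python
import pandas as pd

# Hypothetical data, not from the original question.
df_prices = pd.DataFrame({
    "MGC=F": [1900.0, 1910.0, 1895.0],
    "lag2Open": [1905.0, 1900.0, 1895.0],
})

left = [df_prices.columns[1]]   # a list holding the string 'lag2Open'
right = [df_prices.columns[0]]  # a list holding the string 'MGC=F'

# List comparison is lexicographic, so this compares the two strings
# and produces one Python bool; no price data is involved.
flag = left >= right
print(type(flag), flag)

# Assigning a scalar to a column broadcasts it to every row.
df_prices["Higher than 0"] = flag
print(df_prices["Higher than 0"].tolist())  # [True, True, True]
```

The second row should be False (1900.0 is below 1910.0), yet the flag is True everywhere because the prices never enter the comparison.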

Solution

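Reference the Series, not the labels. You can do that by looking each label up through df.columns, by position with iloc, or by explicit name. The first variant keeps the original df.columns lookups but uses them to index the DataFrame: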

df_prices["Higher than 0"] = df_prices[df_prices.columns[1]] >= df_prices[df_prices.columns[0]]

Or by positional indexing:

df_prices["Higher than 0"] = df_prices.iloc[:, 1] >= df_prices.iloc[:, 0]

Or the most explicit variant using the actual column names:

df_prices["Higher than 0"] = df_prices['lag2Open'] >= df_prices['MGC=F']
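As a quick check that the three variants are interchangeable, the sketch below runs them on a small frame and confirms they agree. The data is invented for illustration, and the column order is assumed to be MGC=F first, lag2Open second.

```python
import pandas as pd

# Hypothetical data, not from the original question.
df_prices = pd.DataFrame({
    "MGC=F": [1900.0, 1910.0, 1895.0],
    "lag2Open": [1905.0, 1900.0, 1895.0],
})

# 1. Look the labels up, then use them to select the Series.
by_label = df_prices[df_prices.columns[1]] >= df_prices[df_prices.columns[0]]
# 2. Select by integer position.
by_position = df_prices.iloc[:, 1] >= df_prices.iloc[:, 0]
# 3. Select by explicit column name.
by_name = df_prices["lag2Open"] >= df_prices["MGC=F"]

# All three are element-wise comparisons of the same two Series.
assert by_label.equals(by_position) and by_position.equals(by_name)

df_prices["Higher than 0"] = by_name
print(df_prices["Higher than 0"].tolist())  # [True, False, True]
```

The explicit-name variant is the least fragile of the three: the positional forms silently change meaning if the column order changes.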

Why this matters

When an expression picks up a column's label instead of its data, the comparison no longer operates row-wise. The code runs without raising an error, but the logic is disconnected from the underlying Series. Recognizing the difference between column names and column data prevents silent, all-True or all-False columns and saves time on debugging downstream calculations.
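One way to catch this kind of silent failure is to test the flag for uniformity and cross-check it against an independent calculation. The sketch below is illustrative only: the data is invented, and the diff column is assumed to be lag2Open minus MGC=F, since the article does not define it.

```python
import pandas as pd

# Hypothetical data, not from the original question.
df_prices = pd.DataFrame({
    "MGC=F": [1900.0, 1910.0, 1895.0],
    "lag2Open": [1905.0, 1900.0, 1895.0],
})

# The buggy expression: a single bool broadcast to every row.
df_prices["Higher than 0"] = [df_prices.columns[1]] >= [df_prices.columns[0]]

# Assumed definition of the validation column.
df_prices["diff"] = df_prices["lag2Open"] - df_prices["MGC=F"]
expected = df_prices["diff"] >= 0

# A flag with one distinct value across all rows deserves a second look.
is_uniform = df_prices["Higher than 0"].nunique() == 1
# Rows where the flag disagrees with the sign of diff.
mismatches = int((df_prices["Higher than 0"] != expected).sum())
print(is_uniform, mismatches)
```

A uniform flag is not always wrong, but combined with a nonzero mismatch count it points directly at the comparison expression.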

Takeaway

Always ensure you’re comparing Series, not labels. Select the data with df[df.columns[idx]], df.iloc[:, idx], or an explicit name such as df['lag2Open']. If the output looks suspiciously uniform, double-check what you’re actually comparing by printing the objects you pass into the expression, along with their types.
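The inspection step can be as simple as printing types. The sketch below also wraps the check in a small helper; check_operands is a hypothetical name introduced here, not a pandas function, and the data is invented for illustration.

```python
import pandas as pd

def check_operands(left, right):
    """Hypothetical helper: raise if either operand is not a pandas Series."""
    for side, obj in (("left", left), ("right", right)):
        if not isinstance(obj, pd.Series):
            raise TypeError(
                f"{side} operand is {type(obj).__name__}, expected Series"
            )

# Hypothetical data, not from the original question.
df_prices = pd.DataFrame({
    "MGC=F": [1900.0, 1910.0, 1895.0],
    "lag2Open": [1905.0, 1900.0, 1895.0],
})

print(type(df_prices.columns[1]))   # a plain string label
print(type(df_prices["lag2Open"]))  # a pandas Series

# Correct operands pass silently.
check_operands(df_prices["lag2Open"], df_prices["MGC=F"])

# The operands from the buggy expression are lists of labels and are rejected.
try:
    check_operands([df_prices.columns[1]], [df_prices.columns[0]])
except TypeError as exc:
    print(exc)
```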

The article is based on a question from Stack Overflow by Rafael Alexandre Sousa and an answer by user19077881.