2025, Oct 16 12:00

How to fix misaligned comparisons in pandas after yfinance: handle MultiIndex columns correctly

Learn to fix pandas yfinance MultiIndex issues causing ValueError in Series comparisons. See clear steps to align labels, flatten columns, and set indicators.

Fixing misaligned comparisons in pandas after yfinance: MultiIndex columns bite back

Pulling OHLCV data with yfinance and then adding your own technical columns is straightforward until you start comparing columns that don’t share the same labels. If you’re seeing ValueError messages when updating a DataFrame “in place”, the culprit is often hidden in the column index. Here’s a practical walkthrough of what goes wrong and how to fix it cleanly.

Reproducing the issue

The goal is simple: compute a 200-day simple moving average (SMA), take the last business day of each month, and set an Indicator based on whether the close is above or below the SMA.

import pandas as pd
import yfinance as yf
from datetime import date
from dateutil.relativedelta import relativedelta

sym = 'AAPL'  # Apple Inc.
today_dt = date.today()
until_dt = today_dt
span_years = 3
since_dt = today_dt - relativedelta(days=span_years * 365.25 + 200)

try:
    quotes = yf.download(sym, start=since_dt, end=until_dt, auto_adjust=True)
except Exception as err:
    print(f"Error downloading data: {err}")
    raise SystemExit

if quotes.empty:
    print("No data downloaded. Please check the ticker and dates.")
    raise SystemExit

quotes['200_SMA'] = quotes['Close'].rolling(window=200).mean()
month_end = quotes.resample('BME').last().dropna()

month_end['Indicator'] = 'Hold'
# This line raises a ValueError
month_end['Indicator'] = [
    'Above' if row['Close'] > row['200_SMA'] else 'Below'
    for _, row in month_end.iterrows()
]

Typical errors look like these:

ValueError: Can only compare identically-labeled Series objects
ValueError: Operands are not aligned. Do left, right = left.align(right, axis=1, copy=False) before operating.

What’s actually happening

yfinance returns a DataFrame whose columns are a MultiIndex. One level holds the price field name like Close, High, Low, Open, Volume. Another level holds the ticker. For a single-symbol download that means you’ll see tuples such as ("Close", "AAPL"). When you add a custom column like 200_SMA, it ends up with a different second-level label, namely ("200_SMA", "").

Now compare what your code does: it tries to evaluate row['Close'] > row['200_SMA']. Under the hood that becomes comparing the Series labeled ("Close", "AAPL") with the Series labeled ("200_SMA", ""). Because the labels are not identical, pandas refuses the operation and raises the alignment error. If you later load more tickers, you’ll also see ("Close", "AMZN") and ("Close", "GOOG"), which makes it even more obvious that the second level matters.

Two ways to fix it

You can either index the exact MultiIndex keys, or flatten the columns to a single level so the comparison is aligned.

Option 1: Address the full MultiIndex explicitly

This keeps the MultiIndex in place and selects by both levels when comparing.

import pandas as pd
import yfinance as yf
from datetime import date
from dateutil.relativedelta import relativedelta

sym = 'AAPL'
today_dt = date.today()
until_dt = today_dt
span_years = 3
since_dt = today_dt - relativedelta(days=span_years * 365.25 + 200)

quotes = yf.download(sym, start=since_dt, end=until_dt, auto_adjust=True)
if quotes.empty:
    raise SystemExit("No data downloaded. Please check the ticker and dates.")

quotes['200_SMA'] = quotes['Close'].rolling(window=200).mean()
month_end = quotes.resample('BME').last().dropna()

month_end['Indicator'] = [
    'Above' if row[('Close', 'AAPL')] > row[('200_SMA', '')] else 'Below'
    for _, row in month_end.iterrows()
]

Here, ('Close', 'AAPL') and ('200_SMA', '') match the actual column labels, so pandas can align the Series and the comparison works.

Option 2: Drop the ticker level (flatten to one level)

Remove the "Ticker" level so that both Close and 200_SMA live in a single-level column index.

import pandas as pd
import yfinance as yf
from datetime import date
from dateutil.relativedelta import relativedelta

sym = 'AAPL'
today_dt = date.today()
until_dt = today_dt
span_years = 3
since_dt = today_dt - relativedelta(days=span_years * 365.25 + 200)

quotes = yf.download(sym, start=since_dt, end=until_dt, auto_adjust=True)
if quotes.empty:
    raise SystemExit("No data downloaded. Please check the ticker and dates.")

# Flatten columns by dropping the second level
flat = quotes.droplevel('Ticker', axis=1).copy()

flat['200_SMA'] = flat['Close'].rolling(window=200).mean()
month_end = flat.resample('BME').last().dropna()

month_end['Indicator'] = [
    'Above' if row['Close'] > row['200_SMA'] else 'Below'
    for _, row in month_end.iterrows()
]

As an alternative flattening approach, you can also map the first element of each column tuple: flat.columns = [c[0] for c in flat.columns]. The key is to end up with identically labeled Series when you perform the comparison.

Why this matters

pandas aligns by labels, not by position, when comparing Series or broadcasting operations across DataFrames. With MultiIndex columns, it’s easy to accidentally compare objects that look compatible but differ in one of the index levels. The error messages are a hint that alignment failed, not that the values themselves are problematic. Being explicit about column labels or flattening at the right step avoids subtle bugs, especially as the code grows to handle multiple tickers.

Takeaways

If comparisons or assignments fail with “Can only compare identically-labeled Series objects” or “Operands are not aligned,” inspect the column index to confirm whether you’re dealing with a MultiIndex. With yfinance output, expect tuples like ("Close", "AAPL"). Either address both levels when selecting, or drop the unused level before computing derived columns. Once your labels line up, the in-place update works as intended.

The article is based on a question from StackOverflow by user23435723 and an answer by furas.