2026, Jan 04 13:00

Vectorized Pandas Backtesting: Attach 15-Minute OHLC to Minute Data and Evaluate Stop Loss Without Loops

Speed up pandas backtesting by vectorizing stop-loss checks: attach resampled 15-minute OHLC to minute data, forward-fill, and eliminate slow per-row loops.

Backtesting strategies that mix coarse decision signals with fine-grained execution checks often run into a performance wall. A common pattern is to compute entries from 15-minute candles and then repeatedly slice 1-minute ticks to detect a stop loss. It works, but on millions of rows the per-row lookups into the original dataset quickly become the bottleneck.

Problem setup

Suppose we build 15-minute OHLC bars from minute-level market data, compute entries from those bars, and for each bar go back to the 1-minute DataFrame to verify whether a fixed stop loss was hit within that window. The following minimal snippet illustrates the approach that causes the slowdown when the dataset grows.

import pandas as pd

# base_df: minute-level OHLC DataFrame with a DatetimeIndex, assumed to already exist
bars_15 = base_df.resample('15min').agg({
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last'
})

outcome = []

for bar_time, bar in bars_15.iterrows():
    entry_px = bar['Open']
    sl_px = entry_px - 10  # fixed stop loss for illustration

    # Re-slice the full minute-level table for this bar; this repeated boolean
    # indexing over the whole index is what becomes the bottleneck.
    window = base_df[(base_df.index >= bar_time) & (base_df.index < bar_time + pd.Timedelta('15min'))]

    stop_hit = (window['Low'] <= sl_px).any()
    outcome.append({"time": bar_time, "stop_loss_hit": stop_hit})

report_df = pd.DataFrame(outcome)
print(report_df)

Why this slows down

Iterating over DataFrame rows discards most of the performance benefits of pandas. Each loop iteration triggers boolean indexing and a fresh slice of the full minute-level table. That repeated slicing is expensive and does not scale when you have to do it for every 15-minute bar across large timelines. Even if the logic is correct, the access pattern is not cache-friendly and prevents vectorized execution.
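
To get a feel for where the time goes, a rough micro-benchmark along these lines can be illustrative. It uses synthetic data rather than figures from the snippet above: it times a single boolean slice of a 99,000-row minute index and extrapolates to one slice per 15-minute bar.

import timeit

import numpy as np
import pandas as pd

# Synthetic minute-level data, sized like the demo further below.
idx = pd.date_range('2000-01-01', periods=99_000, freq='min')
minute_df = pd.DataFrame({'Low': np.random.randint(0, 100, len(idx))}, index=idx)
bar_time = idx[0]

# Average cost of one boolean slice over the full minute index.
per_slice = timeit.timeit(
    lambda: minute_df[(minute_df.index >= bar_time)
                      & (minute_df.index < bar_time + pd.Timedelta('15min'))],
    number=100,
) / 100

n_bars = len(idx) // 15
print(f"~{per_slice * n_bars:.2f} s spent on slicing alone across {n_bars} bars")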

The efficient pattern

The efficient pattern is to avoid row-wise loops and recast the logic so the stop loss evaluation is computed in a vectorized way. Instead of building a separate 15-minute table and jumping back into the 1-minute table for each row, it is faster to attach the resampled OHLC data back onto the original minute-level DataFrame and forward-fill these columns within each 15-minute window. That way every minute row carries its corresponding 15-minute Open, High, Low, Close, and you can evaluate the stop loss for all rows at once. With this layout, a single vectorized comparison produces the stop loss flag without any per-interval slicing.
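
As a minimal sketch of that idea, assuming base_df is the minute-level OHLC frame from the snippet above (the bar_-prefixed column names are purely illustrative), the alignment can be written as an explicit join followed by a forward fill:

# Assumes base_df: minute-level OHLC DataFrame with a DatetimeIndex, as above.
bar_cols = ['bar_Open', 'bar_High', 'bar_Low', 'bar_Close']
bars_15 = base_df.resample('15min').agg(
    {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
)
bars_15.columns = bar_cols

# Only minute rows that sit exactly on a 15-minute boundary match the bar index;
# every other row starts as NaN and is filled from the bar that opened its window.
aligned = base_df.join(bars_15)
aligned[bar_cols] = aligned[bar_cols].ffill()

# One vectorized comparison across all minute rows at once.
aligned['stop_loss_hit'] = aligned['Low'] <= aligned['bar_Open'] - 10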

Vectorized solution

The following code keeps all computation in one pass. It adds the 15-minute OHLC columns to the minute-level DataFrame via resample, forward-fills those values, and then evaluates the stop loss condition with a vectorized expression.

import pandas as pd
import numpy as np
import datetime

# Demo setup: synthetic minute index and price-like values
t_start = datetime.datetime.now()
print(t_start)

time_idx = pd.date_range('2000-01-01', periods=99000, freq='min')
minute_vals = np.random.randint(0, 100, size=99000)

ticks = pd.DataFrame(minute_vals, index=time_idx, columns=['Price'])

# Add 15-minute OHLC as new columns onto the minute-level DataFrame
# These columns are aligned to the 15-minute frequency and will be forward-filled
# so that every minute within a 15-minute bucket carries the same OHLC context.
ticks[['Open', 'High', 'Low', 'Close']] = ticks.resample('15min').agg([
    'first',
    'max',
    'min',
    'last'
])

ticks = ticks.ffill()

# Vectorized stop-loss evaluation relative to the 15-minute Open
ticks['sl_flag'] = ticks['Price'] <= (ticks['Open'] - 10)

t_end = datetime.datetime.now()
print(t_end - t_start)
print(ticks.head(20).to_string())  # show only the first rows; the full frame has 99,000

This approach avoids per-row lookups into the base table entirely. The 15-minute context is attached once, forward-propagated across the minute rows, and then the stop loss flag is computed in a single vectorized step.
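
If a per-bar summary like report_df from the first snippet is still needed, the minute-level flag can be collapsed back to the 15-minute grid without another loop. This is only a sketch reusing the ticks frame and sl_flag column from the code above:

# True if the stop loss was touched at any minute inside a 15-minute window.
report_df = (
    ticks['sl_flag']
    .resample('15min')
    .max()
    .rename('stop_loss_hit')
    .reset_index()
    .rename(columns={'index': 'time'})
)
print(report_df.head())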

Why this matters

For large datasets, choosing vectorized operations in pandas is the difference between minutes and hours. Iterating over rows forces Python-level loops and repeated filtering, while vectorized operations leverage pandas’ internal optimizations. Aligning data once and performing comparisons in bulk keeps the memory access pattern simple and avoids redundant work. When you need to combine signals at different frequencies, attaching the coarser features to the finer-grained index and using forward fill is a pragmatic pattern that scales.

If alignment between datasets is part of the workflow, consider building it into the data model upfront rather than resolving it repeatedly during the backtest loop. There are also alignment-oriented tools such as merge_asof or window operations with pandas.DataFrame.rolling that can help express these relationships without explicit iteration.
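
For example, a rough merge_asof sketch under the same ticks naming as above (the *_15 column names are illustrative) lets each minute row pick up the most recent 15-minute bar at or before its timestamp:

# Build the 15-minute bars once from the minute-level prices.
bars_15 = ticks['Price'].resample('15min').agg(['first', 'max', 'min', 'last'])
bars_15.columns = ['Open_15', 'High_15', 'Low_15', 'Close_15']

# As-of join: for each minute row, take the last bar whose timestamp is <= that minute.
# merge_asof needs both indexes sorted, which date_range and resample already guarantee here.
with_bars = pd.merge_asof(
    ticks[['Price']],
    bars_15,
    left_index=True,
    right_index=True,
    direction='backward',
)
with_bars['sl_flag'] = with_bars['Price'] <= with_bars['Open_15'] - 10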

Takeaways

Keep the computation inside pandas and let it work on columns, not rows. Avoid slicing back into the full dataset for each interval; instead, propagate the resampled context down to the base frequency and evaluate your stop loss checks in a single vectorized expression. This pattern preserves correctness while remaining practical on datasets with millions of rows.