2025, Dec 17 11:00

Cycle a shorter pandas DataFrame to match a longer one using a modulo-based key and merge

Learn how to cycle a shorter pandas DataFrame to match a longer one using numpy.arange modulo keys and a merge. Clean, vectorized alignment without hacks.

Cycle a shorter pandas DataFrame to match a longer one

Sometimes you need to line up two pandas DataFrames of different lengths and have the shorter one repeat, cycling its rows until it matches the length of the longer one. With plain Python lists this is trivial using itertools.cycle, but doing the same with DataFrames calls for a different approach.

The quick mental model with lists

from itertools import cycle

a = range(7)
b = range(43)

paired = zip(cycle(a), b)

This restarts a when it reaches the end, pairing it against the full length of b. Trying to plug this idea directly into pandas.concat won’t work, because concat expects concrete Series or DataFrame objects, not an infinite iterator.
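To make the restart visible, here is a small sketch that materializes the pairs. The sizes 3 and 7 are chosen for illustration only, so the output stays short.

```python
from itertools import cycle

a = range(3)  # shorter sequence (illustrative size)
b = range(7)  # longer sequence (illustrative size)

# zip stops when b is exhausted, so the infinite cycle is safe here
paired = list(zip(cycle(a), b))
print(paired)
# [(0, 0), (1, 1), (2, 2), (0, 3), (1, 4), (2, 5), (0, 6)]
```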

The naive attempt in pandas

import pandas as pd
from itertools import cycle

left_df = pd.DataFrame(...)  # length 7
right_df = pd.DataFrame(...)  # length 43

bad_combo = pd.concat([cycle(left_df), right_df], axis=1)

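This fails because cycle(left_df) is not a DataFrame at all. Iterating a DataFrame yields its column labels, not its rows, so cycle(left_df) is an endless iterator over those labels. concat accepts only Series and DataFrame objects and rejects the iterator outright instead of producing a repeated block aligned across columns.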
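The failure can be reproduced with a short sketch. The two small frames are made up for illustration, and the exact error message may differ between pandas versions, so only the exception type is checked.

```python
from itertools import cycle, islice

import pandas as pd

# Illustrative frames, not the ones from the article
left_df = pd.DataFrame({'col1': list('ABC'), 'col2': ['X'] * 3})
right_df = pd.DataFrame({'col3': list('abcde')})

# Iterating a DataFrame walks its column labels, so cycle() repeats the labels
labels = list(islice(cycle(left_df), 5))
print(labels)  # ['col1', 'col2', 'col1', 'col2', 'col1']

error = None
try:
    pd.concat([cycle(left_df), right_df], axis=1)
except TypeError as exc:
    # concat only accepts Series and DataFrame objects
    error = exc
    print(type(exc).__name__)
```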

What’s actually going on

To emulate cycle semantics with DataFrames, you need a deterministic way to map each row of the longer DataFrame to a row of the shorter one, repeating from the start when you run out. The clean way to do this is to compute a key per row using numpy.arange and the modulo operator. The modulo creates a repeating pattern of integers that acts like a restart index, just like itertools.cycle. With that shared key, a merge lines the two tables up.
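As a quick illustration of the key itself, the sketch below builds the repeating pattern for a longer length of 7 and a shorter length of 3. Both sizes are assumptions chosen for the example.

```python
import numpy as np

longer_len, shorter_len = 7, 3  # illustrative sizes

# Positions 0..6 of the longer table, wrapped around the shorter table's length
key = np.arange(longer_len) % shorter_len
print(key)  # [0 1 2 0 1 2 0]
```

Each value in key is the position of the row in the shorter table that the corresponding row of the longer table pairs with.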

The solution: build a repeating join key and merge

import numpy as np
import pandas as pd

# Example inputs: here left_df is the longer frame (7 rows), right_df the shorter (3 rows)
left_df = pd.DataFrame({
    'col1': list('ABCDEFG'),
    'col2': ['X'] * 7
})

right_df = pd.DataFrame({
    'col3': list('abc'),
    'col4': ['Y'] * 3
})

# Cycle-like alignment via modulo-based keys:
# the left key is [0, 1, 2, 0, 1, 2, 0] and the right key is [0, 1, 2]
result_df = (
    left_df.merge(
        right_df,
        left_on=np.arange(len(left_df)) % len(right_df),
        right_on=np.arange(len(right_df)) % len(left_df)
    )
    .drop(columns=['key_0'])
)

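This produces a DataFrame with the length of the longer input and a repeated, cycling match from the shorter one, restarting the way itertools.cycle would. Note that left_df is the longer frame in this example. The default inner join is documented to preserve the order of the left keys, so place the longer frame on the left if its row order matters. Here is the resulting output: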

  col1 col2 col3 col4
0    A    X    a    Y
1    B    X    b    Y
2    C    X    c    Y
3    D    X    a    Y
4    E    X    b    Y
5    F    X    c    Y
6    G    X    a    Y

If you want to see the alignment key that drives this behavior, don’t drop it:

   key_0 col1 col2 col3 col4
0      0    A    X    a    Y
1      1    B    X    b    Y
2      2    C    X    c    Y
3      0    D    X    a    Y
4      1    E    X    b    Y
5      2    F    X    c    Y
6      0    G    X    a    Y
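If you don't know in advance which frame is longer, the same idea can be wrapped in a small helper. This is a minimal sketch rather than a definitive implementation. The function name cycle_merge and the temporary column name _cycle_key are hypothetical, it uses an explicit key column instead of array keys, and it passes how='left' as a conservative choice so the longer frame's row order is kept. It assumes both frames are non-empty and that their column names don't collide.

```python
import numpy as np
import pandas as pd


def cycle_merge(first, second, key_name='_cycle_key'):
    """Cycle the shorter frame against the longer one (hypothetical helper)."""
    # Put the longer frame on the left so its row order drives the result
    longer, shorter = (first, second) if len(first) >= len(second) else (second, first)
    # Repeating key on the longer side, plain positions on the shorter side
    left = longer.assign(**{key_name: np.arange(len(longer)) % len(shorter)})
    right = shorter.assign(**{key_name: np.arange(len(shorter))})
    # A left join keeps every row of the longer frame in its original order
    return left.merge(right, on=key_name, how='left').drop(columns=key_name)


short_df = pd.DataFrame({'col3': list('abc'), 'col4': ['Y'] * 3})
long_df = pd.DataFrame({'col1': list('ABCDEFG'), 'col2': ['X'] * 7})

# Argument order does not matter: the shorter frame is always the one cycled
result = cycle_merge(short_df, long_df)
print(result)
```

The longer frame's columns come first in the result, followed by the cycled columns of the shorter frame.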

Why this matters

Data alignment is central to pandas. When inputs differ in length, trying to brute-force repetition at the row level leads to awkward or non-functional patterns. A modulo-based key gives you a precise, vectorized way to express “cycle this shorter table” and relies on merge, which is built for relational alignment. Because the key is unique within the shorter table, every row of the longer table gets exactly one match, and the work is done by a single merge rather than a Python-level loop.

Takeaways

When you need cycle-like repetition of one DataFrame against another, generate a repeating key with numpy.arange and the modulo operator, then merge on that key. It mirrors the behavior of itertools.cycle while staying idiomatic to pandas, avoids brittle workarounds, and keeps the logic explicit and maintainable.
