2025, Dec 19 07:00

Vectorized ways to copy DataFrame columns by mapping in pandas: replace nested loops with assign or reindex/rename

Learn how to replace nested loops with vectorized pandas operations to copy DataFrame columns by mapping. Use assign or reindex+rename for chainable code.

When you need to normalize a pandas DataFrame by copying values from “source” columns into multiple “target” columns based on a mapping, a quick nested loop does the job. But if you prefer a more idiomatic, vectorized approach that plays nicely with method chaining, pandas has you covered.

Sample data and the baseline implementation

Consider a DataFrame where certain columns should mirror the values of other columns according to a simple mapping. The goal is to overwrite A2 and A3 with values from A1, and A5 with values from A4.

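import pandas as pd
frame = pd.DataFrame({
    "A1": [1, 11, 111],
    "A2": [2, 22, 222],
    "A3": [3, 33, 333],
    "A4": [4, 44, 444],
    "A5": [5, 55, 555],
})
# Source column -> list of target columns that should receive its values
rel_map = {
    "A1": ["A2", "A3"],
    "A4": ["A5"],
}
# Work on a copy so frame keeps its original values for the alternatives below
looped = frame.copy()
for src_col, targets in rel_map.items():
    for dst_col in targets:
        # Overwrite the whole target column with the source column
        looped[dst_col] = looped[src_col]
print(looped)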

This produces the intended result:

    A1   A2   A3   A4   A5
0    1    1    1    4    4
1   11   11   11   44   44
2  111  111  111  444  444

What’s the core issue?

The nested loop is explicit and clear, but it drives the copying from Python: it mutates the DataFrame one column assignment at a time and cannot be dropped into a method chain. In pandas, an expression that describes the full assignment at once is more concise, works inside a method chain, and returns a new DataFrame instead of modifying the original. The task here is to replace the loop with a vectorized, “pandastic” expression without changing the result.

Vectorized solutions

The first approach uses DataFrame.assign to apply every target column update in a single call. The second inverts the mapping, uses rename to work out which source column belongs in each position, projects those columns with reindex, and then restores the original column names with set_axis.

Using assign with a flattened target-to-source mapping:

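# Build {target: source Series}; frame[src] raises KeyError on a typo,
# whereas frame.get(src) would silently return None instead of failing.
updated_one = frame.assign(**{dest: frame[src]
                              for src, dest_list in rel_map.items()
                              for dest in dest_list})
print(updated_one)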

Note that assign is not in place. Use it in a chain or reassign the result to the same variable if you want to persist the changes.
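Inside a longer chain, the name frame still refers to the original DataFrame, not to the intermediate result. A minimal sketch, assuming a hypothetical upstream step that scales A1, passes callables to assign instead, so each target is copied from the frame as it looks at that point in the chain. Each source name is bound through a default argument (s=src); a plain lambda df: df[src] would capture the comprehension variable late and copy A4 into every target.

```python
import pandas as pd

frame = pd.DataFrame({
    "A1": [1, 11, 111],
    "A2": [2, 22, 222],
    "A3": [3, 33, 333],
    "A4": [4, 44, 444],
    "A5": [5, 55, 555],
})
rel_map = {"A1": ["A2", "A3"], "A4": ["A5"]}

# One callable per target column; s=src freezes the source name for each lambda.
# Without the default argument, every lambda would see the last src ("A4").
copy_specs = {
    dest: (lambda df, s=src: df[s])
    for src, dest_list in rel_map.items()
    for dest in dest_list
}

result = (
    frame
    .assign(A1=lambda d: d["A1"] * 10)  # hypothetical upstream step in the chain
    .assign(**copy_specs)               # callables receive the intermediate frame
)
print(result)
```

Because the callables read from the intermediate frame, A2 and A3 pick up the scaled A1 values; passing frame["A1"] directly would copy the unscaled ones.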

Using reindex and rename/set_axis by inverting the mapping:

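# Invert the mapping: each target column points back to its source column
alias_map = {alias: origin
             for origin, aliases in rel_map.items()
             for alias in aliases}
updated_two = (
    # Project source columns into target positions (A1, A1, A1, A4, A4)...
    frame.reindex(columns=frame.rename(columns=alias_map).columns)
         # ...then restore the original column names by position
         .set_axis(frame.columns, axis=1)
)
print(updated_two)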
print(updated_two)

Both approaches produce the same output as the loop:

    A1   A2   A3   A4   A5
0    1    1    1    4    4
1   11   11   11   44   44
2  111  111  111  444  444
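To see why the reindex/set_axis version works, it helps to split the chain and inspect the intermediate column labels. The sketch below reuses the sample data; the names projected_labels, projected and restored are introduced only for illustration.

```python
import pandas as pd

frame = pd.DataFrame({
    "A1": [1, 11, 111],
    "A2": [2, 22, 222],
    "A3": [3, 33, 333],
    "A4": [4, 44, 444],
    "A5": [5, 55, 555],
})
rel_map = {"A1": ["A2", "A3"], "A4": ["A5"]}
alias_map = {alias: origin for origin, aliases in rel_map.items() for alias in aliases}

# Step 1: rename only relabels; each target name becomes its source name.
projected_labels = frame.rename(columns=alias_map).columns
print(projected_labels.tolist())  # ['A1', 'A1', 'A1', 'A4', 'A4']

# Step 2: reindex selects columns by label, so a repeated label repeats the data.
# The target labels may repeat as long as frame's own columns are unique.
projected = frame.reindex(columns=projected_labels)

# Step 3: put the original names back, matched purely by position.
restored = projected.set_axis(frame.columns, axis=1)
print(restored)
```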

Why it matters

Describing the entire transformation in a single expression makes the intent obvious and lets the step sit inside a method chain. It can also reduce the amount of Python-level iteration you write. As for speed, there is no one-size-fits-all answer. The relative performance can depend on the total number of columns, how many columns you modify, the dtypes of the underlying data, and even whether the DataFrame is fragmented. The most reliable way to decide is to test on your real data.
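A minimal timing harness, assuming a hypothetical synthetic frame (10,000 rows, 200 integer columns, each even-indexed column copied into its right-hand neighbour); swap in your real DataFrame and mapping. It first checks that the three versions agree, then reports the best of three repeats. No numbers are shown here because they depend on your pandas version, hardware, and data.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical workload; replace with your own DataFrame and mapping.
n_rows, n_cols = 10_000, 200
cols = [f"C{i}" for i in range(n_cols)]
data = pd.DataFrame(np.arange(n_rows * n_cols).reshape(n_rows, n_cols), columns=cols)
mapping = {cols[i]: [cols[i + 1]] for i in range(0, n_cols - 1, 2)}
alias = {dst: src for src, targets in mapping.items() for dst in targets}


def with_loop(df):
    df = df.copy()  # keep the input untouched, like assign and reindex do
    for src, targets in mapping.items():
        for dst in targets:
            df[dst] = df[src]
    return df


def with_assign(df):
    return df.assign(**{dst: df[src] for src, targets in mapping.items() for dst in targets})


def with_reindex(df):
    return df.reindex(columns=df.rename(columns=alias).columns).set_axis(df.columns, axis=1)


# Timings only mean something if all three produce the same frame.
reference = with_loop(data)
assert with_assign(data).equals(reference)
assert with_reindex(data).equals(reference)

for fn in (with_loop, with_assign, with_reindex):
    best = min(timeit.repeat(lambda: fn(data), number=10, repeat=3))
    print(f"{fn.__name__:>12}: {best / 10 * 1e3:.2f} ms per call")
```

The loop version includes a defensive copy so that every approach starts from the same input and returns a new object; drop it if in-place mutation is acceptable in your code.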

Takeaways

If you want a clear, chainable solution, DataFrame.assign is a solid choice. If you prefer working through column projection and need precise control over column alignment, the reindex plus rename/set_axis approach is also effective. In both cases, the DataFrame ends up with target columns overwritten by their mapped source columns, just as in the loop. When performance matters, benchmark these options on the actual workload and pick the one that behaves best in your environment.