2025, Nov 09 01:00
Map multiple pandas columns to one normalized label with column-specific lookups, fillna, backfill, and ChainMap
Learn how to map multiple pandas columns to a single normalized label using column-specific lookups, Series.map with fillna or backfill, and ChainMap.
Mapping heterogeneous columns to a single normalized label is a common wrangling task in pandas. When only one field per row is expected to match a given lookup table, the goal is to translate values from multiple columns and consolidate the result into one new column without branching logic or verbose code.
Problem setup
Assume a dataset with two candidate columns for classification and a numeric field. Only one of the two columns will have a value that appears in a predefined mapping for a given row. The objective is to translate whichever value matches and place the code in a unified column.
import pandas as pd
frame = pd.DataFrame({
    'Event1': ['Music', 'Something else 1', 'Theatre', 'Comedy'],
    'Event2': ['Something else 2', 'Ballet', 'Something else 3', 'Something else 4'],
    'Cost': [10000, 5000, 15000, 2000]
})
code_map_a = {'Music': 'M', 'Cooking': 'C', 'Theatre': 'T', 'Comedy': 'C'}
code_map_b = {'Ballet': 'B', 'Swimming': 'S'}
What’s going on and why it’s tricky
The classification rules are column-specific: values in Event1 should be decoded with one mapping table, values in Event2 with another. Because only one column per row is expected to match, we want a clean way to try the appropriate mapping per column and then keep the first successful translation. Doing this with if/else over rows quickly becomes noisy, and it’s easy to miss edge cases.
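For contrast, the row-wise version of the same logic looks roughly like the sketch below (a hypothetical illustration, not part of the recommended pipeline). It works on the sample data, but the branching grows with every new column and lookup table.
def classify(row):
    # Try each column against its own lookup table, in order.
    if row['Event1'] in code_map_a:
        return code_map_a[row['Event1']]
    if row['Event2'] in code_map_b:
        return code_map_b[row['Event2']]
    return None  # no column matched
frame['Event'] = frame.apply(classify, axis=1)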
Concise solution for a fixed set of columns
An idiomatic approach is to translate each column with its own dictionary and then use fillna to pick the first non-null match. This leverages vectorized operations and keeps the transformation declarative.
frame['Event'] = frame['Event1'].map(code_map_a).fillna(
    frame['Event2'].map(code_map_b)
)
The new Event column will contain M, B, T, or C for the corresponding rows, derived from the first column that produced a valid mapping.
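A quick check of the result, given the sample data above:
print(frame['Event'].tolist())
# ['M', 'B', 'T', 'C']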
Scaling to more columns with per-column mappings
If the number of event-like columns grows, pairing each column with its own dictionary and picking the first successful translation remains straightforward. The idea is to build a small registry of column-to-dictionary associations, apply the right lookup per column, then backfill across columns and take the first non-null value.
code_map_c = {'Comedy': 'C'}
column_maps = {'Event1': code_map_a, 'Event2': code_map_b, 'Event3': code_map_c}
cols = [c for c in column_maps if c in frame.columns]  # only registry columns that actually exist
frame['Event'] = (
    frame[cols]
    .apply(lambda col: col.map(column_maps[col.name]))
    .bfill(axis=1)
    .iloc[:, 0]
)
This pattern assumes mappings are scoped per column. The registry may list columns that do not exist yet (Event3 here); only the columns present in the frame are selected, the correct dictionary is tried for each one, and the results are consolidated left to right.
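To make the consolidation step concrete, this is the intermediate frame after the per-column map but before the backfill, given the sample data (only Event1 and Event2 exist, so only they are selected; spacing approximate):
mapped = frame[cols].apply(lambda col: col.map(column_maps[col.name]))
print(mapped)
#   Event1 Event2
# 0      M    NaN
# 1    NaN      B
# 2      T    NaN
# 3      C    NaN
.bfill(axis=1) then pulls each row's first non-null value into the leftmost column, and .iloc[:, 0] keeps that column.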
Global mapping across all event-like columns
When the lookup should be shared across columns, merging several dictionaries into one and applying a single translation simplifies the pipeline. After mapping, choose the first non-null value per row. There are two equivalent ways to express this: the first reshapes to long form with stack and keeps the first successful lookup per row; the second maps elementwise across the columns and backfills.
from collections import ChainMap
frame['Event'] = (
    frame.filter(like='Event').stack()
    .map(dict(ChainMap(code_map_a, code_map_b, code_map_c)))
    .groupby(level=0).first()
)
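The dict(ChainMap(...)) call simply flattens the three lookup tables into one dictionary; if a key appeared in more than one map, the earliest map would win (here the only shared key, Comedy, maps to the same code in both tables):
merged = dict(ChainMap(code_map_a, code_map_b, code_map_c))
print(merged)
# {'Comedy': 'C', 'Ballet': 'B', 'Swimming': 'S', 'Music': 'M', 'Cooking': 'C', 'Theatre': 'T'}  (key order may vary)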
Or, using a vectorized elementwise pass with backfilling (DataFrame.map operates elementwise on a whole frame in pandas 2.1 and later):
from collections import ChainMap
frame['Event'] = (
    frame.filter(like='Event')
    .map(dict(ChainMap(code_map_a, code_map_b, code_map_c)).get)
    .bfill(axis=1)
    .iloc[:, 0]
)
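On versions of pandas older than 2.1, elementwise DataFrame.map is not available; applymap is the older equivalent for this pattern:
frame['Event'] = (
    frame.filter(like='Event')
    .applymap(dict(ChainMap(code_map_a, code_map_b, code_map_c)).get)  # pandas < 2.1
    .bfill(axis=1)
    .iloc[:, 0]
)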
Why this matters
These patterns keep column-specific logic declarative and prevent row-wise loops or ad hoc conditionals. They scale from two columns to many without changing the mental model, remain readable for code reviews, and are easy to extend by adding new column–dictionary pairs or composing a global mapping when business rules converge.
Takeaways
Use Series.map with the appropriate dictionary per column and fillna to capture the first match when only one column per row is expected to decode. For larger schemas, assemble a column-to-dictionary registry and backfill across the mapped columns. If the rule should be identical across all event-like fields, merge the dictionaries and select the first hit per row. This keeps the transformation succinct, maintainable, and consistent with idiomatic pandas.
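If the pattern recurs across projects, the registry version wraps neatly into a small helper; map_first_match below is a hypothetical utility built from the snippets above, not an established API.
def map_first_match(df, registry):
    # Per-column lookup using the registry, then first non-null value left to right.
    cols = [c for c in registry if c in df.columns]
    mapped = df[cols].apply(lambda col: col.map(registry[col.name]))
    return mapped.bfill(axis=1).iloc[:, 0]
frame['Event'] = map_first_match(frame, column_maps)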