https://pytroubles.com/en/posts/id1128-pandas-left-join-merge-two-csvs-keep-unmatched-rows-and-fill-missing-values-with-n-a

Pandas left join: merge two CSVs, keep unmatched rows, and fill missing values with N/A

Merge two CSVs in pandas without losing rows: use a left join and fill non-matches with N/A

Pandas left join: merge two CSVs, keep unmatched rows, and fill missing values with N/A

Learn how to use pandas merge with a left join to keep all rows, avoid dropped records, and fill missing product details with N/A using fillna. See examples.

2025-10-20T08:00:06+03:00

Merge two CSVs in pandas without losing unmatched rows: a clean left join with N/A fillWhen you enrich one dataset with attributes from another, a small default in pandas can quietly drop rows you expected to keep. The goal here is straightforward: join shipment records with product metadata by product_code, retain all shipments even when the product is missing, and fill the absent details with N/A.Problem setupTwo simple tabular inputs illustrate the case. One holds shipments, the other contains product descriptions and categories. The expected output preserves every shipment and adds description/category when a match exists.shipment_id,product_code,quantity,date S001,P123,10,2025-07-01 S002,P456,5,2025-07-02 S003,P789,8,2025-07-03 product_code,description,category P123,Widget A,Tools P456,Widget B,Hardware Expected result:shipment_id,product_code,quantity,date,description,category S001,P123,10,2025-07-01,Widget A,Tools S002,P456,5,2025-07-02,Widget B,Hardware S003,P789,8,2025-07-03,N/A,N/A Minimal example showing the pitfallUsing the default merge drops the unmatched shipment P789, which is exactly what you don’t want in this scenario.import pandas as pd ship_data = [ ["S001", "P123", 10, "2025-07-01"], ["S002", "P456", 5, "2025-07-02"], ["S003", "P789", 8, "2025-07-03"] ] cols_ship = ["shipment_id", "product_code", "quantity", "date"] frame_ship = pd.DataFrame(ship_data, columns=cols_ship) item_data = [ ["P123", "Widget A", "Tools"], ["P456", "Widget B", "Hardware"] ] cols_item = ["product_code", "description", "category"] frame_item = pd.DataFrame(item_data, columns=cols_item) # Default merge: rows without a match are dropped inner_joined = pd.merge(frame_ship, frame_item) print(inner_joined) What’s actually happeningThe default behavior of pandas.merge drops rows that don’t find a counterpart in the other DataFrame. In other words, unmatched rows are excluded unless you explicitly request otherwise. When you switch to a left join, pandas keeps every row from the left DataFrame; for those without a match on the right, it places NaN in the new columns. From there, you can replace those NaN values with N/A to match the expected output format.There’s also a nuance worth keeping in mind. If both DataFrames contain multiple columns sharing the same names, pandas tries to match on all of them. To stick to a specific key, you can provide it explicitly via left_on and right_on.Solution: left join + fillnaThe fix is concise: perform a left merge on product_code, then fill missing values with N/A.import pandas as pd ship_data = [ ["S001", "P123", 10, "2025-07-01"], ["S002", "P456", 5, "2025-07-02"], ["S003", "P789", 8, "2025-07-03"] ] cols_ship = ["shipment_id", "product_code", "quantity", "date"] frame_ship = pd.DataFrame(ship_data, columns=cols_ship) item_data = [ ["P123", "Widget A", "Tools"], ["P456", "Widget B", "Hardware"] ] cols_item = ["product_code", "description", "category"] frame_item = pd.DataFrame(item_data, columns=cols_item) # Correct: keep all shipments and fill non-matches result_all_cols = ( pd.merge(frame_ship, frame_item, how="left") .fillna("N/A") ) print(result_all_cols) If you want only the category from the product table, merge with a column subset. This could be better as it does not alter the original DataFrame of frame_item.result_category_only = ( pd.merge(frame_ship, frame_item[["product_code", "category"]], how="left") .fillna("N/A") ) print(result_category_only) If your DataFrames share multiple same-named columns and you need to merge on a specific one, specify it explicitly.result_explicit_key = ( pd.merge( frame_ship, frame_item, how="left", left_on="product_code", right_on="product_code" ).fillna("N/A") ) Why this mattersData enrichment should be additive. Accidentally dropping shipments because a product is missing defeats the purpose and can skew downstream reporting. Using a left join preserves the entire shipment universe and makes absence explicit via N/A, which is more transparent and easier to audit.Wrap-upWhen augmenting records from one table with attributes from another, choose a left merge to keep every source row, and immediately standardize missing values with fillna("N/A"). If both sides have multiple same-named columns, pin the join key with left_on and right_on. And if you only need a subset of columns from the right table, merge on that subset to keep the output tidy.

pandas merge, left join, merge two CSVs, keep unmatched rows, fillna, N/A, fill missing values, product_code join, DataFrame merge, Python pandas, data enrichment

2025

2025, Oct 20 08:00

Merge two CSVs in pandas without losing rows: use a left join and fill non-matches with N/A

Learn how to use pandas merge with a left join to keep all rows, avoid dropped records, and fill missing product details with N/A using fillna. See examples.

Merge two CSVs in pandas without losing unmatched rows: a clean left join with N/A fill

When you enrich one dataset with attributes from another, a small default in pandas can quietly drop rows you expected to keep. The goal here is straightforward: join shipment records with product metadata by product_code, retain all shipments even when the product is missing, and fill the absent details with N/A.

Problem setup

Two simple tabular inputs illustrate the case. One holds shipments, the other contains product descriptions and categories. The expected output preserves every shipment and adds description/category when a match exists.

shipment_id,product_code,quantity,date
S001,P123,10,2025-07-01
S002,P456,5,2025-07-02
S003,P789,8,2025-07-03

product_code,description,category
P123,Widget A,Tools
P456,Widget B,Hardware

Expected result:

shipment_id,product_code,quantity,date,description,category
S001,P123,10,2025-07-01,Widget A,Tools
S002,P456,5,2025-07-02,Widget B,Hardware
S003,P789,8,2025-07-03,N/A,N/A

Minimal example showing the pitfall

Using the default merge drops the unmatched shipment P789, which is exactly what you don’t want in this scenario.

import pandas as pd
ship_data = [
    ["S001", "P123", 10, "2025-07-01"],
    ["S002", "P456", 5, "2025-07-02"],
    ["S003", "P789", 8, "2025-07-03"]
]
cols_ship = ["shipment_id", "product_code", "quantity", "date"]
frame_ship = pd.DataFrame(ship_data, columns=cols_ship)
item_data = [
    ["P123", "Widget A", "Tools"],
    ["P456", "Widget B", "Hardware"]
]
cols_item = ["product_code", "description", "category"]
frame_item = pd.DataFrame(item_data, columns=cols_item)
# Default merge: rows without a match are dropped
inner_joined = pd.merge(frame_ship, frame_item)
print(inner_joined)

What’s actually happening

The default behavior of pandas.merge drops rows that don’t find a counterpart in the other DataFrame. In other words, unmatched rows are excluded unless you explicitly request otherwise. When you switch to a left join, pandas keeps every row from the left DataFrame; for those without a match on the right, it places NaN in the new columns. From there, you can replace those NaN values with N/A to match the expected output format.

There’s also a nuance worth keeping in mind. If both DataFrames contain multiple columns sharing the same names, pandas tries to match on all of them. To stick to a specific key, you can provide it explicitly via left_on and right_on.

Solution: left join + fillna

The fix is concise: perform a left merge on product_code, then fill missing values with N/A.

import pandas as pd
ship_data = [
    ["S001", "P123", 10, "2025-07-01"],
    ["S002", "P456", 5, "2025-07-02"],
    ["S003", "P789", 8, "2025-07-03"]
]
cols_ship = ["shipment_id", "product_code", "quantity", "date"]
frame_ship = pd.DataFrame(ship_data, columns=cols_ship)
item_data = [
    ["P123", "Widget A", "Tools"],
    ["P456", "Widget B", "Hardware"]
]
cols_item = ["product_code", "description", "category"]
frame_item = pd.DataFrame(item_data, columns=cols_item)
# Correct: keep all shipments and fill non-matches
result_all_cols = (
    pd.merge(frame_ship, frame_item, how="left")
      .fillna("N/A")
)
print(result_all_cols)

If you want only the category from the product table, merge with a column subset. This could be better as it does not alter the original DataFrame of frame_item.

result_category_only = (
    pd.merge(frame_ship, frame_item[["product_code", "category"]], how="left")
      .fillna("N/A")
)
print(result_category_only)

If your DataFrames share multiple same-named columns and you need to merge on a specific one, specify it explicitly.

result_explicit_key = (
    pd.merge(
        frame_ship,
        frame_item,
        how="left",
        left_on="product_code",
        right_on="product_code"
    ).fillna("N/A")
)

Why this matters

Data enrichment should be additive. Accidentally dropping shipments because a product is missing defeats the purpose and can skew downstream reporting. Using a left join preserves the entire shipment universe and makes absence explicit via N/A, which is more transparent and easier to audit.

Wrap-up

When augmenting records from one table with attributes from another, choose a left merge to keep every source row, and immediately standardize missing values with fillna("N/A"). If both sides have multiple same-named columns, pin the join key with left_on and right_on. And if you only need a subset of columns from the right table, merge on that subset to keep the output tidy.

The article is based on a question from StackOverflow by user21677098 and an answer by Ranger.

awk csv data-cleaning pandas python