https://pytroubles.com/en/posts/id1221-append-rows-to-a-pandas-dataframe-with-column-alignment-use-pd-concat-not-dataframe-update

Append Rows to a Pandas DataFrame with Column Alignment: Use pd.concat, Not DataFrame.update

How to Append Rows from One Pandas DataFrame to Another with Column Alignment (pd.concat vs update)

Append Rows to a Pandas DataFrame with Column Alignment: Use pd.concat, Not DataFrame.update

Append rows from a second Pandas DataFrame with proper column alignment using pd.concat, not DataFrame.update. Keep all rows and avoid overwrites safely.

2025-10-21T23:00:04+03:00

When two pandas DataFrames share some columns but not all, it’s easy to reach for an in-place update. But if your goal is to keep all rows from the first DataFrame and then add new rows from the second, aligned by column names, an update won’t help. The right tool here is a row-wise concatenation with column alignment.Problem setupOne table is wider (more columns), the other is narrower (a subset of those columns, possibly in a different order). The desired result is to keep all original rows and append the rows from the narrower table, placing values under matching column names and leaving the rest empty.A misleading approachThe following snippet tries to “update” one frame from another based on a key. It sets an index and uses DataFrame.update, which modifies values in existing rows only. That’s not what we need if the second DataFrame contributes new rows instead of edits.def sync_frames(big_tbl, small_tbl, id_col): """ Updates big_tbl with values from small_tbl based on a common key column. Only columns present in small_tbl will be updated in big_tbl. Parameters: - big_tbl (pd.DataFrame): The larger DataFrame. - small_tbl (pd.DataFrame): The smaller DataFrame with updated values. - id_col (str): The column name used as the key for matching rows. Returns: - pd.DataFrame: Updated big_tbl. """ slim = small_tbl.drop_duplicates(subset=id_col) big_tbl.set_index(id_col, inplace=True) slim.set_index(id_col, inplace=True) big_tbl.update(slim) big_tbl.reset_index(inplace=True) return big_tblWhy this fails for the described needThe expected output shows that rows from the second DataFrame should appear as additional rows in the result, with values aligned under the same column names and blanks elsewhere. DataFrame.update does not append rows; it only overwrites existing cells where indices overlap. In other words, you’re not updating in place—you’re combining datasets by rows.The correct approach: row-wise concatenation with column alignmentTo append rows from the second DataFrame while preserving column alignment, use pd.concat. This stacks rows and aligns columns by name. Any column missing in a particular input gets NaN (which you can optionally replace with an empty string for display).import pandas as pd # Combine rows with column alignment blended = pd.concat([wide_df, narrow_df], ignore_index=True, sort=False) # Optional: present missing values as blanks blended = blended.fillna("") # Optional: pretty print as a markdown table print(blended.to_markdown(tablefmt="grid"))This produces a result where all rows from the first DataFrame are preserved, and every row from the second DataFrame is appended. Shared columns line up correctly; non-shared columns are empty where appropriate.Why this detail mattersChoosing between update and concat determines whether you mutate existing rows or grow the dataset. For pipelines that integrate heterogeneous sources or partial extracts, row-wise concatenation maintains data integrity and avoids unintended overwrites. It also scales naturally when column order differs—the alignment is handled by column labels, not by position.TakeawaysIf you need to add new records from a DataFrame that only has a subset of columns, concatenate with pd.concat and enable column alignment via sort=False. Use DataFrame.update only when you truly intend to overwrite values in existing rows keyed by matching indices. If the result is intended for human inspection, fill missing values with an empty string to keep the layout clear.

pandas, DataFrame, append rows, pd.concat, DataFrame.update, column alignment, row-wise concatenation, concatenate DataFrames, keep all rows, align by column names, wide vs narrow tables, fillna

2025

2025, Oct 21 23:00

How to Append Rows from One Pandas DataFrame to Another with Column Alignment (pd.concat vs update)

Append rows from a second Pandas DataFrame with proper column alignment using pd.concat, not DataFrame.update. Keep all rows and avoid overwrites safely.

Problem setup

One table is wider (more columns), the other is narrower (a subset of those columns, possibly in a different order). The desired result is to keep all original rows and append the rows from the narrower table, placing values under matching column names and leaving the rest empty.

A misleading approach

The following snippet tries to “update” one frame from another based on a key. It sets an index and uses DataFrame.update, which modifies values in existing rows only. That’s not what we need if the second DataFrame contributes new rows instead of edits.

def sync_frames(big_tbl, small_tbl, id_col):
    """
    Updates big_tbl with values from small_tbl based on a common key column.
    Only columns present in small_tbl will be updated in big_tbl.
    Parameters:
    - big_tbl (pd.DataFrame): The larger DataFrame.
    - small_tbl (pd.DataFrame): The smaller DataFrame with updated values.
    - id_col (str): The column name used as the key for matching rows.
    Returns:
    - pd.DataFrame: Updated big_tbl.
    """
    slim = small_tbl.drop_duplicates(subset=id_col)
    big_tbl.set_index(id_col, inplace=True)
    slim.set_index(id_col, inplace=True)
    big_tbl.update(slim)
    big_tbl.reset_index(inplace=True)
    return big_tbl

Why this fails for the described need

The expected output shows that rows from the second DataFrame should appear as additional rows in the result, with values aligned under the same column names and blanks elsewhere. DataFrame.update does not append rows; it only overwrites existing cells where indices overlap. In other words, you’re not updating in place—you’re combining datasets by rows.

The correct approach: row-wise concatenation with column alignment

To append rows from the second DataFrame while preserving column alignment, use pd.concat. This stacks rows and aligns columns by name. Any column missing in a particular input gets NaN (which you can optionally replace with an empty string for display).

import pandas as pd
# Combine rows with column alignment
blended = pd.concat([wide_df, narrow_df], ignore_index=True, sort=False)
# Optional: present missing values as blanks
blended = blended.fillna("")
# Optional: pretty print as a markdown table
print(blended.to_markdown(tablefmt="grid"))

This produces a result where all rows from the first DataFrame are preserved, and every row from the second DataFrame is appended. Shared columns line up correctly; non-shared columns are empty where appropriate.

Why this detail matters

Choosing between update and concat determines whether you mutate existing rows or grow the dataset. For pipelines that integrate heterogeneous sources or partial extracts, row-wise concatenation maintains data integrity and avoids unintended overwrites. It also scales naturally when column order differs—the alignment is handled by column labels, not by position.

Takeaways

If you need to add new records from a DataFrame that only has a subset of columns, concatenate with pd.concat and enable column alignment via sort=False. Use DataFrame.update only when you truly intend to overwrite values in existing rows keyed by matching indices. If the result is intended for human inspection, fill missing values with an empty string to keep the layout clear.

The article is based on a question from StackOverflow by Anupkumar Kasi and an answer by Mario.

dataframe pandas python