https://pytroubles.com/en/posts/id1295-fix-csv-to-excel-conversion-in-pandas-prevent-scientific-notation-as-text-handle-metadata

Fix CSV to Excel conversion in pandas: prevent scientific notation as text, handle metadata

Pandas CSV to Excel: avoid scientific notation stored as text, keep metadata, fix headers

Fix CSV to Excel conversion in pandas: prevent scientific notation as text, handle metadata

Convert CSV/TXT to Excel with pandas without numbers turning into text: set the right header, keep metadata, handle variable headers, avoid float_format issues

2025-10-23T21:00:08+03:00

2025-10-23T21:00:09+03:00

Converting CSV to Excel with pandas is usually a one-liner, until scientific notation and stray metadata get in the way. A common trap is that numeric-looking values quietly turn into strings in Excel, which then breaks downstream steps like charting. Here’s how that happens and how to fix it without losing useful metadata lines at the top of the file.Minimal reproduction of the issueThe following function reads .csv or .txt files, attempts to coerce columns to numbers, and writes .xlsx output. The result can look fine at a glance but still produce text instead of numbers for rows below a metadata header.import pandas as pdsimport osfrom tkinter import filedialog as fd, messagebox as mbdef export_to_xlsx(): picked = fd.askopenfilenames(filetypes=[("Text and CSV files", "*.txt *.csv")]) if not picked: return for src in picked: try: delim = '\t' if src.lower().endswith('.txt') else ',' frame = pds.read_csv(src, sep=delim) for field in frame.columns: frame[field] = pds.to_numeric(frame[field], errors='ignore') out_path = os.path.splitext(src)[0] + '.xlsx' with pds.ExcelWriter(out_path, engine='openpyxl') as book: frame.to_excel(book, index=False, float_format="%.3f") except Exception as err: mb.showerror("Error", f"Error converting file {src}: {err}") return mb.showinfo("Success", "All files have been converted successfully!")export_to_xlsx()Why this happenspandas will happily infer dtypes if the file starts with clean headers and rows. For instance, with a plain CSV like this, the values are numeric in Excel and ready to graph:Frequency,Random5.0e+3,6.01e-45.0e+3,6.01e-45.0e+3,6.01e-45.0e+3,6.01e-4But if the file begins with arbitrary metadata before the header, pandas reads those cells as part of the table and cannot confidently infer numeric types. Consider this structure:!Keysignth,Easdf,MYA1234,A.04.00!Date: ThurBEGIN CH1_DATAFrequency,Random5.0e+3,6.01e-45.0e+3,6.01e-45.0e+3,6.01e-45.0e+3,6.01e-4Here’s the kicker: using pds.to_numeric(..., errors='ignore') does not “convert what you can and skip the rest”. If any value in a column fails to parse, the entire conversion for that column is skipped, silently. Combined with those top lines, you end up with object-typed columns that get written as text. The numbers then look like scientific notation strings, not true numerics.There is also a practical note about representation: numbers in memory don’t have a “scientific” or “regular” format. That is a display concern when reading or writing. The only thing that matters for plotting is whether the cells are numbers or text.Fix 1: point pandas at the real headerIf the header always starts after a fixed number of lines, tell pandas where the actual header row lives. That way it ignores everything above it when building the DataFrame, and numeric inference works as expected.import pandas as pdsimport osfrom tkinter import filedialog as fd, messagebox as mbdef export_to_xlsx_fixed_header(): picked = fd.askopenfilenames(filetypes=[("Text and CSV files", "*.txt *.csv")]) if not picked: return for src in picked: try: delim = '\t' if src.lower().endswith('.txt') else ',' frame = pds.read_csv(src, sep=delim, header=3) # headers are on line index 3 out_path = os.path.splitext(src)[0] + '.xlsx' with pds.ExcelWriter(out_path, engine='openpyxl') as book: frame.to_excel(book, index=False) # no float_format except Exception as err: mb.showerror("Error", f"Error converting file {src}: {err}") return mb.showinfo("Success", "All files have been converted successfully!")This preserves numeric types but drops the metadata lines from the Excel file, since read_csv ignores everything before the header.Fix 2: keep metadata and convert only data rowsIf you want those top lines in the Excel output, load the file normally, then convert just the data region to numeric types. Skipping the metadata rows avoids the column-wide failure from earlier.import pandas as pdsimport osfrom tkinter import filedialog as fd, messagebox as mbdef export_to_xlsx_keep_meta(): picked = fd.askopenfilenames(filetypes=[("Text and CSV files", "*.txt *.csv")]) if not picked: return for src in picked: try: delim = '\t' if src.lower().endswith('.txt') else ',' frame = pds.read_csv(src, sep=delim) # do not skip metadata for field in frame.columns: frame.loc[3:, field] = pds.to_numeric(frame[field][3:]) # convert from row index 3 out_path = os.path.splitext(src)[0] + '.xlsx' with pds.ExcelWriter(out_path, engine='openpyxl') as book: frame.to_excel(book, index=False) # no float_format except Exception as err: mb.showerror("Error", f"Error converting file {src}: {err}") return mb.showinfo("Success", "All files have been converted successfully!")With this, the metadata stays, and the numeric block becomes true numbers suitable for plotting.Fix 3: when the metadata height isn’t constantIf the number of metadata lines varies, locate the first numeric row programmatically and then convert from there. One simple heuristic is to scan the first column until a numeric value is found.# after reading frame with pds.read_csv(...):first_col = frame.columns[0]for row_idx, cell in enumerate(frame[first_col]): try: pds.to_numeric(cell) start_idx = row_idx break except ValueError: passelse: print("Warning: no numeric data in the first column") start_idx = row_idx + 1Then apply numeric conversion from that row for every column.for field in frame.columns: frame.loc[start_idx:, field] = pds.to_numeric(frame[field][start_idx:])This approach assumes that all columns switch to numeric at the same row and that nothing non-numeric appears after that point.About float_formatPassing float_format="%.3f" to to_excel forces rounding before writing. That can hide distinctions like 6e-4 versus 6.12e-4 by turning them into the same displayed value when rounded. Once your data are truly numeric, let Excel handle number formats as needed.Why this mattersCharts, aggregations, and formulas in Excel rely on actual numeric types. If data are stored as strings, plots either fail or silently produce wrong outputs. Getting the read step right—by pointing to the real header or converting only the data region—removes the ambiguity and preserves numeric integrity.Wrap-upThe symptoms looked like a formatting problem, but the root cause was mixed content before the header combined with a permissive conversion flag. The reliable path is to either instruct pandas where the header starts or convert only the rows that comprise the dataset. Avoid pre-rounding during export. With those adjustments, scientific notation parses as proper numbers and behaves predictably in Excel and in charts.

pandas csv to excel, scientific notation, numbers as text, metadata header, variable header rows, read_csv header, to_excel, float_format, python excel conversion, data cleaning

2025

2025, Oct 23 21:00

Pandas CSV to Excel: avoid scientific notation stored as text, keep metadata, fix headers

Convert CSV/TXT to Excel with pandas without numbers turning into text: set the right header, keep metadata, handle variable headers, avoid float_format issues

Minimal reproduction of the issue

The following function reads .csv or .txt files, attempts to coerce columns to numbers, and writes .xlsx output. The result can look fine at a glance but still produce text instead of numbers for rows below a metadata header.

import pandas as pds
import os
from tkinter import filedialog as fd, messagebox as mb

def export_to_xlsx():
    picked = fd.askopenfilenames(filetypes=[("Text and CSV files", "*.txt *.csv")])
    if not picked:
        return

    for src in picked:
        try:
            delim = '\t' if src.lower().endswith('.txt') else ','
            frame = pds.read_csv(src, sep=delim)

            for field in frame.columns:
                frame[field] = pds.to_numeric(frame[field], errors='ignore')

            out_path = os.path.splitext(src)[0] + '.xlsx'
            with pds.ExcelWriter(out_path, engine='openpyxl') as book:
                frame.to_excel(book, index=False, float_format="%.3f")

        except Exception as err:
            mb.showerror("Error", f"Error converting file {src}: {err}")
            return

    mb.showinfo("Success", "All files have been converted successfully!")

export_to_xlsx()

Why this happens

pandas will happily infer dtypes if the file starts with clean headers and rows. For instance, with a plain CSV like this, the values are numeric in Excel and ready to graph:

Frequency,Random
5.0e+3,6.01e-4
5.0e+3,6.01e-4
5.0e+3,6.01e-4
5.0e+3,6.01e-4

But if the file begins with arbitrary metadata before the header, pandas reads those cells as part of the table and cannot confidently infer numeric types. Consider this structure:

!Keysignth,Easdf,MYA1234,A.04.00
!Date: Thur

BEGIN CH1_DATA
Frequency,Random
5.0e+3,6.01e-4
5.0e+3,6.01e-4
5.0e+3,6.01e-4
5.0e+3,6.01e-4

Here’s the kicker: using pds.to_numeric(..., errors='ignore') does not “convert what you can and skip the rest”. If any value in a column fails to parse, the entire conversion for that column is skipped, silently. Combined with those top lines, you end up with object-typed columns that get written as text. The numbers then look like scientific notation strings, not true numerics.

There is also a practical note about representation: numbers in memory don’t have a “scientific” or “regular” format. That is a display concern when reading or writing. The only thing that matters for plotting is whether the cells are numbers or text.

Fix 1: point pandas at the real header

If the header always starts after a fixed number of lines, tell pandas where the actual header row lives. That way it ignores everything above it when building the DataFrame, and numeric inference works as expected.

import pandas as pds
import os
from tkinter import filedialog as fd, messagebox as mb

def export_to_xlsx_fixed_header():
    picked = fd.askopenfilenames(filetypes=[("Text and CSV files", "*.txt *.csv")])
    if not picked:
        return

    for src in picked:
        try:
            delim = '\t' if src.lower().endswith('.txt') else ','
            frame = pds.read_csv(src, sep=delim, header=3)  # headers are on line index 3

            out_path = os.path.splitext(src)[0] + '.xlsx'
            with pds.ExcelWriter(out_path, engine='openpyxl') as book:
                frame.to_excel(book, index=False)  # no float_format

        except Exception as err:
            mb.showerror("Error", f"Error converting file {src}: {err}")
            return

    mb.showinfo("Success", "All files have been converted successfully!")

This preserves numeric types but drops the metadata lines from the Excel file, since read_csv ignores everything before the header.

Fix 2: keep metadata and convert only data rows

If you want those top lines in the Excel output, load the file normally, then convert just the data region to numeric types. Skipping the metadata rows avoids the column-wide failure from earlier.

import pandas as pds
import os
from tkinter import filedialog as fd, messagebox as mb

def export_to_xlsx_keep_meta():
    picked = fd.askopenfilenames(filetypes=[("Text and CSV files", "*.txt *.csv")])
    if not picked:
        return

    for src in picked:
        try:
            delim = '\t' if src.lower().endswith('.txt') else ','
            frame = pds.read_csv(src, sep=delim)  # do not skip metadata

            for field in frame.columns:
                frame.loc[3:, field] = pds.to_numeric(frame[field][3:])  # convert from row index 3

            out_path = os.path.splitext(src)[0] + '.xlsx'
            with pds.ExcelWriter(out_path, engine='openpyxl') as book:
                frame.to_excel(book, index=False)  # no float_format

        except Exception as err:
            mb.showerror("Error", f"Error converting file {src}: {err}")
            return

    mb.showinfo("Success", "All files have been converted successfully!")

With this, the metadata stays, and the numeric block becomes true numbers suitable for plotting.

Fix 3: when the metadata height isn’t constant

If the number of metadata lines varies, locate the first numeric row programmatically and then convert from there. One simple heuristic is to scan the first column until a numeric value is found.

# after reading frame with pds.read_csv(...):
first_col = frame.columns[0]
for row_idx, cell in enumerate(frame[first_col]):
    try:
        pds.to_numeric(cell)
        start_idx = row_idx
        break
    except ValueError:
        pass
else:
    print("Warning: no numeric data in the first column")
    start_idx = row_idx + 1

Then apply numeric conversion from that row for every column.

for field in frame.columns:
    frame.loc[start_idx:, field] = pds.to_numeric(frame[field][start_idx:])

This approach assumes that all columns switch to numeric at the same row and that nothing non-numeric appears after that point.

About float_format

Passing float_format="%.3f" to to_excel forces rounding before writing. That can hide distinctions like 6e-4 versus 6.12e-4 by turning them into the same displayed value when rounded. Once your data are truly numeric, let Excel handle number formats as needed.

Why this matters

Charts, aggregations, and formulas in Excel rely on actual numeric types. If data are stored as strings, plots either fail or silently produce wrong outputs. Getting the read step right—by pointing to the real header or converting only the data region—removes the ambiguity and preserves numeric integrity.

Wrap-up

The symptoms looked like a formatting problem, but the root cause was mixed content before the header combined with a permissive conversion flag. The reliable path is to either instruct pandas where the header starts or convert only the rows that comprise the dataset. Avoid pre-rounding during export. With those adjustments, scientific notation parses as proper numbers and behaves predictably in Excel and in charts.

The article is based on a question from StackOverflow by julien1h and an answer by joanis.

csv excel python xlsx