2025, Dec 05 17:00

Safely load CSV into pandas: keep most columns as strings, parse the send_date column as a proper datetime

Learn how to use pandas read_csv with dtype='string' and parse_dates to preserve IDs and flags as strings while correctly parsing send_date as datetime64[ns].

When you ingest CSV data into pandas, the default type inference can backfire. Numeric-looking identifiers lose leading zeros, and string-valued flags turn into booleans. At the same time, you may genuinely want a specific column parsed as a datetime. The goal is simple: keep almost everything as strings, except for one column that should be a proper datetime.

Code example that shows the problem

The following snippet loads a CSV and inspects the first row alongside the dtypes. It demonstrates how default inference and various dtype options affect the result.

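import pandas

def show_sample_types(frame: pandas.DataFrame) -> None:
    # Print the first row's value and the dtype of every column.
    first = frame.iloc[0]
    for col in frame.columns:
        print(f"{col}: {first[col]} (type: {frame[col].dtype})")

csv_path = "test.csv"
# 1. Default inference: zip_code becomes an integer (leading zero dropped), is_customer becomes bool.
sample = pandas.read_csv(csv_path, parse_dates=["send_date"])
show_sample_types(sample)

# 2. dtype=object: string values are preserved, but send_date does not become datetime64[ns].
sample = pandas.read_csv(csv_path, dtype=object, parse_dates=["send_date"])
show_sample_types(sample)

# 3. dtype=str: send_date comes back as an integer-like timestamp string.
sample = pandas.read_csv(csv_path, dtype=str, parse_dates=["send_date"])
show_sample_types(sample)

# 4. Forcing datetime through dtype raises TypeError: "the dtype datetime64 is not supported for parsing, pass this column using parse_dates instead"
kinds = {"name": "object", "zip_code": "object", "send_date": "datetime64", "is_customer": "object"}
try:
    sample = pandas.read_csv(csv_path, dtype=kinds, parse_dates=["send_date"])
except TypeError as err:
    print(f"TypeError: {err}")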
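Because test.csv itself is not shown, here is a minimal, self-contained sketch that rebuilds a three-row file in memory so the default-inference problem can be reproduced without a file on disk. The CSV contents are an assumption, reconstructed from the sample output shown later in this article; CSV_TEXT and load_with_defaults are hypothetical names used only for illustration.

```python
import io

import pandas

# Assumed contents of test.csv, reconstructed from the sample output in this article.
CSV_TEXT = """name,zip_code,send_date,is_customer
Madeline,04321,2025-04-13,true
Theo,32255,2025-04-08,true
Granny,84564,2025-04-15,false
"""

def load_with_defaults() -> pandas.DataFrame:
    # No dtype argument: pandas infers a type for every column on its own.
    return pandas.read_csv(io.StringIO(CSV_TEXT), parse_dates=["send_date"])

default_frame = load_with_defaults()
print(default_frame.dtypes)
# zip_code was inferred as an integer, so the leading zero of 04321 is lost.
print(default_frame.loc[0, "zip_code"])
```

Reading from io.StringIO behaves like reading a file path here, which keeps the example runnable anywhere pandas is installed.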

What’s really happening and why

Leaving dtype unspecified triggers pandas' automatic type inference. A zip code like 04321 is read as the integer 4321, dropping the leading zero, and the string values true and false are converted into booleans.

Switching to dtype=object preserves those fields as string-like values, but it also stops send_date from becoming datetime64[ns], because object tells pandas to keep the raw Python objects.

Trying to force a datetime via dtype={"send_date": "datetime64"} raises an explicit TypeError telling you to use parse_dates for date parsing instead.

Finally, combining dtype=str with parse_dates turns send_date into an integer-like timestamp string rather than a datetime, which again defeats the purpose.
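The TypeError hints at the underlying rule: dtype and parse_dates do different jobs, so a date column should not appear in the dtype mapping at all. As a sketch of an alternative to a blanket dtype, assuming the same in-memory sample data as above, you can map only the columns that must stay textual and leave send_date entirely to parse_dates. The text_columns name is illustrative.

```python
import io

import pandas

# Assumed sample data, reconstructed from the output shown in this article.
CSV_TEXT = """name,zip_code,send_date,is_customer
Madeline,04321,2025-04-13,true
Theo,32255,2025-04-08,true
Granny,84564,2025-04-15,false
"""

# Only the textual columns get an explicit dtype; send_date is deliberately absent.
text_columns = {"name": "string", "zip_code": "string", "is_customer": "string"}

frame = pandas.read_csv(
    io.StringIO(CSV_TEXT),
    dtype=text_columns,
    parse_dates=["send_date"],  # parse_dates alone decides how send_date is converted
)
print(frame.dtypes)
```

Listing columns explicitly is more verbose than dtype="string", but it documents the intended type of each column and keeps any column you did not list under normal inference.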

The fix

The effective combination is to request pandas’ dedicated string dtype for all columns while simultaneously letting parse_dates handle the one date column. This preserves identifiers and string flags as strings, and correctly parses the date column as datetime64[ns].

import pandas
result = pandas.read_csv("test.csv", dtype="string", parse_dates=["send_date"])
print(result.dtypes)
# name           string[python]
# zip_code       string[python]
# send_date      datetime64[ns]
# is_customer    string[python]
# dtype: object
print(result)
#        name zip_code  send_date is_customer
# 0  Madeline    04321 2025-04-13        true
# 1      Theo    32255 2025-04-08        true
# 2    Granny    84564 2025-04-15       false

If any of the values in the date field can’t be parsed as a date, the column’s dtype will be read as object instead of datetime64[ns].
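If a single bad value should not cost you the whole datetime column, one option is a different technique from the parse_dates approach above: load everything with dtype="string" and convert send_date yourself with pandas.to_datetime(..., errors="coerce"), which turns unparseable values into NaT so the column stays datetime and the offending rows are easy to find. This is a sketch, not the article's method; the sample data (including the malformed "not-a-date" value) is hypothetical, and format="%Y-%m-%d" assumes ISO dates like those in the output above.

```python
import io

import pandas

# Hypothetical sample with one malformed date, to illustrate the failure mode.
CSV_TEXT = """name,zip_code,send_date,is_customer
Madeline,04321,2025-04-13,true
Theo,32255,not-a-date,true
"""

# Keep every column as a string first.
frame = pandas.read_csv(io.StringIO(CSV_TEXT), dtype="string")

# Convert explicitly; errors="coerce" maps unparseable values to NaT
# instead of leaving the whole column unconverted.
frame["send_date"] = pandas.to_datetime(frame["send_date"], format="%Y-%m-%d", errors="coerce")

# Rows whose date could not be parsed, for logging or manual review.
bad_rows = frame[frame["send_date"].isna()]
print(frame.dtypes)
print(bad_rows)
```

The trade-off is that coercion silently replaces bad values with NaT, so it pays to inspect or log bad_rows rather than ignore it.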

Why this matters

Data pipelines often rely on stable, lossless loading. Postal codes must keep leading zeros, categorical flags may be intentionally string-valued, and downstream joins or validations expect consistent types. Selective parsing avoids silent conversions while still delivering a proper datetime for time-based operations.
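Since downstream steps depend on these types, a pipeline can check them right after loading instead of discovering a mismatch later in a join. The helper below is a hypothetical sketch: check_schema, EXPECTED_TEXT and EXPECTED_DATES are illustrative names, and the sample data is the same assumed reconstruction used earlier. It reports every column whose dtype does not match the expectation.

```python
import io

import pandas

# Assumed sample data, reconstructed from the output shown in this article.
CSV_TEXT = """name,zip_code,send_date,is_customer
Madeline,04321,2025-04-13,true
Theo,32255,2025-04-08,true
Granny,84564,2025-04-15,false
"""

# Hypothetical expectations: which columns must be text and which must be datetime.
EXPECTED_TEXT = ["name", "zip_code", "is_customer"]
EXPECTED_DATES = ["send_date"]

def check_schema(frame: pandas.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the frame matches."""
    problems = []
    for col in EXPECTED_TEXT:
        # Require pandas' dedicated string dtype, not just any object column.
        if not isinstance(frame[col].dtype, pandas.StringDtype):
            problems.append(f"{col}: expected string, got {frame[col].dtype}")
    for col in EXPECTED_DATES:
        if not pandas.api.types.is_datetime64_any_dtype(frame[col]):
            problems.append(f"{col}: expected datetime, got {frame[col].dtype}")
    return problems

good = pandas.read_csv(io.StringIO(CSV_TEXT), dtype="string", parse_dates=["send_date"])
bad = pandas.read_csv(io.StringIO(CSV_TEXT), parse_dates=["send_date"])
print(check_schema(good))  # expected to be empty
print(check_schema(bad))   # lists zip_code and is_customer, among others
```

Returning a list instead of raising on the first mismatch lets a pipeline report every problem at once; raising is an equally reasonable choice if a load should simply fail.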

Takeaway

Use pandas.read_csv with dtype="string" for broad type preservation and parse_dates=["send_date"] for the one column that should be datetime. Avoid forcing datetime via dtype, and be aware that unparseable date values will cause the date column to come in as object.