https://pytroubles.com/en/posts/id2290-pandas-read-csv-keep-strings-and-ids-intact-parse-one-datetime-with-dtype-string-and-parse-dates

Pandas read_csv: keep strings and IDs intact, parse one datetime with dtype='string' and parse_dates

Safely load CSV into pandas: keep most columns as strings, parse the send_date column as proper datetime

Learn how to use pandas read_csv with dtype='string' and parse_dates to preserve IDs and flags as strings while correctly parsing send_date as datetime64[ns].

2025-12-05T17:00:10+03:00

When you ingest CSV data into pandas, the default type inference can backfire. Numeric-looking identifiers lose leading zeros, and string-valued flags turn into booleans. At the same time, you may genuinely want a specific column parsed as a datetime. The goal is simple: keep almost everything as strings, except for one column that should be a proper datetime.


Code example that shows the problem

The following snippet loads a CSV and inspects the first row alongside the dtypes. It demonstrates how default inference and various dtype options affect the result.

import pandas

def show_sample_types(frame: pandas.DataFrame) -> None:
    # Print the first row's value and the column dtype for every column.
    first = frame.iloc[0]
    for col in frame.columns:
        print(f"{col}: {first[col]} (type: {frame[col].dtype})")

csv_path = "test.csv"
# Attempt 1: default inference, with only send_date requested as a date.
sample = pandas.read_csv(csv_path, parse_dates=["send_date"])
show_sample_types(sample)

# Attempt 2: dtype=object for every column.
sample = pandas.read_csv(csv_path, dtype=object, parse_dates=["send_date"])
show_sample_types(sample)

# Attempt 3: dtype=str for every column.
sample = pandas.read_csv(csv_path, dtype=str, parse_dates=["send_date"])
show_sample_types(sample)

# Attempt 4: requesting datetime64 through dtype is rejected with a TypeError.
kinds = {"name": "object", "zip_code": "object", "send_date": "datetime64", "is_customer": "object"}
try:
    sample = pandas.read_csv(csv_path, dtype=kinds, parse_dates=["send_date"])
except TypeError as exc:
    print(exc)  # the dtype datetime64 is not supported for parsing, pass this column using parse_dates instead

What’s really happening and why

Leaving dtype unspecified triggers pandas’ automatic type inference. That turns a zip code like 04321 into the integer 4321, dropping the leading zero, and converts the string values true and false into booleans. Switching to dtype=object preserves those fields as strings but keeps send_date from becoming datetime64[ns], because object tells pandas to keep the raw Python objects. Trying to force a datetime via dtype={"send_date": "datetime64"} raises an explicit TypeError that tells you to use parse_dates for date parsing. Finally, using dtype=str together with parse_dates ends up materializing send_date as an integer-like timestamp string rather than a datetime, again defeating the purpose.
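The default inference described above can be reproduced without a file on disk. The snippet below is a minimal sketch: the CSV text is an assumed reconstruction of test.csv based on the rows printed later in this article, and the names csv_text and inferred are only illustrative.

```python
import io

import pandas

# Assumed contents of test.csv, reconstructed from the article's printed output.
csv_text = (
    "name,zip_code,send_date,is_customer\n"
    "Madeline,04321,2025-04-13,true\n"
    "Theo,32255,2025-04-08,true\n"
    "Granny,84564,2025-04-15,false\n"
)

# No dtype given: pandas infers a type for every column.
inferred = pandas.read_csv(io.StringIO(csv_text))

print(inferred.dtypes)
print(inferred["zip_code"].iloc[0])  # 4321, the leading zero is gone
print(inferred["is_customer"].tolist())  # Python booleans, not the original strings
```

Reading from io.StringIO keeps the example self-contained; pointing read_csv at a real path behaves the same way.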

The fix

The effective combination is to request pandas’ dedicated string dtype for all columns while simultaneously letting parse_dates handle the one date column. This preserves identifiers and string flags as strings, and correctly parses the date column as datetime64[ns].

import pandas
result = pandas.read_csv("test.csv", dtype="string", parse_dates=["send_date"])  
print(result.dtypes)
# name           string[python]
# zip_code       string[python]
# send_date      datetime64[ns]
# is_customer    string[python]
# dtype: object
print(result)
#        name zip_code  send_date is_customer
# 0  Madeline    04321 2025-04-13        true
# 1      Theo    32255 2025-04-08        true
# 2    Granny    84564 2025-04-15       false

If any of the values in the date field can’t be parsed as a date, the column’s dtype will be read as object instead of datetime64[ns].
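When bad date values are a realistic possibility, one option is to separate the two steps: load every column as a string, then convert the date column explicitly. The following is a sketch of that alternative rather than the approach used above. The sample data, the %Y-%m-%d format, and the names raw, parsed, and bad_rows are assumptions made for illustration.

```python
import io

import pandas

# Hypothetical data: the second row holds a value that is not a date.
csv_text = (
    "name,zip_code,send_date,is_customer\n"
    "Madeline,04321,2025-04-13,true\n"
    "Theo,32255,not-a-date,true\n"
)

# Load every column as a string first, so nothing is converted implicitly.
raw = pandas.read_csv(io.StringIO(csv_text), dtype="string")

# Convert the date column explicitly; values that do not match become NaT.
parsed = pandas.to_datetime(raw["send_date"], format="%Y-%m-%d", errors="coerce")

# Rows whose original value was present but could not be parsed.
bad_rows = raw[parsed.isna() & raw["send_date"].notna()]
print(bad_rows)

# Replace the string column with the parsed datetime column.
raw["send_date"] = parsed
print(raw.dtypes)
```

With errors="coerce" the column always ends up as a datetime type, and the unparseable rows can be inspected or rejected instead of silently changing the column's dtype.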

Why this matters

Data pipelines often rely on stable, lossless loading. Postal codes must keep leading zeros, categorical flags may be intentionally string-valued, and downstream joins or validations expect consistent types. Selective parsing avoids silent conversions while still delivering a proper datetime for time-based operations.
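Selective parsing can also be expressed per column, which may suit files where only a few fields need protection. The sketch below is a variation on the article's approach, not a replacement for it: it assumes the same reconstructed sample data, pins only zip_code and is_customer to the string dtype, and leaves send_date out of the dtype mapping so that parse_dates handles it.

```python
import io

import pandas

# Assumed contents of test.csv, reconstructed from the article's printed output.
csv_text = (
    "name,zip_code,send_date,is_customer\n"
    "Madeline,04321,2025-04-13,true\n"
    "Theo,32255,2025-04-08,true\n"
    "Granny,84564,2025-04-15,false\n"
)

# Only the columns at risk of silent conversion are pinned to strings.
# send_date is deliberately absent from the mapping.
protected = {"zip_code": "string", "is_customer": "string"}

frame = pandas.read_csv(
    io.StringIO(csv_text),
    dtype=protected,
    parse_dates=["send_date"],
)

print(frame.dtypes)
print(frame["zip_code"].tolist())
```

The trade-off is that any column missing from the mapping falls back to default inference, so dtype="string" for everything remains the safer default when new columns can appear.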

Takeaway

Use pandas.read_csv with dtype="string" for broad type preservation and parse_dates=["send_date"] for the one column that should be datetime. Avoid forcing datetime via dtype, and be aware that unparseable date values will cause the date column to come in as object.
