2025, Oct 02 17:00

How to fix pandas to_datetime ValueError when parsing Unix timestamp milliseconds: use unit='ms' and extract dates

Learn why pandas to_datetime fails on Unix timestamp milliseconds and how to fix it with unit='ms'. Convert numeric timestamps to datetime and extract dates.

When working with time data in pandas, a common pitfall is trying to parse numeric Unix timestamps as if they were human-readable strings. If your column contains values like 1743004800000 and you feed them into to_datetime with a date format such as '%Y-%m-%d', you'll run straight into a parsing error.

Problem overview

Consider a dataset where the time field looks like 1743004800000. The value may even arrive as a string, but semantically it's a Unix timestamp in milliseconds, not a formatted date. Attempting to parse it with a string date format leads to an exception:

ValueError: time data "1743004800000" doesn't match format "%Y-%m-%d"

Reproducing the issue

The snippet below sets up a minimal example and demonstrates the failing conversion attempt. The identifiers are generic, but the logic mirrors the real-world scenario.

import pandas as pds

records = pds.DataFrame({
    "event_ts": [1743004800000, 1753004800000]
})

# Incorrect: trying to parse milliseconds with a string date format
broken = records.copy()
broken["event_ts"] = pds.to_datetime(broken["event_ts"], format="%Y-%m-%d").dt.date

This raises the error shown above because 1743004800000 is not a '%Y-%m-%d' string; it's a numeric timestamp.

Why it happens

The core of the problem is a type and semantics mismatch. Values like 1743004800000 represent time as the number of milliseconds since the Unix epoch, not as a formatted date string. The format argument tells pandas to parse string patterns like '2025-03-26', but here the input is a millisecond count, so the parser cannot match it to '%Y-%m-%d'.
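To make the mismatch concrete, the same instant can be constructed either from a millisecond count (with unit="ms") or from a formatted string (with a matching format), but the two inputs are not interchangeable. A minimal sketch:

```python
import pandas as pd

# The same instant, expressed two different ways:
from_ms = pd.Timestamp(1743004800000, unit="ms")       # numeric: ms since the epoch
from_str = pd.to_datetime("2025-03-26 16:00:00",
                          format="%Y-%m-%d %H:%M:%S")  # string: formatted date

print(from_ms)              # 2025-03-26 16:00:00
print(from_ms == from_str)  # True: same moment, different representations
```

A numeric input needs unit, a string input needs format; handing the parser the wrong kind is what triggers the ValueError above.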

The fix

The correct approach is to instruct pandas that the input is in milliseconds using the unit parameter. After conversion, if only the calendar date is needed, extract it via .dt.date.

import pandas as pds

records = pds.DataFrame({
    "event_ts": [1743004800000, 1753004800000]
})

# Correct: tell pandas the timestamps are in milliseconds
clean = records.copy()
clean["event_ts"] = pds.to_datetime(clean["event_ts"], unit="ms")

# If only the date is needed, create a separate column
clean["event_date"] = clean["event_ts"].dt.date

print(clean)

Result:

             event_ts  event_date
0 2025-03-26 16:00:00  2025-03-26
1 2025-07-20 09:46:40  2025-07-20

If your data uses a different time unit, pass the matching value instead, such as unit="s" for seconds or unit="us" for microseconds.
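To illustrate, here is the same instant expressed at two granularities, plus what typically happens when the unit is wrong (with pandas' default nanosecond resolution, a millisecond count misread as seconds lands far outside the representable range and fails loudly):

```python
import pandas as pd

# The same instant stored at two granularities
secs = pd.to_datetime(1743004800, unit="s")       # seconds since the epoch
msecs = pd.to_datetime(1743004800000, unit="ms")  # milliseconds since the epoch
print(secs == msecs)  # True: both 2025-03-26 16:00:00

# Interpreting a millisecond count as seconds puts the result roughly
# 55,000 years in the future, beyond the nanosecond datetime64 range.
try:
    pd.to_datetime(1743004800000, unit="s")
except pd.errors.OutOfBoundsDatetime as exc:
    print("wrong unit:", exc)
```

A wrong unit that stays in range, however, silently shifts every timestamp, so it's worth confirming the unit against a known value before converting a whole column.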

Why this matters

Time handling bugs tend to be subtle yet costly. Misinterpreting a numeric timestamp as a formatted string either fails fast with a ValueError or, worse, introduces silent data corruption if coerced incorrectly. Being explicit about the unit keeps parsing deterministic, makes code self-documenting, and prevents downstream logic from operating on incorrect dates.
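One way that silent corruption can creep in is errors="coerce", which swaps the ValueError for NaT on every unparseable value. A short sketch of the same format mismatch with coercion enabled:

```python
import pandas as pd

ts = pd.Series([1743004800000, 1753004800000])

# With errors="coerce", the format mismatch no longer raises: every value
# silently becomes NaT, and downstream code sees an all-missing column.
coerced = pd.to_datetime(ts, format="%Y-%m-%d", errors="coerce")
print(coerced.isna().all())  # True: the data is silently gone
```

The loud ValueError from the original snippet is arguably the better outcome; coercion here just hides the unit/format mismatch.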

Takeaways

If your dataset stores time as large integers like 1743004800000, treat them as Unix timestamps in milliseconds and convert with unit="ms". Reserve the format argument for actual human-readable strings such as '2025-03-26'. After conversion, derive plain dates only when you truly need them, keeping the richer datetime for other operations.
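A practical note on that last point: .dt.date yields plain Python date objects in an object-dtype column, which gives up the vectorized datetime operations, while .dt.normalize() zeroes the time of day but keeps the datetime64 dtype. A sketch of the difference:

```python
import pandas as pd

s = pd.to_datetime(pd.Series([1743004800000, 1753004800000]), unit="ms")

as_date = s.dt.date             # object dtype: plain datetime.date values
as_midnight = s.dt.normalize()  # datetime64 dtype, time set to 00:00:00

print(as_date.dtype)      # object
print(as_midnight.dtype)  # datetime64 (e.g. datetime64[ns])
```

If later steps still need .dt accessors, resampling, or timedelta arithmetic, normalize() (or simply keeping the full datetime column) is usually the safer choice.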

The article is based on a question from StackOverflow by user824624 and an answer by Panda Kim.