2025, Nov 03 01:00
How to fix pandas timestamps when CSV offsets are fractional milliseconds (microsecond trap)
Learn why pandas timestamps look identical when CSV offsets are fractional milliseconds, and how to scale values and use Timedelta to fix unit mismatch.
When you build a timestamp column from a millisecond offset stored in a CSV, a subtle unit mismatch can make the result look static even though it technically changes. A common scenario: the offsets are stored as fractional values such as 0.005 that are meant to act as whole milliseconds, but the code passes them straight to a millisecond Timedelta. The output then varies only in microseconds, which is easy to miss when eyeballing the values.
Problem setup
The dataset contains a column of offsets, for example: 0, 0.005, 0.01, 0.015, 0.02, intended to be applied on top of the current time to create a time series. The initial approach constructs a base datetime from now() and adds a pandas Timedelta in milliseconds derived from that column.
from datetime import datetime
import pandas as pd
# Example input
data_map = {
    'ms_delta': [0, 0.005, 0.01, 0.015, 0.02],
    'col_x': [100, 101, 103, 104, 103],
    'col_y': [200, 20.1, 20.1, 24.1, 40.1]
}
frame = pd.DataFrame(data_map)
anchor = datetime.today()
# Rebuild the base datetime from the current moment (microseconds included)
# and add the offsets as a millisecond Timedelta
frame['ts_col'] = (
    datetime(
        anchor.year, anchor.month, anchor.day,
        anchor.hour, anchor.minute, anchor.second, anchor.microsecond
    )
    + pd.to_timedelta(frame['ms_delta'], unit='ms')  # 0.005 here means 0.005 ms, i.e. 5 µs
)
Why the timestamps look identical
The offsets are not whole milliseconds; they are fractional values like 0.005. Treating 0.005 with unit='ms' means “five thousandths of a millisecond”, i.e., 5 microseconds. The result is a sequence that differs in the trailing microseconds only. That difference is real, but it is small and easy to overlook when scanning the values. In other words, with 0.005 interpreted as milliseconds, your deltas end up in the microsecond range.
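To see the mismatch in isolation, compare what pandas produces for a single value; a minimal check, independent of the DataFrame above:
import pandas as pd
# 0.005 with unit='ms' is five thousandths of a millisecond
print(pd.to_timedelta(0.005, unit='ms'))  # 0 days 00:00:00.000005 -> 5 microseconds
print(pd.to_timedelta(5, unit='ms'))      # 0 days 00:00:00.005000 -> 5 milliseconds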
If the goal is for the numeric values 0, 0.005, 0.01, 0.015, 0.02 to act as 0, 5, 10, 15, 20 milliseconds, the offsets must be scaled accordingly. In addition, initializing the base datetime with microseconds makes the variation harder to read, since the microsecond field is already populated before the deltas are added.
Correct approach
Convert the fractional millisecond numbers into actual milliseconds by multiplying by 1000, and initialize the base datetime without microseconds. Then add the Timedelta in milliseconds. This preserves the intention: values like 0.005 become 5 ms instead of 5 µs.
from datetime import datetime
import pandas as pd
# Sample data
rows = {
    'ms_delta': [0, 0.005, 0.01, 0.015, 0.02],
    'col_x': [100, 101, 103, 104, 103],
    'col_y': [200, 20.1, 20.1, 24.1, 40.1]
}
dataset = pd.DataFrame(rows)
# Scale fractional ms to whole milliseconds (0.005 -> 5)
dataset['ms_delta'] = dataset['ms_delta'] * 1000
now_ref = datetime.today()
# Truncate the base datetime to whole seconds so the millisecond steps stand out
base_dt = datetime(
    now_ref.year, now_ref.month, now_ref.day,
    now_ref.hour, now_ref.minute, now_ref.second
)
# Add the offsets as a millisecond Timedelta
dataset['ts_col'] = base_dt + pd.to_timedelta(dataset['ms_delta'], unit='ms')
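With the sample data above, printing the result should show timestamps advancing in visible 5-millisecond steps rather than microsecond slivers:
# The ts_col values now differ in the milliseconds field
print(dataset[['ms_delta', 'ts_col']])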
Why this matters
Time arithmetic is unforgiving about units. A column that appears to be in milliseconds but actually encodes fractional milliseconds will produce microsecond-level shifts when passed directly to a millisecond Timedelta. The series still changes, just a thousand times more slowly than intended, which is easy to miss while debugging. Aligning the units with your intent avoids silent precision errors and reduces confusion when inspecting results.
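As a rough illustration of how much the unit argument matters, the same number lands at very different scales depending on which unit it is paired with (a standalone snippet, not part of the pipeline above):
import pandas as pd
value = 0.005
for unit in ('s', 'ms', 'us'):
    # 's' -> 5 milliseconds, 'ms' -> 5 microseconds, 'us' -> 5 nanoseconds
    print(unit, pd.to_timedelta(value, unit=unit))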
Takeaways
Be explicit about the unit semantics of your offset column and make them match the Timedelta unit you use. If the values represent fractional milliseconds but you want them to act as integral milliseconds, scale them to the intended unit first. Constructing the base timestamp without microseconds also helps make the progression in milliseconds immediately visible. With these adjustments, the time column will reflect the deltas as expected and be easier to validate.
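If visual inspection is not enough, a minimal programmatic check along these lines (assuming the dataset built in the correct approach above) confirms the spacing:
import pandas as pd
# Every consecutive gap in the corrected column should be exactly 5 milliseconds
gaps = dataset['ts_col'].diff().dropna()
assert (gaps == pd.Timedelta(milliseconds=5)).all()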