2025, Dec 17 19:00
Downcasting float64 to float32 in Polars: understanding extra digits, precision loss, and display formatting in Python
Learn why Polars shows extra digits after downcasting float64 to float32. We explain floating-point precision, Python's printing, and how to manage display vs accuracy.
Downcasting floating-point columns often surprises even experienced engineers: a clean number becomes a slightly messy one with “extra” digits. In Polars this shows up when converting a column from float64 to float32. The underlying values still compare as expected, but the printed output looks off. Let’s unpack why this happens, what exactly you are seeing, and how to reason about it.
Reproducing the display surprise
The following minimal example creates a Polars DataFrame, checks a difference and an equality, then downcasts to float32 and inspects the display. The calculations still behave, but the numbers suddenly show more decimals.
import polars as pl
frame_a = pl.DataFrame({
"metric": [2021.9952, 2024.0, 2024.25, 2024.456]
})
print("original values")
print(frame_a)
print("Original: diff row1 - row0")
delta_a = frame_a["metric"][1] - frame_a["metric"][0]
print(delta_a)
print("Original value equality")
print(frame_a["metric"][0] == 2021.9952)
print("Downcasting to float32")
frame_b = frame_a.cast({"metric": pl.Float32})  # keep the cast result; cast() does not modify frame_a
print(frame_b)
print("Downcasted: diff row1 - row0")
delta_b = frame_b["metric"][1] - frame_b["metric"][0]
print(delta_b)
print("Downcasted value equality")
print(frame_b["metric"][0] == 2021.9952)
The downcasted display now shows values like 2021.995239 and 2024.456055 instead of the “neater” 2021.9952 and 2024.456. Visually it looks like random numbers have been tacked on. They haven’t.
What’s really happening
This behavior is an artifact of how floating-point numbers are represented and printed. Python itself has no float32 type—only float64. When you downcast in Polars, the data becomes float32. But when you print it in Python, those float32 values are promoted back to float64 for display. That promotion preserves the float32 bits exactly, and Python’s float-to-string conversion then shows the closest decimal representation of that float64 value. The result is a faithful depiction of the float32 number you produced, not the original float64 you started with.
Precision is the key. A float64 carries 53 bits of precision. A float32 carries only 24. When you downcast, the last 29 bits are rounded away. The display you see is the closest binary32 (float32) number, shown via a binary64 (float64) print path.
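You can verify the 29-bit claim directly with only the standard library by comparing raw bit patterns before and after a round trip through float32. The helper round_to_float32 below is purely illustrative, not part of Polars:

```python
import struct

def round_to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32, then promote back."""
    return struct.unpack('f', struct.pack('f', x))[0]

x = 2021.9952
bits64 = struct.unpack('<Q', struct.pack('<d', x))[0]
bits32 = struct.unpack('<Q', struct.pack('<d', round_to_float32(x)))[0]

print(f"{bits64:064b}")  # the full 52-bit stored fraction is in use
print(f"{bits32:064b}")  # the low 29 fraction bits are zero after rounding

# 52 stored fraction bits (float64) minus 23 (float32) = 29 bits rounded away
assert bits32 & ((1 << 29) - 1) == 0
```

The promoted float32 value fits in 24 significant bits, so when it is widened back to float64 the bottom 29 bits of the fraction come out as zeros.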
To make this concrete, inspect the same value as a float64 and as a float32 using Python’s struct, which exposes the raw representation without any DataFrame library involved:
import struct
x64 = 2021.9952
print(x64)
print(x64.hex()) # 53 bits of precision in float64
# Convert to float32 and back to float64 without changing bits beyond float32
packed32 = struct.pack('f', x64)
x32_as_64 = struct.unpack('f', packed32)[0]
print(x32_as_64)
print(x32_as_64.hex()) # shows the rounded float32 value
The float64 form of 2021.9952 is:
0x1.f97fb15b573ebp+10
After converting to float32 and back, you get:
2021.9952392578125
0x1.f97fb20000000p+10
Notice how the hex form of the rounded value ends with zeros. Those lost bits are gone for good after the downcast. The number you see printed is therefore the exact closest float32 representation, not padding.
There is another angle that trips people up. Numbers like 2024.456 look tidy in base 10, but floating-point stores numbers as sums of powers of 2, not 10. That decimal cannot be represented exactly by either float64 or float32. With float64 you simply have enough bits that the default formatting hides the approximation. As soon as you show more digits, the truth peeks through. For example:
"%.16f" % 2024.456 # -> '2024.4559999999999036'
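The same fact can be shown without format strings: constructing a Decimal from a float expands the exact binary64 value that is actually stored, with no rounding in the display path:

```python
from decimal import Decimal

# Decimal(float) expands the exact binary64 approximation stored for 2024.456
print(Decimal(2024.456))
```

The output begins 2024.455999999999…, agreeing with the %.16f expansion above.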
Why the display changes after downcasting
Downcasting reduces precision from 53 bits to 24 bits. When the result is displayed, Python promotes the float32 back to float64 to print it, but that promotion preserves only what was present in float32. The printer then shows the closest decimal for that promoted value. That’s why you see 2021.995239 instead of 2021.9952, and 2024.456055 instead of 2024.456. They are the correct closest float32 numbers.
The digits you see after downcasting are the actual closest float32 representation of your number, not padding. This is normal and expected due to the limitations of the float32 type. If you want visually consistent output, limit display precision; the underlying numbers will still differ because of the loss of precision.
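To see that comparisons do work when both sides are held at the same precision, round the reference literal through float32 before comparing. This uses only the standard library; the as_float32 helper is for illustration:

```python
import struct

def as_float32(x: float) -> float:
    # Round to binary32, then promote back to binary64 for comparison.
    return struct.unpack('f', struct.pack('f', x))[0]

stored = as_float32(2021.9952)           # what a downcast column effectively holds
print(stored == 2021.9952)               # False: float32 value vs the float64 literal
print(stored == as_float32(2021.9952))   # True: both sides at float32 precision
```

This is why validation checks against downcast data should round the expected values to the same precision first.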
A precise demonstration in code
To see the rounding boundary explicitly, here’s a self-contained snippet that contrasts the original float64 bits with the rounded float32 bits and underscores why the display is different after downcasting:
import struct
x64 = 2021.9952
print("float64 value:", x64)
print("float64 hex:", x64.hex())
# Round to float32 then re-promote to float64 for printing
p32 = struct.pack('f', x64)
x32_promoted = struct.unpack('f', p32)[0]
print("float32 as printed by Python (promoted to float64):", x32_promoted)
print("float32-promoted hex:", x32_promoted.hex())
print("rounding error vs the original float64:", x64 - x32_promoted)
This shows exactly how the bits change and why the printed decimal changes accordingly. It is not specific to Polars; any system that downcasts to float32 and then displays via Python’s float will behave the same way.
Why this matters for engineering work
Understanding the float32 versus float64 distinction is crucial for debugging, data validation, and pipeline audits. Without that context, it’s easy to misinterpret the printout as data corruption. In reality the underlying math is consistent: comparisons and differences against the original data can still evaluate as expected when computed at the same precision, but the string representation exposes the lower precision after downcasting.
When visual parity with source files matters, remember that CSV holds decimal text while computation uses binary floats. A number that looks tidy in a text editor may never be stored exactly as-is in binary form. After downcasting, the visible approximation simply becomes more apparent.
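A quick way to see the CSV-versus-binary gap is to parse the decimal text the way a CSV reader would, then ask for more digits than the default repr shows:

```python
text = "2024.456"            # the tidy decimal as it appears in a CSV cell
x = float(text)              # parsed into a binary64 approximation
print(repr(x))               # '2024.456': the shortest round-trip repr hides the gap
print(format(x, ".20f"))     # more digits expose the stored approximation
```

Python's repr prints the shortest decimal string that round-trips to the same float64, which is why the approximation stays invisible until you widen the format or reduce the precision of the type.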
Practical takeaways and closing notes
If your goal is human-friendly inspection, configure your display to limit precision so that routine prints don’t surface binary artifacts. If your goal is numeric fidelity, keep sensitive columns at float64, or be explicit about the precision tradeoff when downcasting. Either way, the printed “extra digits” are not random; they are the correct nearest float32 values promoted for display. Knowing this helps you differentiate cosmetic differences from real data issues and keeps reviews focused on signal rather than formatting noise.