2026, Jan 09 09:00
Schema-Driven Casting in Polars: Turn Decimal Columns with Scale 0 into Int64, Others into Float64
Learn how to cast Polars Decimal columns by scale after read_database(): scale 0 to Int64, above zero to Float64. Schema-driven, reliable; no manual lists.
When loading data with polars.read_database(), you can end up with Decimal columns. Sometimes that is desired, but often you want a plain integer or float instead, chosen by the Decimal scale: a scale of 0 should become Int64, anything above zero should become Float64. The challenge is doing this reliably without hand-curating column lists.
The tempting but non-working idea
It’s natural to think about selectors and conditional casts, something along these lines:
data_frame.with_columns(
    pl.selectors.decimal(scale="1+").cast(pl.Float64()),
    pl.selectors.decimal(scale="0").cast(pl.Int64()),
)
This would be neat, but pl.selectors.decimal() does not accept arguments, so you can’t filter Decimal columns by scale this way. You need access to the inferred schema to drive the casts.
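For reference, here is what the selector does support, a sketch assuming data_frame is the frame returned by read_database(): it can grab every Decimal column at once, but it offers no hook for filtering by scale.

import polars.selectors as cs

# Valid: selects all Decimal columns, whatever their scale.
all_decimals = data_frame.select(cs.decimal())

# There is no cs.decimal(scale=...) variant, so the split by
# scale has to come from the schema instead.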
What’s actually going on
Polars assigns Decimal dtypes during schema inference when reading from the database. The dtype for each column carries metadata, including scale. The key is that you can read that scale right off the dtype in the DataFrame schema, and that gives you exactly the information required to split Decimal columns into the “integer-like” group and the “fractional” group.
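To make that concrete, here is a minimal sketch with an invented two-column frame standing in for what read_database() might return; the names and values are purely illustrative, and it assumes a recent Polars version that supports casting to Decimal.

import polars as pl

# Stand-in for a frame returned by read_database(); the column
# names and values here are purely illustrative.
data_frame = pl.DataFrame({"order_id": [1, 2], "amount": [9.99, 12.5]}).with_columns(
    pl.col("order_id").cast(pl.Decimal(scale=0)),
    pl.col("amount").cast(pl.Decimal(scale=2)),
)

# The scale sits right on each Decimal dtype in the schema.
for name, dtype in data_frame.schema.items():
    if isinstance(dtype, pl.Decimal):
        print(name, dtype.scale)
# order_id 0
# amount 2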
A clear, explicit fix
The most direct way is to inspect the schema, build two column lists based on dtype.scale, and then cast them in one pass. Below is a compact pattern that does just that.
# Partition the Decimal columns by their inferred scale.
int_like_names = [
    nm for nm, kind in data_frame.schema.items()
    if isinstance(kind, pl.Decimal) and kind.scale == 0
]
flt_like_names = [
    nm for nm, kind in data_frame.schema.items()
    if isinstance(kind, pl.Decimal) and kind.scale > 0
]

# Apply both casts in a single pass.
data_frame = data_frame.with_columns(
    pl.col(int_like_names).cast(pl.Int64),
    pl.col(flt_like_names).cast(pl.Float64),
)
The crux is accessing kind.scale from the schema and using it to partition the Decimal columns. Once split, a single with_columns call applies the respective casts.
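Run against the illustrative frame from the previous section, the result checks out (the exact schema repr varies by Polars version):

print(data_frame.schema)
# Schema({'order_id': Int64, 'amount': Float64})

assert data_frame.schema["order_id"] == pl.Int64
assert data_frame.schema["amount"] == pl.Float64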
Why this detail matters
Schema-driven casting lets you preserve numeric intent without manual curation. For analytics, reporting, or feature engineering, silently keeping Decimal where an integer or float is expected can ripple into downstream confusion or unnecessary precision handling. Making the cast deterministic based on scale keeps the data tidy and consistent with how it is logically used.
Conclusion
If selectors can’t express the filter you need, let the schema guide you. Read the dtype metadata, split the Decimal columns by scale, and cast in one shot. It’s explicit, fast to reason about, and resilient to schema changes in the source database.
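If this comes up in more than one pipeline, the whole pattern folds naturally into a small helper. The function below is a sketch of our own devising, not a Polars built-in.

import polars as pl

def cast_decimals_by_scale(df: pl.DataFrame) -> pl.DataFrame:
    # Hypothetical convenience wrapper around the pattern above:
    # Decimal columns with scale 0 become Int64, the rest Float64.
    ints = [n for n, t in df.schema.items()
            if isinstance(t, pl.Decimal) and t.scale == 0]
    flts = [n for n, t in df.schema.items()
            if isinstance(t, pl.Decimal) and t.scale > 0]
    return df.with_columns(
        pl.col(ints).cast(pl.Int64),
        pl.col(flts).cast(pl.Float64),
    )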