2025, Nov 02 13:00
How to Load Mixed-Type SQLite Columns into Polars: Diagnose the i64 vs binary Error and Fix with CAST
Resolve the Polars-SQLite SchemaError “failed to determine supertype of i64 and binary.” Learn why mixed runtime types break loads and how to fix them with typeof() audits and CAST in SQL.
Loading a large SQLite table into Polars can blow up with a SchemaError that looks puzzling at first glance: “failed to determine supertype of i64 and binary.” If you are pulling hundreds of columns and the error doesn’t tell you which one is to blame, debugging can be tedious. The root cause, however, is simple and very SQLite-specific, and there is a repeatable way to fix it at the query layer without changing your database schema.
Reproducing the issue
The following snippet attempts to read an entire table into a Polars DataFrame and triggers the error in environments where a column mixes numeric and blob values:
import sqlite3

import polars as pl

handle = sqlite3.connect("my_database.db")

# Pull the entire table into a DataFrame; this raises when a column
# mixes runtime types such as INTEGER and BLOB.
frame_in = pl.read_database(
    connection=handle,
    query="SELECT * FROM table_to_load",
    infer_schema_length=None,
)

handle.close()
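Against a table where one column mixes numeric and blob values, the call fails with a message along these lines (the exact exception text varies with the Polars version):
polars.exceptions.SchemaError: failed to determine supertype of i64 and binary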
What actually goes wrong
SQLite is permissive about types: a column declared as INTEGER can still hold values of different runtime kinds. Polars is strict and expects a single concrete dtype per column. When a column contains values that map to incompatible Polars types with no defined supertype, ingestion fails. That’s exactly what “failed to determine supertype of i64 and binary” means: some values look like 64-bit integers while others are binary blobs, and there is no common supertype Polars can promote to automatically.
This discrepancy shows up in practical ways. In one real case, inserting a mix of Python int and numpy.int64 into an INTEGER column resulted in some rows becoming BLOBs in SQLite; converting numpy.int64 to a native int via .item() before insertion eliminated the error. More generally, you can verify whether a column is homogeneous by asking SQLite to report the runtime types present with a query such as SELECT count(*), typeof(column) FROM table_name GROUP BY typeof(column). If that query returns more than one typeof value for a column, Polars will not accept that column as-is.
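To audit every column of a table in one pass rather than writing that query by hand, a short script along these lines works. It is only a sketch, reusing the my_database.db file and table_to_load table from the snippet above, so substitute your own names:
import sqlite3

with sqlite3.connect("my_database.db") as cxn:
    cur = cxn.cursor()
    # Enumerate the table's columns, then report the runtime types in each.
    cur.execute("PRAGMA table_info(table_to_load)")
    columns = [row[1] for row in cur.fetchall()]
    for col in columns:
        cur.execute(
            f"SELECT typeof({col}), count(*) FROM table_to_load GROUP BY typeof({col})"
        )
        # More than one row for a column means it mixes runtime types.
        print(col, cur.fetchall())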
The most reliable workaround: cast at the SQL boundary
Since the store-of-truth is SQLite, the safest resolution is to cast problematic columns in the SQL query so Polars sees a consistent type. Two small tactics make this manageable at scale. First, pl.read_database_uri typically includes the offending column name in the error message, which is more actionable than using a connection object. Second, you can wrap the query in a retry function that inspects the error, identifies the column, and reissues the query with an explicit CAST for that column. This approach trades one-time strictness for a deterministic cast policy you control.
import re
import sqlite3
import warnings

import polars as pl


def run_with_casts(sql_text: str, dsn: str, fallback_sqlite_type: str = "text") -> pl.DataFrame:
    coerced_fields: set[str] | None = None
    column_sequence: list[str] | None = None
    from_tail: str | None = None
    base_table: str | None = None
    # The failing column's name follows "name: " in the error message.
    err_re = re.compile(r"(?<=name:\s)\w+")
    while True:
        try:
            result_df = pl.read_database_uri(sql_text, dsn)
            if coerced_fields is not None:
                warnings.warn(f"Your query had type errors and was changed to\n{sql_text}")
            return result_df
        except Exception as exc:
            match = err_re.search(str(exc))
            if match is None:
                # Not an error we know how to handle; re-raise unchanged.
                raise
            bad_col = match.group()
            if (
                coerced_fields is None
                or column_sequence is None
                or base_table is None
                or from_tail is None
            ):
                # First failure: split the query into its SELECT list and the
                # remainder after FROM, then work out which columns it selects.
                lower = sql_text.lower()
                after_select = lower.split("select", maxsplit=1)[1]
                select_list, from_tail = after_select.split("from", maxsplit=1)
                sel_cols = select_list.strip()
                base_table = from_tail.strip().replace("\n", " ").split(" ")[0]
                if sel_cols == "*":
                    # Expand * into concrete column names via PRAGMA table_info.
                    with sqlite3.connect(dsn.replace("sqlite:///", "")) as cxn:
                        cr = cxn.cursor()
                        cr.execute(f"PRAGMA table_info({base_table})")
                        meta = cr.fetchall()
                    column_sequence = [row[1] for row in meta]
                else:
                    column_sequence = [x.strip() for x in sel_cols.split(",")]
                coerced_fields = set()
            if bad_col in coerced_fields or bad_col not in column_sequence:
                # Either the column was already cast (casting again won't help)
                # or it isn't in the SELECT list at all; give up.
                raise
            coerced_fields.add(bad_col)
            # Rebuild the SELECT list, casting every offending column found so far.
            projection = ", ".join(
                f"cast({col} as {fallback_sqlite_type}) as {col}" if col in coerced_fields else col
                for col in column_sequence
            )
            sql_text = f"select {projection} from {from_tail}"
This helper handles only simple SELECT statements and does not attempt full SQL parsing; it is meant to be practical when you are selecting either * or a plain comma-separated list of columns from a single table.
End-to-end demonstration
The following setup produces a mixed-type column in SQLite and then applies the retry strategy to coerce the column and successfully load into Polars:
import os
import sqlite3

import polars as pl

# Build a tiny database with a BLOB in an INTEGER column
fn = "mydb.db"
with sqlite3.connect(fn) as cx:
    cur = cx.cursor()
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS users (
            name TEXT,
            age INTEGER
        )
        """
    )
    cur.executemany(
        "INSERT INTO users (name, age) VALUES (?, ?)",
        [("Alice", 30), ("Bob", 25), ("Charlie", b"jfjf")],
    )

dsn = f"sqlite:///{os.path.abspath(fn)}"

# This will raise because the age column mixes INTEGER and BLOB
sql_stmt = "select name, age from users"
# pl.read_database_uri(sql_stmt, dsn)

# Now retry with automatic CAST to a fallback SQLite type
out = run_with_casts(sql_stmt, dsn)
print(out)
When the read fails, pl.read_database_uri typically reports the column name in the error message, for example identifying age as the failing column. The retry function then rebuilds the query with CAST(age AS text) and returns a DataFrame, emitting a warning that shows the adjusted SQL.
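For the demo table, the adjusted statement looks roughly like this (the helper lowercases the query and keeps everything after FROM verbatim, so casing and spacing may differ from your original):
select name, cast(age as text) as age from users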
Finding problematic columns up front
If you prefer to audit the dataset before loading, SQLite can summarize runtime types per column. The pattern is straightforward and can be run for each column you care about:
SELECT count(*), typeof(age)
FROM users
GROUP BY typeof(age);
The goal is to see a single typeof per column; multiple typeof values point to places where explicit casting in the query, or normalizing values before insertion, is necessary. As one concrete fix, converting numpy.int64 values to native Python int with .item() before inserting them prevents those values from turning into BLOBs.
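For the insertion-side fix, a minimal sketch is shown below. It reuses the users table from the demo, and the names and values are purely illustrative; the point is that .item() converts a numpy scalar to a plain Python int before it reaches SQLite:
import sqlite3

import numpy as np

ages = np.array([30, 25, 41], dtype=np.int64)

with sqlite3.connect("mydb.db") as cx:
    cur = cx.cursor()
    # .item() yields a native Python int, so the value is stored as INTEGER
    # rather than ending up as a BLOB in the age column.
    cur.executemany(
        "INSERT INTO users (name, age) VALUES (?, ?)",
        [("Dana", ages[0].item()), ("Eli", ages[1].item()), ("Fay", ages[2].item())],
    )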
Why this matters
Polars is unapologetically strict about column dtypes, which is a feature for performance and correctness, but it means SQLite’s laissez-faire typing can surface as runtime errors at data ingestion time. Knowing how to identify and coerce mixed-type columns gives you deterministic and debuggable pipelines. It also keeps type issues out of your analytical layer and squarely in the boundary between the source system and the DataFrame engine, where a CAST is cheap and visible.
Takeaways
If a Polars load from SQLite fails with “failed to determine supertype of i64 and binary,” your table likely contains mixed runtime types in one or more columns. Use pl.read_database_uri to get an error that names the column. Cast that column in the SQL, or automate the retry to do so for you, accepting a stable fallback like TEXT. When possible, prevent the issue at the source by inserting homogeneous values into SQLite, for example by converting numpy.int64 to int before writing, and audit columns with SELECT count(*), typeof(col) ... GROUP BY typeof(col). With these steps, you can keep Polars happy without restructuring your database.
The article is based on a question from StackOverflow by SapereAude and an answer by Dean MacGregor.