2025, Sep 29 05:00

Resolve Databricks Delta INSERT ... SELECT column arity mismatches over PYODBC with UseNativeQuery=1

Troubleshooting Databricks Delta inserts over PYODBC: fix INSERT ... SELECT column arity mismatches by enabling UseNativeQuery=1 in the ODBC driver settings.

When inserting into a Delta table on Databricks through PYODBC, you might hit an error that never appears in the Databricks UI. A common case: a partial-column INSERT works in the workspace, but the same SQL fails over ODBC. The symptoms are confusing: a [DELTA_INSERT_COLUMN_ARITY_MISMATCH] error about column counts, or values silently landing in the wrong columns.

Reproducing the issue

The following direct VALUES insert works both in Databricks and via Python:

INSERT INTO etl_delta_sink (INGESTED_AT, SRC_FILE_DATE)
VALUES (CURRENT_TIMESTAMP, 20250827);

However, the SELECT-based variant runs fine in Databricks but fails through PYODBC, even though only two target columns are specified:

INSERT INTO etl_delta_sink (INGESTED_AT, SRC_FILE_DATE)
SELECT CURRENT_TIMESTAMP, 20250827;

Over PYODBC, this can produce a Delta error like:

[DELTA_INSERT_COLUMN_ARITY_MISMATCH] target table has 23 column(s) but the inserted data has 2 column(s)

In another scenario, when the table's physical column order differs from the order listed in the statement, values can land in the wrong columns via PYODBC, while the same statement run in Databricks inserts into the intended columns.
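For completeness, here is a minimal sketch of the reproduction over PYODBC. The connection string uses the same placeholders as the fix below and deliberately omits UseNativeQuery; the second statement is the one that fails in this scenario:

import pyodbc

# Placeholder connection string WITHOUT UseNativeQuery; fill in the host,
# port, HTTP path, and authentication settings for your workspace.
conn_str = (
    "Driver={Databricks ODBC Driver};"
    "Host=YOUR_HOST;"
    "Port=YOUR_PORT;"
    "HTTPPath=YOUR_HTTP_PATH;"
)

cn = pyodbc.connect(conn_str, autocommit=True)
cur = cn.cursor()

# Works both here and in the Databricks UI.
cur.execute(
    "INSERT INTO etl_delta_sink (INGESTED_AT, SRC_FILE_DATE) "
    "VALUES (CURRENT_TIMESTAMP, 20250827)"
)

# Works in the Databricks UI, but over PYODBC raises
# [DELTA_INSERT_COLUMN_ARITY_MISMATCH] in this scenario.
cur.execute(
    "INSERT INTO etl_delta_sink (INGESTED_AT, SRC_FILE_DATE) "
    "SELECT CURRENT_TIMESTAMP, 20250827"
)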

Why this happens

The behavior difference stems from SQL dialect translation on the ODBC path. By default, the driver can transform the queries an application emits before submitting them to Databricks. A statement that the UI accepts as-is can therefore arrive rewritten in a way that changes how the engine binds columns, producing an apparent arity mismatch or column-order confusion.

The fix

Force native SQL execution by adding UseNativeQuery=1 to the ODBC connection string. This avoids translation and preserves the intended INSERT semantics. See the Databricks ODBC capabilities documentation for details: Databricks ODBC docs.

Example connection and execution with PYODBC:

import pyodbc

# The values below are placeholders. With personal-access-token
# authentication, the driver typically also needs AuthMech=3, UID=token,
# PWD=<your PAT>, SSL=1, and ThriftTransport=2 (check your driver's docs).
conn_str = (
    "Driver={Databricks ODBC Driver};"
    "Host=YOUR_HOST;"
    "Port=YOUR_PORT;"
    "HTTPPath=YOUR_HTTP_PATH;"
    "UseNativeQuery=1;"  # execute SQL as written, without dialect translation
)

# autocommit=True is the usual setting for Databricks connections,
# since Delta tables don't participate in ODBC transactions.
cn = pyodbc.connect(conn_str, autocommit=True)
cur = cn.cursor()

sql_ins = (
    "INSERT INTO etl_delta_sink (INGESTED_AT, SRC_FILE_DATE) "
    "SELECT CURRENT_TIMESTAMP, 20250827"
)

cur.execute(sql_ins)  # no commit needed: the connection runs in autocommit mode

With native query execution enabled, the INSERT ... SELECT form aligns with the behavior you see in the Databricks UI.
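To confirm the values actually landed in the intended columns, a quick check reusing the same cursor (a sketch; the ORDER BY assumes INGESTED_AT is populated as above):

cur.execute(
    "SELECT INGESTED_AT, SRC_FILE_DATE FROM etl_delta_sink "
    "ORDER BY INGESTED_AT DESC LIMIT 1"
)
print(cur.fetchone())  # expect a fresh timestamp paired with 20250827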

A note on parameterized statements

After enabling UseNativeQuery=1, a parameterized INSERT of the form below has been observed to raise an error over PYODBC:

INSERT INTO etl_delta_sink (SRC_FILE_DATE, INGESTED_AT)
VALUES (?, CURRENT_TIMESTAMP)

In that setup, one report showed the driver returning “Expected 0 parameters, supplied 1” (HY000). Removing UseNativeQuery made that particular parameterized INSERT work again. The underlying issue was confirmed and, once addressed, the code ran correctly. If you hit a similar message, compare behavior with and without UseNativeQuery and validate how parameters are bound in your environment.
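As a concrete diagnostic, here is a sketch of that parameterized form with pyodbc's qmark binding, falling back to an inlined literal for comparison if the driver rejects the parameter:

sql_param = (
    "INSERT INTO etl_delta_sink (SRC_FILE_DATE, INGESTED_AT) "
    "VALUES (?, CURRENT_TIMESTAMP)"
)

try:
    # May raise "Expected 0 parameters, supplied 1" (HY000) on affected setups.
    cur.execute(sql_param, 20250827)
except pyodbc.Error as exc:
    print(exc)
    # Comparison path: inline the literal instead of binding it.
    cur.execute(
        "INSERT INTO etl_delta_sink (SRC_FILE_DATE, INGESTED_AT) "
        "VALUES (20250827, CURRENT_TIMESTAMP)"
    )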

Why this matters

Client-side SQL translation can subtly change how your statements are interpreted by Databricks. That can manifest as Delta arity mismatches or values being mapped to the wrong columns even when your SQL explicitly lists the targets. Knowing how to enforce native SQL on the ODBC path gives you control and consistency across tools.

Takeaways

If an INSERT that targets a subset of columns works in the Databricks UI but fails through PYODBC with a [DELTA_INSERT_COLUMN_ARITY_MISMATCH], enable UseNativeQuery=1 in the ODBC connection string to avoid dialect translation. After flipping that switch, re-test both literal and parameterized statements. If you encounter "Expected 0 parameters, supplied 1" (HY000) during parameter binding, compare behavior with the flag toggled and align your connection settings accordingly. This keeps your Databricks SQL behavior predictable across clients and protects you from silent column misalignment.
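One way to run that comparison is to execute the same statement with the flag off and on (a sketch using the placeholders from the earlier example):

import pyodbc

BASE_CONN = (
    "Driver={Databricks ODBC Driver};"
    "Host=YOUR_HOST;"
    "Port=YOUR_PORT;"
    "HTTPPath=YOUR_HTTP_PATH;"
)

def try_statement(sql, use_native):
    # Append UseNativeQuery=1 only for the native run.
    conn_str = BASE_CONN + ("UseNativeQuery=1;" if use_native else "")
    cn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cn.execute(sql)  # Connection.execute opens a cursor and runs the SQL
        print(f"UseNativeQuery={int(use_native)}: OK")
    except pyodbc.Error as exc:
        print(f"UseNativeQuery={int(use_native)}: {exc}")
    finally:
        cn.close()

sql = (
    "INSERT INTO etl_delta_sink (INGESTED_AT, SRC_FILE_DATE) "
    "SELECT CURRENT_TIMESTAMP, 20250827"
)

for flag in (False, True):
    try_statement(sql, flag)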

The article is based on a question from StackOverflow by Jason Wiest and an answer by Kertis van Kertis.