2025, Oct 04 09:00

Parsing MySQL DDL with PARTITION BY RANGE (VALUES LESS THAN) fails in sqlglot 27.8.0: what to know

Learn why sqlglot 27.8.0 fails to parse MySQL DDL using PARTITION BY RANGE with VALUES LESS THAN, the errors to expect, and options or contributing a fix.

Parsing MySQL DDL with table partitioning can trip up general-purpose SQL parsers. If your script uses PARTITION BY RANGE with VALUES LESS THAN, sqlglot 27.8.0 will raise a parse error even though the DDL is valid for MySQL.

Reproducing the issue

The goal is to parse a schema file and extract table names. The following Python snippet reads a file, asks sqlglot to parse it with the MySQL dialect, and collects exp.Table nodes.

import logging
import sqlglot
from sqlglot import exp

# log setup
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s"
)
log = logging.getLogger(__name__)


def collect_tables(path_to_sql, sql_flavor="mysql"):
    """
    Parse the SQL file and return a set of unique table names found.
    Logs errors if file not found or parsing fails.
    """
    try:
        with open(path_to_sql, "r") as fh:
            ddl_blob = fh.read()

        ast_list = sqlglot.parse(ddl_blob, dialect=sql_flavor)
        seen_tables = set()
        for node in ast_list:
            seen_tables.update([t.name for t in node.find_all(exp.Table)])
        return seen_tables
    except FileNotFoundError:
        log.error(f"File not found: {path_to_sql}")
        return set()
    except Exception as err:
        log.error(f"Error parsing `{path_to_sql}`: {err}")
        return set()

if __name__ == "__main__":
    input_sql = "changeLogs/health-service/create_db.sql"
    names = collect_tables(input_sql)

    log.info(f"Total unique tables found: {len(names)}")
    log.info(f"Table names: {sorted(list(names))}")

The schema includes partitioned tables using VALUES LESS THAN. Attempting to parse it produces errors like these:

An error occurred during parsing: Expecting ). Line 19, Col: 26.
  created_at`) USING BTREE
    ) PARTITION BY RANGE ( UNIX_TIMESTAMP(audit_ts)) (
    PARTITION p2401 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01 00:00:00')),
    PARTITION p2402 VALUES LESS THAN (UNIX_TIMES

Here is the SQL that triggers the failure:

-- liquibase formatted sql

-- changeset debraj.manna@nexla.com:NEX-18235
CREATE TABLE IF NOT EXISTS `audit_control`
(
   `id`            BIGINT auto_increment NOT NULL,
   `message_id`    VARCHAR(100) DEFAULT NULL,
    `resource_type` VARCHAR(30) NOT NULL,
    `event_type`    VARCHAR(30) NOT NULL,
    `resource_id`   INT NOT NULL,
    `origin`        VARCHAR(100) NOT NULL,
    `created_at`    TIMESTAMP NOT NULL,
    `body`          mediumtext NOT NULL,
    `audit_ts`      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, audit_ts),
    KEY `audit_control_resource_type_resource_id_IDX` (`resource_type`,`resource_id`) USING BTREE,
    KEY `audit_control_created_at_IDX` (`created_at`) USING BTREE
    ) PARTITION BY RANGE ( UNIX_TIMESTAMP(audit_ts)) (
    PARTITION p2401 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01 00:00:00')),
    PARTITION p2402 VALUES LESS THAN (UNIX_TIMESTAMP('2024-03-01 00:00:00')),
    PARTITION p2403 VALUES LESS THAN (UNIX_TIMESTAMP('2024-04-01 00:00:00')),
    PARTITION p2404 VALUES LESS THAN (UNIX_TIMESTAMP('2024-05-01 00:00:00')),
    PARTITION p2405 VALUES LESS THAN (UNIX_TIMESTAMP('2024-06-01 00:00:00')),
    PARTITION p2406 VALUES LESS THAN (UNIX_TIMESTAMP('2024-07-01 00:00:00')),
    PARTITION p2407 VALUES LESS THAN (UNIX_TIMESTAMP('2024-08-01 00:00:00')),
    PARTITION p2408 VALUES LESS THAN (UNIX_TIMESTAMP('2024-09-01 00:00:00')),
    PARTITION p2409 VALUES LESS THAN (UNIX_TIMESTAMP('2024-10-01 00:00:00')),
    PARTITION p2410 VALUES LESS THAN (UNIX_TIMESTAMP('2024-11-01 00:00:00')),
    PARTITION p2411 VALUES LESS THAN (UNIX_TIMESTAMP('2024-12-01 00:00:00')),
    PARTITION p2412 VALUES LESS THAN (UNIX_TIMESTAMP('2025-01-01 00:00:00')),
    PARTITION pN VALUES LESS THAN MAXVALUE
);

CREATE TABLE IF NOT EXISTS `audit_coordination`
(
    `id`         BIGINT auto_increment NOT NULL,
    `message_id` VARCHAR(100) DEFAULT NULL,
    `event_type` VARCHAR(30) NOT NULL,
    `created_at` TIMESTAMP NOT NULL,
    `body`       TEXT NOT NULL,
    `audit_ts`   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, audit_ts)
    ) PARTITION BY RANGE ( UNIX_TIMESTAMP(audit_ts)) (
    PARTITION p2401 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01 00:00:00')),
    PARTITION p2402 VALUES LESS THAN (UNIX_TIMESTAMP('2024-03-01 00:00:00')),
    PARTITION p2403 VALUES LESS THAN (UNIX_TIMESTAMP('2024-04-01 00:00:00')),
    PARTITION p2404 VALUES LESS THAN (UNIX_TIMESTAMP('2024-05-01 00:00:00')),
    PARTITION p2405 VALUES LESS THAN (UNIX_TIMESTAMP('2024-06-01 00:00:00')),
    PARTITION p2406 VALUES LESS THAN (UNIX_TIMESTAMP('2024-07-01 00:00:00')),
    PARTITION p2407 VALUES LESS THAN (UNIX_TIMESTAMP('2024-08-01 00:00:00')),
    PARTITION p2408 VALUES LESS THAN (UNIX_TIMESTAMP('2024-09-01 00:00:00')),
    PARTITION p2409 VALUES LESS THAN (UNIX_TIMESTAMP('2024-10-01 00:00:00')),
    PARTITION p2410 VALUES LESS THAN (UNIX_TIMESTAMP('2024-11-01 00:00:00')),
    PARTITION p2411 VALUES LESS THAN (UNIX_TIMESTAMP('2024-12-01 00:00:00')),
    PARTITION p2412 VALUES LESS THAN (UNIX_TIMESTAMP('2025-01-01 00:00:00')),
    PARTITION pN VALUES LESS THAN MAXVALUE
);

What’s actually going on

The parser does not support the MySQL partition syntax used above. This was clarified in community discussion for sqlglot 27.8.0, and the maintainers are not planning to implement it.

If the input is valid MySQL, this is likely a gap in the parser
it looks like we don’t support the VALUES LESS THAN syntax for MySQL
however, Doris already supports this
since it inherits from MySQL, we could probably just bubble that logic up to it instead of keeping it in Doris, so both can leverage it

Trying to switch the dialect also doesn’t help in this case. With dialect set to doris, parsing still fails:

An error occurred during parsing: Expecting ). Line 18, Col: 42.
   `audit_control_created_at_IDX` (`created_at`) USING BTREE
    ) PARTITION BY RANGE ( UNIX_TIMESTAMP(audit_ts)) (
    PARTITION p2401 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01 00:00:00')),
    PARTI

We’re not planning to work on this, so no. You’re free to work on it though, we’ll be happy to accept a well-tested & documented PR.

Resolution and practical next steps

Within the constraints above, this behavior is expected in sqlglot 27.8.0. If your DDL relies on PARTITION BY RANGE with VALUES LESS THAN, the library will raise a parse error. Switching to the doris dialect does not resolve the failure in the example provided. The maintainers are open to a contribution that adds this functionality.

If you still want to try a different dialect anyway, the only change in code is the dialect argument:

names = collect_tables(input_sql, sql_flavor="doris")

Be aware that, for this input, the parse error persists with doris.

Why you should care

When you depend on AST-driven tooling for migrations, lineage, or static analysis, unsupported grammar features can silently block pipelines. Partitioned tables are common in time-series workloads and auditing schemas; knowing upfront that VALUES LESS THAN is outside sqlglot’s current coverage helps avoid chasing downstream issues.

Takeaways

Validate parser support against your DDL surface before rolling it into automation. If you hit a parsing gap, make sure your error snippet and the SQL sample match exactly, including the reported line and column, to ease troubleshooting. For PARTITION BY RANGE with VALUES LESS THAN on sqlglot 27.8.0, expect a parse failure and plan accordingly, or consider contributing a well-tested PR to add the missing grammar.

The article is based on a question from StackOverflow by tuk and an answer by tuk.

python sqlglot