2025, Oct 18 09:00
How to Export Milvus Collections at Scale: Bypass the 16384 Query Limit with Iterators and Streamed Batches
Move Milvus data to a dynamic-field collection without hitting the 16384 query limit: stream with iterators, insert in batches, and consider VTS or the add-field support in Milvus 2.6.
Exporting data out of Milvus becomes necessary when a collection needs to be reshaped, for example, to enable dynamic fields. The obvious path is to export everything and insert into a freshly created collection. The snag many run into is a hard limit during query-based export: a 16384 ceiling that stops the process midway. If your primary key is a string, range-based slicing by intervals is not an option. Below is a practical way to approach the migration without hitting that wall.
Problem overview
Milvus versions before 2.6 do not support adding fields to an existing collection, so the data has to be moved to a new collection configured with dynamic fields. A straightforward export via query stumbles on the 16384 limit, and with a string primary key, interval-based pagination is off the table.
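For reference, creating the target collection with dynamic fields enabled can look roughly like the sketch below, using pymilvus's MilvusClient. The connection URI, field names, dimension, and collection name are placeholders; the real schema should mirror your source collection.
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")  # placeholder URI

# enable_dynamic_field lets inserted rows carry extra keys that are not
# declared in the schema, which is the point of the migration.
schema = client.create_schema(auto_id=False, enable_dynamic_field=True)
schema.add_field("id", DataType.VARCHAR, is_primary=True, max_length=256)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=768)

client.create_collection(collection_name="target_collection", schema=schema)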
Naive approach that triggers the limit
The most direct way is to run a single query and try to dump all records in one go. This is what typically causes the 16384 limit to surface.
def dump_all_via_query(store, coll_name, filter_expr):
    # Single-shot export: one query call is expected to return everything.
    # Milvus caps the result window, so this fails once more than 16384
    # entries match.
    records = store.query(collection=coll_name, where=filter_expr)
    return records
This pattern attempts to bring back the entire dataset in one shot. When the dataset is large, the process runs into the 16384 ceiling and fails, leaving the export incomplete.
Why the limit bites here
The query path is capped, so large exports through a single call are blocked once the number of returned entries reaches 16384. Since the primary key is a string, you can’t fall back to interval slicing by numeric ranges to paginate around the limit. The result is a dead end for full exports via plain query.
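To make the cap concrete, here is roughly what the single-shot export looks like with pymilvus's MilvusClient; the collection name and filter expression are placeholders. The server rejects requests whose offset plus limit exceeds 16384, so there is no way to page past that window with plain query calls.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # placeholder URI

# A single query is bounded by the 16384 result window; asking for more
# than that is rejected, and the export stops short of the full dataset.
rows = client.query(
    collection_name="source_collection",
    filter='pk != ""',       # placeholder filter on a string primary key
    output_fields=["*"],     # export all fields
    limit=16384,             # the hard ceiling for a single call
)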
Practical way out
There are three viable directions. First, Milvus 2.6 supports adding fields to an existing collection, which may remove the need to migrate at all. Second, you can export the data with an iterator, which avoids the single-shot query cap by streaming results in batches. Third, you can use the VTS tool to handle the transfer for you. The iterator approach is the most direct if you want to keep everything in code and push data into the new collection as you go.
The pattern is simple: create an iterator for the source collection, pull batches until exhaustion, and write them to the target. This sidesteps the 16384 limit because you don’t rely on a single query response.
def export_with_iterator(src_handle, src_coll, cond):
    # Stream the source collection instead of issuing one capped query;
    # each next() call pulls the following batch until the iterator is drained.
    it = src_handle.iterator(collection=src_coll, where=cond)
    while True:
        batch = it.next()
        if not batch:
            break
        for row in batch:
            yield row


def migrate_to_target(dst_handle, dst_coll, rows_iterable):
    # Buffer rows and flush them in fixed-size inserts so memory stays
    # bounded no matter how large the source collection is.
    buffer = []
    for row in rows_iterable:
        buffer.append(row)
        if len(buffer) >= 1000:
            dst_handle.insert(collection=dst_coll, data=buffer)
            buffer.clear()
    if buffer:
        # Flush the final partial batch.
        dst_handle.insert(collection=dst_coll, data=buffer)
This structure keeps memory usage predictable and moves data in chunks. The exact batch size is adjustable; the point is to stream, not to accumulate everything in one giant response.
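Written directly against pymilvus, the same streaming pattern can use the client's query iterator. The sketch below assumes a recent pymilvus release that exposes MilvusClient.query_iterator; the connection URI, collection names, filter, field list, and batch size are placeholders to adapt to your setup.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # placeholder URI

# Pull the source collection in fixed-size batches and reinsert each
# batch into the dynamic-field target as it arrives.
iterator = client.query_iterator(
    collection_name="source_collection",
    filter='pk != ""',                   # placeholder filter on a string primary key
    output_fields=["id", "embedding"],   # placeholder field names to carry over
    batch_size=1000,
)

try:
    while True:
        batch = iterator.next()
        if not batch:
            break
        client.insert(collection_name="target_collection", data=batch)
finally:
    iterator.close()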
Where the version and tooling fit in
If you are on Milvus 2.6, note that adding fields to an existing collection is supported, which may make a full migration unnecessary. If you prefer not to write export logic, the VTS tool can handle the transfer for you. The iterator route is code-first and works well when you already have a pipeline to insert into the new collection configured with dynamic fields.
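If staying on the same collection is acceptable, the 2.6 add-field path can remove the need to migrate at all. As a rough sketch only, the call below assumes the 2.6 client's add_collection_field method and a nullable new field; verify the exact method and parameters against the documentation for your version.
from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")  # placeholder URI

# Assumed 2.6 API: add a nullable field so existing rows stay valid
# without a value for it.
client.add_collection_field(
    collection_name="source_collection",
    field_name="extra_note",     # hypothetical new field
    data_type=DataType.VARCHAR,
    max_length=512,
    nullable=True,
)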
Why this matters
Data migrations are frequent when schemas evolve. Hitting a hard query limit in the middle of a migration can stall a release. Knowing that an iterator-based export avoids the 16384 ceiling and that VTS is available provides a clear path forward, even when the primary key is a string and interval pagination is out of scope.
Conclusion
When moving data to a collection with dynamic fields, avoid single-shot query exports that run into the 16384 limit. Stream the dataset instead: export it with an iterator and reinsert it in batches into the target collection. If you are on 2.6, remember that adding fields to an existing collection is supported, and if you don't want to write code, VTS can help with the process. Plan the export as a streamed operation, and the migration becomes predictable and resilient.
The article is based on a question from StackOverflow by Jade Roy and an answer by James Luan.