2025, Oct 07 07:00
Stream zipping large MinIO objects with miniopy-async: use temp files to avoid RAM limits and OOM
Learn how to zip multi-GB MinIO objects with miniopy-async by streaming chunks to a temp file, avoiding in-memory buffering, and uploading the archive reliably.
Compressing a set of large MinIO objects into a single archive sounds simple until the numbers show up. Individual objects can reach 40 GB, available RAM is 4 GB, and disk space is about 240 GB. With miniopy-async in the stack, the only realistic path is to avoid buffering entire payloads in memory and stream both the download and the zip creation.
The trap: building a zip in memory
Reading entire objects before writing them into a zip is the first idea many reach for, and it is exactly what breaks under tight memory limits. The following sketch demonstrates why this approach is unsafe for multi‑GB objects.
import io
import zipfile
import asyncio
from miniopy_async import Minio
api = Minio("localhost:9000", access_key="xxx", secret_key="xxx", secure=False)
async def build_zip_in_ram(source_bucket, item_keys):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as bundle:
        for name in item_keys:
            res = await api.get_object(source_bucket, name)
            blob = await res.read()  # pulls the whole object into RAM
            bundle.writestr(name, blob)
            await res.close()
    return buf.getvalue()  # the entire archive now also lives in RAM
This version downloads each object with read(), stores it entirely in memory, and builds the archive in a BytesIO buffer. With 4 GB of RAM and objects up to 40 GB, it is guaranteed to hit the ceiling.
The essence of the problem
The core issue is buffering. Pulling large objects via get_object and read() allocates memory proportional to object size, and holding a growing zip in BytesIO multiplies the pressure. The fix is to switch from memory buffering to streaming throughout the pipeline.
miniopy-async exposes an async stream from get_object, which allows processing chunks as they arrive. Python’s zipfile works with file-like targets but expects them to be seekable. A straightforward and reliable workaround is to write the archive to a temporary file on disk, which matches the constraints when you have significantly more disk than RAM.
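To make the streaming side concrete before assembling the full pipeline, here is a minimal sketch that copies a single object to a local file chunk by chunk. It leans on the same chunked stream() iterator and close() call the full solution below relies on, plus aiofiles for non-blocking file writes; the helper name and client setup are illustrative, so treat it as a sketch rather than a drop-in utility.
import aiofiles
from miniopy_async import Minio
client = Minio("localhost:9000", access_key="xxx", secret_key="xxx", secure=False)
async def copy_object_to_disk(bucket, key, dest_path):
    # download one object and write it to disk chunk by chunk;
    # only a single chunk is held in memory at any moment
    resp = await client.get_object(bucket, key)
    async with aiofiles.open(dest_path, "wb") as out:
        async for chunk in resp.stream():
            await out.write(chunk)
    await resp.close()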
A practical streaming solution with a temp file
The following implementation streams each object from MinIO directly into a zip entry, emits the zip into a temporary file on disk, and finally uploads that file back to MinIO as a single object. The process never holds full objects or the full archive in memory.
import asyncio
import os
import tempfile
import zipfile
from miniopy_async import Minio
mc = Minio("localhost:9000", access_key="xxx", secret_key="xxx", secure=False)
async def pack_and_push(src_bucket, object_ids, dst_bucket, dst_name):
    # allocate a temporary file on disk for the resulting zip archive
    with tempfile.NamedTemporaryFile(delete=False) as tf:
        spool_path = tf.name
    try:
        # stream each object into the archive without loading it fully into RAM
        with zipfile.ZipFile(spool_path, "w", compression=zipfile.ZIP_STORED) as archive:
            for obj_key in object_ids:
                stream = await mc.get_object(src_bucket, obj_key)
                with archive.open(obj_key, "w") as sink:
                    async for fragment in stream.stream():
                        sink.write(fragment)  # one chunk in memory at a time
                await stream.close()
        # upload the finished zip file back to MinIO as a single object
        await mc.fput_object(dst_bucket, dst_name, spool_path)
    finally:
        # delete=False keeps the temp file around, so remove it explicitly
        os.remove(spool_path)
This pattern keeps memory usage bounded to roughly the single chunk being transferred and written at any given moment, while zipfile gets the seekable target it wants in the form of a plain file on disk. Because ZIP_STORED applies no compression, the finished archive is roughly the combined size of the input objects, so a ~240 GB drive is enough as long as that total fits on disk.
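For reference, a typical invocation looks like this; the bucket and object names are purely illustrative:
asyncio.run(pack_and_push(
    "raw-data",                         # source bucket (illustrative)
    ["part-001.bin", "part-002.bin"],   # objects to bundle (illustrative)
    "archives",                         # destination bucket (illustrative)
    "bundle.zip",                       # name of the resulting archive object
))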
When disk isn’t enough
If the combined size of the objects exceeds what you can allocate on disk, the next step is to stream the zip all the way to MinIO via multipart upload. Because the standard library's zipfile expects a seekable target, this calls for a streaming zip implementation instead; one option is a library like zipstream, which produces the archive as a sequence of chunks.
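As a rough sketch of that direction, the generator below follows the python-zipstream style API (ZipFile plus write_iter, which is an assumption to verify against whichever library you choose). It turns per-entry chunk iterables into an incremental stream of archive bytes that never needs a seekable target; bridging miniopy-async's async chunk iterator into those iterables (or switching to an async-capable variant such as aiozipstream) and wiring the output into a multipart upload are left out here.
import zipstream  # third-party; the API below mirrors python-zipstream and is an assumption
def build_zip_stream(entries):
    # entries: mapping of archive member name -> iterable of bytes chunks
    z = zipstream.ZipFile(mode="w", compression=zipstream.ZIP_STORED)
    for name, chunks in entries.items():
        # chunks are pulled lazily while the archive is iterated,
        # so no member is ever materialized in memory
        z.write_iter(name, chunks)
    for zip_chunk in z:
        # each yielded piece is ready-to-send archive bytes;
        # accumulate them into fixed-size parts for a multipart upload
        yield zip_chunk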
Notes from the field
I'll mark it as the solution; the second method is the most rational. I also had another idea: make a BytesIO-like buffer block on read() until new data arrives, which would let the upload consume the object sequentially. But that is still just a theory.
The idea above points in the same direction: avoid accumulating full payloads in memory and keep the data flowing sequentially. In practice, a temp file backed by disk remains the most straightforward and predictable approach under the stated constraints.
Why this matters
Large-object processing pipelines often fail not because of bandwidth, but because of unbounded buffering. Streaming minimizes peak memory, keeps the process stable under tight RAM limits, and prevents accidental OOM kills during otherwise simple operations like archiving. Writing the archive to a seekable temp file satisfies the zip format's expectations without complicating the upload flow.
Wrap-up
If you need to zip many large MinIO objects with miniopy-async on a memory-limited host, stream everything you can. Iterate over chunks from get_object instead of read(), write zip entries directly to a temporary file on disk, then upload the finished file with fput_object. If the resulting archive cannot fit on disk, switch to a streaming zip implementation and multipart upload. Stay away from in-memory buffers for multi‑GB objects, and your pipeline will remain both fast and predictable.