2025, Dec 18 03:00
Stream CSV from a Flask Endpoint in Constant Memory: Generators, StringIO reuse, and proper download headers
Export huge CSVs from Flask without exhausting RAM: stream rows with a generator, reuse a StringIO buffer, and set Content-Disposition for file downloads.
Exporting a CSV from a web endpoint looks trivial until the dataset grows into the millions. A straightforward approach that accumulates the entire file in memory will eventually crash the process. The practical answer is to stream the CSV row by row so memory use stays flat no matter how large the export gets.
Problem
The endpoint below collects the full CSV payload in memory before sending it to the client. This works for small datasets, but scaling it to millions of rows causes the process to run out of RAM.
import io, csv
from flask import Flask, Response

svc = Flask(__name__)

def fetch_rows():
    # imagine 5 million DB rows here
    for n in range(5_000_000):
        yield (n, f"name-{n}", n % 100)

@svc.get("/export")
def export_csv_plain():
    row_iter = fetch_rows()
    mem_io = io.StringIO()
    csvw = csv.writer(mem_io)
    csvw.writerow(["id", "name", "score"])
    csvw.writerows(row_iter)
    mem_io.seek(0)
    return Response(
        mem_io.getvalue(),
        mimetype="text/csv",
        headers={"Content-Disposition": "attachment; filename=report.csv"}
    )
What goes wrong and why
The code writes every row into a single in-memory buffer before returning anything to the client. With large reports, the buffer grows until the process is out of memory. The dataset itself might be streamable, but the CSV assembly step defeats that benefit by materializing the entire file.
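For context on the "streamable dataset" point: many database drivers already hand rows back lazily, so the generator feeding the CSV writer does not have to load everything up front. A minimal sketch, assuming a hypothetical SQLite file report.db with a rows table (neither appears in the example above; any DB-API cursor behaves similarly):

import sqlite3

def fetch_rows_from_db():
    # hypothetical SQLite source, used here only for illustration
    conn = sqlite3.connect("report.db")
    try:
        cur = conn.execute("SELECT id, name, score FROM rows")
        # iterating the cursor fetches rows incrementally, not all at once
        for rec in cur:
            yield rec
    finally:
        conn.close()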
Solution: stream with a tiny reusable buffer
Instead of accumulating the whole file, write into a small in-memory buffer, yield its contents to the client, then reset the buffer and continue. This pattern keeps memory flat regardless of how many rows are sent. The response is turned into a stream using a generator wrapped with stream_with_context. Headers include Content-Disposition so the browser offers a file-save dialog.
from flask import Flask, Response, stream_with_context
import csv, io

svc = Flask(__name__)

def fetch_rows():
    for n in range(5_000_000):
        yield n, f"name-{n}", n % 100

@svc.get("/export")
def export_csv_stream():
    def chunked():
        stash = io.StringIO()
        cw = csv.writer(stash)
        # header row first
        cw.writerow(("id", "name", "score"))
        yield stash.getvalue()
        stash.seek(0)
        stash.truncate(0)
        for rec in fetch_rows():
            cw.writerow(rec)
            yield stash.getvalue()
            # rewind and clear the buffer so it never grows
            stash.seek(0)
            stash.truncate(0)
    hdr_map = {
        "Content-Disposition": "attachment; filename=report.csv",
        "X-Accel-Buffering": "no",  # ask reverse proxies such as nginx not to buffer
    }
    return Response(
        stream_with_context(chunked()),
        mimetype="text/csv",
        headers=hdr_map,
        direct_passthrough=True,
    )
Why this works
The generator writes the header row, yields it, then processes each data row and yields the buffer’s contents immediately. After every yield the buffer is rewound with seek(0) and cleared with truncate(0). Reusing a tiny StringIO in this loop prevents RAM from growing no matter how many rows are streamed. stream_with_context keeps the request context alive while the generator runs, and the X-Accel-Buffering: no header asks reverse proxies such as nginx not to buffer the response, so chunks reach the client as soon as they are yielded. The Content-Disposition header ensures the browser treats the response as a downloadable file.
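As a quick standalone illustration of the reset pattern, outside Flask entirely, the buffer only ever holds the row most recently written:

import csv, io

buf = io.StringIO()
w = csv.writer(buf)
for rec in [(1, "a"), (2, "b")]:
    w.writerow(rec)
    print(repr(buf.getvalue()))  # only the row just written, e.g. '1,a\r\n'
    buf.seek(0)
    buf.truncate(0)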
Why it matters
Reports tend to grow over time. A design that seems fine during early development can start crashing in production once the dataset reaches millions of records. Streaming the CSV keeps memory usage predictable and allows the client to start receiving data right away.
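One way to confirm that behaviour from the client side is to read the response incrementally, for example with requests (the localhost URL is only an assumption about where the app happens to be running):

import requests

# stream=True keeps requests from loading the whole body into memory
with requests.get("http://localhost:5000/export", stream=True) as resp:
    resp.raise_for_status()
    with open("report.csv", "wb") as fh:
        for chunk in resp.iter_content(chunk_size=65536):
            fh.write(chunk)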
Conclusion
When exporting CSV from a Flask endpoint, avoid building the entire file in memory. Stream it row by row using a generator, reuse a small buffer with seek and truncate after each yield, and set appropriate headers for download behavior. This approach scales cleanly and keeps the service stable even for very large exports.