https://pytroubles.com/en/posts/id1314-avoid-non-transactional-firestore-reads-in-python-transactions-why-stream-inside-hurts

Avoid Non-Transactional Firestore Reads in Python Transactions: Why stream() Inside Hurts

Firestore Transactions in Python: Avoid stream() and Other Non-Transactional Reads for Atomic, Fast Updates

Learn why stream() and other non-transactional reads inside Firestore Python transactions break atomicity and cause slow retries, and how to fix them safely.

2025-10-25T11:00:07+03:00

2025-10-25T11:00:08+03:00

Running Firestore queries like stream() inside a Python transaction can feel harmless when everything “just works”. But there’s a subtle trap: if those reads don’t go through the transaction object itself, you’re opting out of core transactional guarantees and can unintentionally make the transaction slower and more wasteful. The guidance to avoid non-transactional reads/writes inside a transaction still matters even if your code doesn’t crash.

Problem example

The snippet below deletes all existing markdown chunks and writes new ones inside a transaction. The deletion list comes from a plain collection stream() that doesn’t use the provided transaction object.

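import json
import time

from firebase_admin import firestore
# Assumed aliases for the Firebase Functions SDK modules. helpers, md_tools, and
# CHUNK_LIMIT are project-specific and are not shown in the article.
from firebase_functions import https_fn as http_fx, options as cors_opts


def write_md_chunks(txn: firestore.Transaction, note_doc: firestore.DocumentReference, md_text: str):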
    md_coll = note_doc.collection('markdowns')
    # Remove existing chunks (non-transactional read inside a transaction)
    existing = md_coll.stream()
    for snap in existing:
        target = md_coll.document(snap.id)
        txn.delete(target)
        print("EXPECTED A CRASH, BUT IT DIDN'T. WHY?")
    pieces = helpers.split_text(md_text, CHUNK_LIMIT)
    for idx, part in enumerate(pieces):
        part_doc = md_coll.document()
        txn.set(part_doc, {
            'text': part,
            'order': idx + 1
        })


@http_fx.on_request(
    cors=cors_opts.CorsOptions(
        cors_origins=["*"],
        cors_methods=["POST"],
    )
)
def handle_md_edit(req: http_fx.Request) -> http_fx.Response:
    if req.method != 'POST':
        return http_fx.Response(
            json.dumps({"error": "Only POST requests are allowed"}),
            status=405
        )
    payload = req.get_json(silent=True)
    if not payload:
        return http_fx.Response(
            json.dumps({"error": "Invalid request data"}),
            status=400
        )
    user_id = payload['data']['uid']
    note_id = payload['data']['doc_id']
    next_md = payload['data']['markdown']
    if helpers.is_blank_or_none(next_md):
        return http_fx.Response(
            json.dumps({"error": "Invalid request data"}),
            status=400
        )
    if len(next_md) > 524288:
        return http_fx.Response(
            json.dumps({"error": "Invalid request data"}),
            status=400
        )
    db = firestore.client()
    now_ms = int(time.time() * 1000)
    try:
        @firestore.transactional
        def apply_note_update(txn):
            note_doc = (
                db.collection('users')
                .document(user_id)
                .collection('notes')
                .document(note_id)
            )
            current_md = md_tools.get_markdown(
                transaction=txn,
                note_ref=note_doc
            )
            if next_md != current_md:
                original_md = md_tools.get_original_markdown(
                    transaction=txn,
                    note_ref=note_doc
                )
                if next_md == original_md:
                    md_tools.delete_original_markdown(
                        transaction=txn,
                        note_ref=note_doc
                    )
                else:
                    md_tools.insert_original_markdown_if_not_exist(
                        transaction=txn,
                        note_ref=note_doc,
                        original_markdown=current_md
                    )
                md_tools.update_markdown(
                    transaction=txn,
                    note_ref=note_doc,
                    markdown=next_md
                )
                txn.update(
                    note_doc,
                    {
                        'modified_timestamp': now_ms,
                        'synced_timestamp': now_ms
                    }
                )
        txn = db.transaction()
        apply_note_update(txn)
        response = {
            "data": {
                "modified_timestamp": now_ms,
                "synced_timestamp": now_ms
            }
        }
        return http_fx.Response(
            json.dumps(response),
            status=200
        )
    except Exception as err:
        print(f"Error updating note markdown: {str(err)}")
        return http_fx.Response(
            json.dumps({"data": {"error": f"An error occurred: {str(err)}"}}),
            status=500
        )

What’s actually wrong here

The core issue isn’t that stream() is inherently illegal. The problem is any database read or write that occurs inside the transaction body but doesn’t go through the provided transaction object. There are two reasons to avoid that pattern. First, transactions should run fast to minimize contention, and every extra query inside the body lengthens the window in which a conflicting write can arrive. Second, a transaction may retry multiple times under contention, and the whole body runs again on each attempt, so a non-transactional read inside it is repeated every time. Those reads are also invisible to the transaction: if the documents they returned change before the commit, nothing triggers a retry, and the writes may be based on stale data. That is what undermines atomicity.

This also explains why you didn’t see a crash. There isn’t a reliable, low-overhead way for the SDK to detect at runtime that you mixed non-transactional reads with a currently active transaction, especially with multiple threads in play. It’s on you to follow best practices and keep transactional work transactional.
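To make the retry cost concrete, here is a toy model in plain Python. It is not the Firestore SDK: ConflictError, run_with_retries, and count_reads are hypothetical names that only mimic how a transactional wrapper re-invokes its body after a conflicting commit. The sketch counts how often a read runs when it sits inside the body versus before it.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for a commit rejected because of contention."""


def run_with_retries(body, max_attempts=5):
    """Re-run body(attempt) until it succeeds, like a transactional wrapper does."""
    for attempt in range(1, max_attempts + 1):
        try:
            return body(attempt)
        except ConflictError:
            if attempt == max_attempts:
                raise


def count_reads(read_inside_body):
    """Return how many times the ID read runs when the first two attempts conflict."""
    reads = 0

    def read_ids():
        # Models md_coll.stream(): one round trip per call.
        nonlocal reads
        reads += 1
        return ["chunk-a", "chunk-b"]

    # Pre-read once up front when the read is kept out of the body.
    pre_read = None if read_inside_body else read_ids()

    def body(attempt):
        ids = read_ids() if read_inside_body else pre_read
        if attempt < 3:  # pretend attempts 1 and 2 hit contention
            raise ConflictError()
        return ids

    run_with_retries(body)
    return reads


print(count_reads(read_inside_body=True))   # 3
print(count_reads(read_inside_body=False))  # 1
```

The numbers depend entirely on the assumed two conflicts. The point is only that anything placed in the body is paid for once per attempt.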

How to fix it

There are two safe approaches. Either read the data using the transaction object so that changes to those documents cause a proper retry and preserve atomicity, or move those reads completely outside the transaction so they don’t slow down and complicate the transactional run.
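The first approach can be sketched as follows. This is a minimal sketch rather than the article’s code: replace_chunks_in_txn is a hypothetical helper, and it assumes the collection’s stream() accepts a transaction argument, as the google-cloud-firestore client does. It also relies on Firestore’s rule that reads in a transaction must come before writes, so the query is materialized before the first delete.

```python
def replace_chunks_in_txn(txn, md_coll, parts):
    """Delete all existing chunk docs and write new ones, reading via the transaction.

    txn     -- the active Firestore transaction
    md_coll -- the 'markdowns' collection reference
    parts   -- the new text chunks, already split
    Returns the number of chunk documents that were deleted.
    """
    # Read first: passing transaction=txn makes this query part of the
    # transaction, and all transactional reads must precede any write.
    existing = list(md_coll.stream(transaction=txn))

    for snap in existing:
        txn.delete(snap.reference)

    for order, text in enumerate(parts, start=1):
        # document() with no argument creates a reference with an auto-generated ID.
        txn.set(md_coll.document(), {"text": text, "order": order})

    return len(existing)
```

In a handler like the one above, a call such as this would have to run before any other write queued on the same transaction, which may mean reordering the body so that every read happens up front.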

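The revised example below moves the stream() call out of the transaction. The transaction then deletes by ID and writes the new chunks. When nothing else modifies the chunks in the meantime, the result is the same, but the read now runs once instead of once per attempt. Keep in mind that the ID list is a plain snapshot taken before the transaction starts: a chunk added by another writer after the pre-read will not be deleted, so this variant fits when that race is acceptable or cannot occur.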

def put_md_chunks_precomputed(txn: firestore.Transaction,
                              note_doc: firestore.DocumentReference,
                              md_text: str,
                              chunk_ids: list[str]):
    md_coll = note_doc.collection('markdowns')
    # Delete known chunk docs inside the transaction
    for cid in chunk_ids:
        txn.delete(md_coll.document(cid))
    parts = helpers.split_text(md_text, CHUNK_LIMIT)
    for idx, part in enumerate(parts):
        new_doc = md_coll.document()
        txn.set(new_doc, {
            'text': part,
            'order': idx + 1
        })


@http_fx.on_request(
    cors=cors_opts.CorsOptions(
        cors_origins=["*"],
        cors_methods=["POST"],
    )
)
def handle_md_edit(req: http_fx.Request) -> http_fx.Response:
    if req.method != 'POST':
        return http_fx.Response(
            json.dumps({"error": "Only POST requests are allowed"}),
            status=405
        )
    payload = req.get_json(silent=True)
    if not payload:
        return http_fx.Response(
            json.dumps({"error": "Invalid request data"}),
            status=400
        )
    user_id = payload['data']['uid']
    note_id = payload['data']['doc_id']
    next_md = payload['data']['markdown']
    if helpers.is_blank_or_none(next_md):
        return http_fx.Response(
            json.dumps({"error": "Invalid request data"}),
            status=400
        )
    if len(next_md) > 524288:
        return http_fx.Response(
            json.dumps({"error": "Invalid request data"}),
            status=400
        )
    db = firestore.client()
    now_ms = int(time.time() * 1000)
    note_doc = (
        db.collection('users')
        .document(user_id)
        .collection('notes')
        .document(note_id)
    )
    md_coll = note_doc.collection('markdowns')
    try:
        # Pre-read outside the transaction, but inside try so that a failed
        # query still produces the JSON 500 response below
        to_delete_ids = [snap.id for snap in md_coll.stream()]

        @firestore.transactional
        def apply_note_update(txn):
            # Always re-derive the same note_doc inside for clarity
            nd = (
                db.collection('users')
                .document(user_id)
                .collection('notes')
                .document(note_id)
            )
            current_md = md_tools.get_markdown(transaction=txn, note_ref=nd)
            if next_md != current_md:
                original_md = md_tools.get_original_markdown(transaction=txn, note_ref=nd)
                if next_md == original_md:
                    md_tools.delete_original_markdown(transaction=txn, note_ref=nd)
                else:
                    md_tools.insert_original_markdown_if_not_exist(
                        transaction=txn,
                        note_ref=nd,
                        original_markdown=current_md
                    )
                # Replace chunks using the pre-read IDs
                put_md_chunks_precomputed(txn, nd, next_md, to_delete_ids)
                txn.update(nd, {
                    'modified_timestamp': now_ms,
                    'synced_timestamp': now_ms
                })
        txn = db.transaction()
        apply_note_update(txn)
        return http_fx.Response(
            json.dumps({
                "data": {
                    "modified_timestamp": now_ms,
                    "synced_timestamp": now_ms
                }
            }),
            status=200
        )
    except Exception as err:
        print(f"Error updating note markdown: {str(err)}")
        return http_fx.Response(
            json.dumps({"data": {"error": f"An error occurred: {str(err)}"}}),
            status=500
        )

Why this is worth remembering

Transactions may retry, and they should be kept lean. Reading or writing Firestore without using the transaction object while you’re inside the transaction body defeats those goals. You won’t necessarily get a runtime error, and there isn’t a practical way for the SDK to enforce this automatically without adding overhead. It’s a discipline issue. Keep your transactional code path clean and predictable so retries and atomicity work in your favor.
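Since the SDK will not flag the mistake, one option is to catch it in unit tests. The class below is a hypothetical test-only proxy, not part of any Firestore library: it wraps a collection-like object and raises when stream() is called without a transaction. It only guards objects you wrap explicitly, and queries derived through passthrough methods are not covered.

```python
class TxnOnlyCollection:
    """Test-only proxy (hypothetical): rejects stream() calls that omit the transaction."""

    def __init__(self, inner):
        self._inner = inner

    def stream(self, transaction=None, **kwargs):
        if transaction is None:
            raise AssertionError("stream() called without transaction=...")
        return self._inner.stream(transaction=transaction, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes not defined on the proxy itself,
        # so document() and similar calls pass through unchanged.
        return getattr(self._inner, name)
```

Handing such a wrapper to the code under test turns a silent non-transactional read into a failing test, which is about as much enforcement as this discipline issue allows.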

Takeaways

If you must read data that affects the transaction’s outcome, do it with the transaction object. If you just need a list of IDs or other context that doesn’t require transactional coordination, fetch it before the transaction starts. Don’t rely on the absence of crashes as a signal that the pattern is safe. The guidance stands: avoid non-transactional Firestore reads and writes inside a transaction.

The article is based on a question from StackOverflow by Cheok Yan Cheng and an answer by Doug Stevenson.

firebase google-cloud-firestore python