2025, Oct 24 05:00
Why CLOBs Crawl over python-oracledb in the Cloud and How to Make Updates Fast
Why CLOB updates crawl in python-oracledb over cloud latency: LOB round trips, object type overhead, and fixes using varchar2(4000) and batch executemany.
Slow CLOB updates over python-oracledb can be baffling when everything else flies. The pattern is familiar: object type updates that are quick on a local database suddenly crawl in a cloud-hosted Oracle instance. The delta is not the payload size; it’s the latency profile and how LOBs are handled under the hood.
Reproducing the setup
The example below creates an object type with a CLOB attribute and then populates 40,000 instances from Python. The logic mirrors a typical object-creation/update loop; only names differ.
-- Object type with a CLOB attribute
CREATE TYPE DOC_REC_T AS OBJECT (
  DOC_ID   NUMBER,
  DOC_BODY CLOB
);
import os
import time
import numpy as np
import oracledb
import pandas as pd
dsn = "localhost:1521/ORCLCDB"
db_user = "SYS"
db_pass = "mypassword1"
try:
    client_dir = os.path.join(os.path.expanduser("~"), "db", "instantclient_23_3")
    oracledb.init_oracle_client(lib_dir=client_dir)
    total_items = 40000
    payload = {
        'txt_col': [f"This is a very long string example for row {k}. It can contain various characters and be quite lengthy to simulate real-world data scenarios." * 5 for k in range(total_items)],
        'num_col': np.random.randint(1000, 100000, size=total_items)
    }
    frame = pd.DataFrame(payload)
    conn = oracledb.connect(user=db_user, password=db_pass, dsn=dsn, mode=oracledb.SYSDBA)
    print("Connected to Oracle Database (CDB Root) using Thin/Thick Client successfully!")
    cur = conn.cursor()
    objects_out = []
    obj_type = conn.gettype("DOC_REC_T")
    pool = [obj_type.newobject() for _ in range(total_items)]
    t0 = time.time()
    # Each CLOB attribute assignment can involve LOB processing on the
    # server, so this loop is where per-row round trips accumulate.
    for idx, row in enumerate(frame.itertuples(index=False)):
        rec = pool[idx]
        rec.DOC_ID = idx
        rec.DOC_BODY = row.txt_col
        objects_out.append(rec)
    t1 = time.time()
    print(f"Time to create objects and assign attributes: {t1 - t0:.4f} seconds")
    print(f"Total objects: {len(objects_out)}")
except oracledb.Error as exc:
    err, = exc.args
    print(f"Oracle error: {err.message}")
finally:
    if 'cur' in locals() and cur:
        cur.close()
    if 'conn' in locals() and conn:
        conn.close()
What actually slows it down
Moving your database to a cloud provider changes the network physics. Latency from on-prem or desktop to cloud can easily be tens of milliseconds due to geography, firewall layers, isolated switching, and traffic inspection, while a LAN typically sits at a millisecond or less. That is one to two orders of magnitude more per round trip.
The critical detail is how LOBs work over the wire. When a LOB is involved, the client does not receive or send the full value as a regular scalar in a single call. Instead, a LOB locator is exchanged, and additional calls are made to transmit or read the LOB content. Even if the LOB is small, there is still a per-row round trip to process it. With thousands of rows, the accumulated per-row latency dominates wall-clock time. That’s why local runs seem fine while cloud-hosted databases look painfully slow.
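The arithmetic is unforgiving. A quick back-of-the-envelope estimate, assuming one extra round trip per row at a typical 25 ms cloud latency:
# Rough cost of one extra LOB round trip per row over a WAN link.
rows = 40_000
rtt = 0.025  # assumed 25 ms client-to-database round-trip time
wait = rows * rtt
print(f"{wait:.0f} s of pure network waiting (~{wait / 60:.0f} min)")
# -> 1000 s (~17 min) before any actual database work is counted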
LOB datatypes require special processing by the client driver. The driver issues additional LOB calls per row, and high latency is paid on every one of those calls.
On top of the LOB behavior, using user-defined object types introduces more round trips than working with plain scalars, which compounds the effect.
How to approach the problem
The most effective way to reduce elapsed time is to avoid LOB processing across high-latency links whenever you can. If the textual content always fits in 4000 bytes, make it a scalar and eliminate LOB semantics entirely by using varchar2(4000). If that is not possible at the schema level, cast as you read or write.
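At the driver level, python-oracledb can also return small LOBs directly as strings instead of locators when fetching; this is the documented fetch_lobs setting, and it applies to reads only:
import oracledb

# Fetch CLOBs as str (and BLOBs as bytes) instead of LOB locators,
# removing the extra per-row LOB calls on the read path.
oracledb.defaults.fetch_lobs = False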
If values are usually short but only occasionally exceed 4000 bytes, split your work into two passes: fetch or process the varchar2-cast rows first, then handle the longer tail separately where LOB processing is truly necessary (a sketch of this split follows the SQL examples below).
If only the first portion of the content is needed, truncate on the server side during access. This keeps data in scalar form for the transfer.
-- Keep everything scalar when values are <= 4000 bytes
SELECT CAST(mylob AS varchar2(4000)) AS mylob, col2, col3 FROM your_table;
-- If you only need the beginning, truncate during access
SELECT CAST(SUBSTR(mylob, 1, 4000) AS varchar2(4000)) AS mylob, col2, col3 FROM your_table;
When you are updating rather than selecting, the same latency costs apply to LOBs. The practical workaround is the same: use varchar2(4000) when the data fits, or otherwise look for ways to reduce how many rows require LOB handling.
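For example, when the new values fit in 4000 bytes, a batched update that binds plain strings stays on the scalar path (a sketch with illustrative table and column names):
# Batch the updates and bind the text as ordinary strings, not LOBs.
data = [(f"short text {i}", i) for i in range(1000)]
cur.executemany(
    "update your_table set mytext = :1 where id = :2",
    data,
)
conn.commit()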
A faster path for loading data
Objects and LOB locators add round trips, while binding scalars directly and batching statements generally yields better throughput. Instead of constructing 40,000 object instances, push the rows into a table with a NUMBER column and a CLOB column via a batched insert, binding the text as ordinary strings. This avoids the object overhead and takes advantage of the driver's batching pipeline.
-- Table with scalar columns
CREATE TABLE DOCS_TBL (
  DOC_ID   NUMBER,
  DOC_BODY CLOB
);
import oracledb

# assume conn is an open connection
cur = conn.cursor()

# Bind the text as plain Python strings rather than LOB locators; the
# driver ships the whole batch in far fewer network operations.
rows = [(i, f"Name_{i}" * 4000) for i in range(40000)]
cur.executemany(
    "insert into DOCS_TBL (DOC_ID, DOC_BODY) values (:1, :2)",
    rows,
)
conn.commit()
This approach aligns with the python-oracledb tuning guidance: avoiding objects can be faster, and binding LOB data as strings or bytes is faster than using LOB locators. It also lets the driver and database do the work in fewer, larger network operations.
Driver mode considerations
If you are using python-oracledb in Thick mode, consider testing Thin mode. In scenarios like this you might see marginal improvements, and for fetching objects Thin mode is particularly advantageous. The article “Python-oracledb Thin mode Object performance” offers a good perspective and numbers on object performance in Thin mode. If the code path depends on objects and LOBs, however, the core latency characteristics of LOB processing still apply.
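Testing this is a one-line change, because Thin mode is the default and Thick mode is only enabled by calling init_oracle_client. A minimal sketch, omitting that call:
import oracledb

# No oracledb.init_oracle_client() call, so the driver runs in Thin
# mode and needs no Oracle Client libraries.
conn = oracledb.connect(user="myuser", password="mypassword1", dsn="localhost:1521/ORCLCDB")
print(conn.thin)  # True when the connection is using Thin mode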
When you can’t avoid LOB
If the majority of values legitimately exceed 4000 bytes and full fidelity is required, the most impactful lever is to reduce the number of round trips. Restructure processing to avoid full extracts when incremental pulls suffice, or perform operations closer to the data to minimize client-server chatter. When the content is actually structured data, normalize it into scalar columns and child tables rather than storing it as an unstructured CLOB; that turns LOBs into standard types and often removes the LOB path altogether.
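As an illustration of moving work closer to the data, a single anonymous PL/SQL block can do per-row LOB manipulation in one network call (a sketch; the table, column, and filter are hypothetical):
# One execute() call means one round trip; all LOB work happens inside
# the database, so no locators cross the network.
cur.execute("""
    begin
        update your_table
           set mylob = mylob || ' [processed]'
         where needs_processing = 'Y';
    end;
""")
conn.commit()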
Why this matters
The difference between sub-second and multi-hour runs often comes down to per-row round-trip costs multiplied by tens of thousands of rows. LOBs transform what looks like a single update into many network calls, and high-latency networks make every one of those calls expensive. Recognizing when your design implicitly triggers LOB processing lets you choose alternatives that keep data in scalar form or cut round trips dramatically.
Closing advice
Start by validating the latency profile between your client and the cloud database; the behavior you see is consistent with high-latency links. Prefer varchar2(4000) for short text to stay on scalar paths. Where you must handle LOBs, narrow the set of rows that require LOB semantics by casting, truncating, or splitting the workload. Avoid user-defined objects in hot paths and batch with executemany when inserting. If feasible in your environment, test Thin mode for incremental benefits. These steps target the real cause—the round-trip pattern of LOB processing over higher-latency networks—and restore runtimes to something closer to your local experience.
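To validate the latency profile empirically, you can watch the session's round-trip counter before and after a representative workload. A sketch, assuming read access to the v$ views:
# 'SQL*Net roundtrips to/from client' is the standard statistic that
# counts the current session's network round trips.
cur.execute("""
    select sn.name, ms.value
      from v$mystat ms
      join v$statname sn on sn.statistic# = ms.statistic#
     where sn.name = 'SQL*Net roundtrips to/from client'
""")
print(cur.fetchone())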
The article is based on a question from StackOverflow by omkar kothiwale and an answer by Paul W.