2025, Nov 24 05:00
Resolving intermittent 'Container not found' GSException in GridDB Cloud with retry and exponential backoff
Intermittent 'Container not found' in GridDB Cloud using griddb_python? Understand the metadata propagation window behind the GSException and fix it with a short wait, retries, and exponential backoff.
Intermittent “Container not found” errors right after creating a container in GridDB Cloud can derail otherwise straightforward ingestion scripts. If you use griddb_python to create a TIME_SERIES container and immediately read it back, you may see occasional GSException with a message that the container does not exist, even though it shows up in the Web UI. This becomes more noticeable under concurrency.
Reproducing the issue
The following minimal sequence creates a container named device_metrics and then tries to obtain a handle to it:
import griddb_python as griddb

# Assume grid_store is a connected GridDB store instance
container_info = griddb.ContainerInfo(
    "device_metrics",
    [
        ["timestamp", griddb.Type.TIMESTAMP],
        ["device_id", griddb.Type.STRING],
        ["temperature", griddb.Type.FLOAT]
    ],
    griddb.ContainerType.TIME_SERIES,
    True  # row key on the first column
)
grid_store.put_container(container_info)
metrics_ts = grid_store.get_container("device_metrics")
In some runs, especially when multiple threads are active, the last line can raise:
griddb_python.GSException: [PARTIAL_EXECUTION(0x0303)] Container not found. (Container name='device_metrics')
What’s going on
The behavior is consistent with a short metadata propagation window after the container is created. The container exists, but a subsequent get_container or get_container_info may momentarily fail while the system converges. Because this is intermittent and more frequent with concurrent scripts, the safest approach is to handle the brief gap explicitly.
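To convince yourself that the failure is transient rather than a genuinely missing container, you can probe get_container repeatedly right after creation and log each outcome. The snippet below is a minimal diagnostic sketch, not part of the fix: the store name probe_store, the check count, and the interval are arbitrary assumptions.
import time
import griddb_python as griddb

# Hypothetical diagnostic: poll get_container at fixed intervals after
# creation and report whether the container is visible yet.
def probe_container_visibility(probe_store, cont_name, checks=10, interval=0.2):
    for i in range(checks):
        try:
            handle = probe_store.get_container(cont_name)
            status = "visible" if handle is not None else "not visible yet"
        except griddb.GSException:
            status = "GSException raised (not visible yet)"
        print(f"check {i}: {cont_name} is {status}")
        time.sleep(interval)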
Practical fix
The most direct way to avoid the race is to add a small delay after creation before retrieving the container handle:
import time
import griddb_python as griddb

# Assume gs_handle is a connected GridDB store instance
container_info = griddb.ContainerInfo(
    "device_metrics",
    [
        ["timestamp", griddb.Type.TIMESTAMP],
        ["device_id", griddb.Type.STRING],
        ["temperature", griddb.Type.FLOAT]
    ],
    griddb.ContainerType.TIME_SERIES,
    True  # row key on the first column
)
gs_handle.put_container(container_info)

# Brief wait to allow metadata to settle
time.sleep(1)
metrics_container = gs_handle.get_container("device_metrics")
If you want something more robust, use a bounded retry with exponential backoff. This avoids hardcoding a single fixed delay and copes better with occasional spikes in propagation time. The helper below treats a transient GSException from get_container the same as a missing handle and backs off between attempts:
import time
import griddb_python as griddb

# Assume db_store is a connected GridDB store instance
def poll_for_container(db_store, cont_name, attempts=5, base_delay=0.5):
    for idx in range(attempts):
        try:
            ref = db_store.get_container(cont_name)
            if ref is not None:
                return ref
        except griddb.GSException:
            pass  # metadata not visible yet; fall through to the backoff
        time.sleep(base_delay * (2 ** idx))
    raise RuntimeError(f"Container '{cont_name}' not found after retries.")

metrics_ref = poll_for_container(db_store, "device_metrics")
Why this matters
Data pipelines, provisioning scripts, and tests often create containers on the fly and immediately operate on them. Without a small wait or a retry policy, you introduce flaky behavior that’s hard to diagnose, especially under parallel execution. Handling the short propagation interval turns a brittle workflow into a predictable one.
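One way to keep that policy in a single place is to wrap creation and retrieval in a small helper that pipelines and tests call instead of using put_container and get_container directly. This is only a sketch: it reuses poll_for_container from the previous snippet, relies on the griddb_python ContainerInfo constructor shown earlier, and the name ensure_time_series is made up for illustration.
import griddb_python as griddb

# Hypothetical convenience wrapper: create the container (if absent) and
# wait until it can actually be retrieved, reusing poll_for_container.
def ensure_time_series(store, name, columns, attempts=5, base_delay=0.5):
    info = griddb.ContainerInfo(name, columns, griddb.ContainerType.TIME_SERIES, True)
    store.put_container(info)  # behavior with an existing container depends on schema compatibility
    return poll_for_container(store, name, attempts=attempts, base_delay=base_delay)

# Example usage in an ingestion script:
# metrics = ensure_time_series(store, "device_metrics",
#     [["timestamp", griddb.Type.TIMESTAMP],
#      ["device_id", griddb.Type.STRING],
#      ["temperature", griddb.Type.FLOAT]])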
Takeaways
If you create a container in GridDB Cloud and need to access it right away, account for a brief propagation window. A short sleep is sufficient for simple cases; a bounded exponential backoff is a safer default for threaded or automated runs. Keeping this in place will suppress spurious GSException errors, stabilize your ingestion scripts, and make multi-threaded workloads behave consistently.
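For threaded runs specifically, the same retry helper can be shared by all workers, and a little random jitter keeps them from retrying in lockstep. The sketch below is illustrative: the worker function and pool size are made up, and it passes one store object to every thread for brevity, so check your client's thread-safety guarantees (or open one store per worker) before copying the pattern.
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker: resolve the container with the retry helper before
# writing. The jitter value is arbitrary; it just staggers the first calls.
def worker(store, cont_name, worker_id):
    time.sleep(random.uniform(0, 0.2))
    container = poll_for_container(store, cont_name)
    return f"worker {worker_id} got a handle: {container is not None}"

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(worker, db_store, "device_metrics", i) for i in range(4)]
    for fut in futures:
        print(fut.result())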