2025, Nov 24 05:00
Resolving intermittent 'Container not found' GSException in GridDB Cloud with retry and exponential backoff
Intermittent 'Container not found' in GridDB Cloud using griddb_python? Understand the metadata propagation window behind the GSException and fix it with a short wait, retries, and exponential backoff.
Intermittent “Container not found” errors right after creating a container in GridDB Cloud can derail otherwise straightforward ingestion scripts. If you use griddb_python to create a TIME_SERIES container and immediately read it back, you may see occasional GSException with a message that the container does not exist, even though it shows up in the Web UI. This becomes more noticeable under concurrency.
Reproducing the issue
The following minimal sequence creates a container named device_metrics and then tries to obtain a handle to it:
import griddb_python as griddb

# Assume grid_store is a connected GridDB store instance
container_info = griddb.ContainerInfo(
    "device_metrics",
    [
        ["timestamp", griddb.Type.TIMESTAMP],
        ["device_id", griddb.Type.STRING],
        ["temperature", griddb.Type.FLOAT]
    ],
    griddb.ContainerType.TIME_SERIES,
    True  # row key on the first column
)
grid_store.put_container(container_info)
metrics_ts = grid_store.get_container("device_metrics")
In some runs, especially when multiple threads are active, the last line can raise:
griddb_python.GSException: [PARTIAL_EXECUTION(0x0303)] Container not found. (Container name='device_metrics')
What’s going on
The behavior is consistent with a short metadata propagation window after the container is created. The container exists, but a subsequent get_container or get_container_info may momentarily fail while the system converges. Because this is intermittent and more frequent with concurrent scripts, the safest approach is to handle the brief gap explicitly.
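To convince yourself that the failure is transient rather than a genuinely missing container, you can probe get_container repeatedly right after creation and log each outcome. The snippet below is a minimal diagnostic sketch, not part of the fix: the store name probe_store, the check count, and the interval are arbitrary assumptions.
import time
import griddb_python as griddb

# Hypothetical diagnostic: poll get_container at fixed intervals after
# creation and report whether the container is visible yet.
def probe_container_visibility(probe_store, cont_name, checks=10, interval=0.2):
    for i in range(checks):
        try:
            handle = probe_store.get_container(cont_name)
            status = "visible" if handle is not None else "not visible yet"
        except griddb.GSException:
            status = "GSException raised (not visible yet)"
        print(f"check {i}: {cont_name} is {status}")
        time.sleep(interval)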
Practical fix
The most direct way to avoid the race is to add a small delay after creation before retrieving the container handle:
import time
import griddb_python as griddb

# Assume gs_handle is a connected GridDB store instance
container_info = griddb.ContainerInfo(
    "device_metrics",
    [
        ["timestamp", griddb.Type.TIMESTAMP],
        ["device_id", griddb.Type.STRING],
        ["temperature", griddb.Type.FLOAT]
    ],
    griddb.ContainerType.TIME_SERIES,
    True  # row key on the first column
)
gs_handle.put_container(container_info)

# Brief wait to allow metadata to settle
time.sleep(1)
metrics_container = gs_handle.get_container("device_metrics")
If you want something more robust, use a bounded retry with exponential backoff. This avoids hardcoding a single fixed delay and copes better with occasional spikes in propagation time. The helper below treats a transient GSException from get_container the same as a missing handle and backs off between attempts:
import time
import griddb_python as griddb

# Assume db_store is a connected GridDB store instance
def poll_for_container(db_store, cont_name, attempts=5, base_delay=0.5):
    for idx in range(attempts):
        try:
            ref = db_store.get_container(cont_name)
            if ref is not None:
                return ref
        except griddb.GSException:
            pass  # metadata not visible yet; fall through to the backoff
        time.sleep(base_delay * (2 ** idx))
    raise RuntimeError(f"Container '{cont_name}' not found after retries.")

metrics_ref = poll_for_container(db_store, "device_metrics")
Why this matters
Data pipelines, provisioning scripts, and tests often create containers on the fly and immediately operate on them. Without a small wait or a retry policy, you introduce flaky behavior that’s hard to diagnose, especially under parallel execution. Handling the short propagation interval turns a brittle workflow into a predictable one.
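One way to keep that policy in a single place is to wrap creation and retrieval in a small helper that pipelines and tests call instead of using put_container and get_container directly. This is only a sketch: it reuses poll_for_container from the previous snippet, relies on the griddb_python ContainerInfo constructor shown earlier, and the name ensure_time_series is made up for illustration.
import griddb_python as griddb

# Hypothetical convenience wrapper: create the container (if absent) and
# wait until it can actually be retrieved, reusing poll_for_container.
def ensure_time_series(store, name, columns, attempts=5, base_delay=0.5):
    info = griddb.ContainerInfo(name, columns, griddb.ContainerType.TIME_SERIES, True)
    store.put_container(info)  # behavior with an existing container depends on schema compatibility
    return poll_for_container(store, name, attempts=attempts, base_delay=base_delay)

# Example usage in an ingestion script:
# metrics = ensure_time_series(store, "device_metrics",
#     [["timestamp", griddb.Type.TIMESTAMP],
#      ["device_id", griddb.Type.STRING],
#      ["temperature", griddb.Type.FLOAT]])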
Takeaways
If you create a container in GridDB Cloud and need to access it right away, account for a brief propagation window. A short sleep is sufficient for simple cases; a bounded exponential backoff is a safer default for threaded or automated runs. Keeping this in place will suppress spurious GSException errors, stabilize your ingestion scripts, and make multi-threaded workloads behave consistently.
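For threaded runs specifically, the same retry helper can be shared by all workers, and a little random jitter keeps them from retrying in lockstep. The sketch below is illustrative: the worker function and pool size are made up, and it passes one store object to every thread for brevity, so check your client's thread-safety guarantees (or open one store per worker) before copying the pattern.
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker: resolve the container with the retry helper before
# writing. The jitter value is arbitrary; it just staggers the first calls.
def worker(store, cont_name, worker_id):
    time.sleep(random.uniform(0, 0.2))
    container = poll_for_container(store, cont_name)
    return f"worker {worker_id} got a handle: {container is not None}"

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(worker, db_store, "device_metrics", i) for i in range(4)]
    for fut in futures:
        print(fut.result())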