2025, Dec 23 01:00

Stop infinite relaunches in Azure Container Apps Jobs with Event Hubs: fix checkpointStrategy metadata casing (blobMetadata)

Fix Azure Container Apps Jobs that restart with no new events: a mis-cased checkpointStrategy key in the Event Hubs trigger metadata. Applies to jobs that use blobMetadata checkpoints.

Azure Container Apps Jobs are convenient for event-driven workloads, but a subtle configuration pitfall can turn a clean pipeline into an infinite loop. If your job is triggered by Azure Event Hubs with blobMetadata as the checkpoint strategy, updates the checkpoint store as expected, and still relaunches immediately after finishing with no new events to process, you’re likely facing a metadata key casing issue.

Problem setup

The job is configured with an Event Hubs trigger that scales on the number of unprocessed events, which it derives from checkpoints kept in Blob Storage. The Python worker consumes events with azure-eventhub, writes checkpoints through azure-eventhub-checkpointstoreblob-aio, and authenticates with azure-identity. Even though the checkpoints are updated correctly and no new traffic arrives, the job finishes and then starts again right away, endlessly.

Configuration that reproduces the issue

The trigger configuration looks correct at first glance, but one field name is off by case:

eventTriggerConfig: {
  parallelism: 1
  replicaCompletionCount: 1
  scale: {
    rules: [
      {
        name: 'eh-trigger'
        type: 'azure-eventhub'
        auth: [
          {
            secretRef: 'eh-conn-str'
            triggerParameter: 'connection'
          }
          {
            secretRef: 'sa-conn-str'
            triggerParameter: 'storageConnection'
          }
        ]
        metadata: {
          blobContainer: contBucket
          checkPointStrategy: 'blobMetadata'
          consumerGroup: cgName
          eventHubName: hubName
          connectionFromEnv: 'EVENT_HUB_CONNECTION_STRING'
          storageConnectionFromEnv: 'STORAGE_ACCOUNT_CONNECTION_STRING'
          activationUnprocessedEventThreshold: 1
          unprocessedEventThreshold: 5
        }
      }
    ]
  }
}

Worker that consumes events

The Python job receives events and updates checkpoints. Dependencies: azure-eventhub-checkpointstoreblob-aio 1.2.0 and azure-identity 1.21.0.

import asyncio
from datetime import datetime, timedelta, timezone
import logging
import os

from azure.eventhub.aio import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblobaio import BlobCheckpointStore
from azure.identity.aio import DefaultAzureCredential

# Connection details are read from the job's environment variables.
STORAGE_URL = os.getenv("BLOB_STORAGE_ACCOUNT_URL")
STORAGE_CONTAINER = os.getenv("BLOB_CONTAINER_NAME")
EH_NAMESPACE = os.getenv("EVENT_HUB_FULLY_QUALIFIED_NAMESPACE")
EH_NAME = os.getenv("EVENT_HUB_NAME")
EH_CONSUMER = os.getenv("EVENT_HUB_CONSUMER_GROUP")

logging.basicConfig(level=logging.INFO)

# End the run once no event has arrived for this long.
IDLE_WINDOW = timedelta(seconds=30)
last_seen_at = None


async def handle_event(part_ctx, evt):
    global last_seen_at
    if evt is None:
        # Only happens when receive() is called with max_wait_time; it must not
        # reset the idle timer, otherwise the run would never end.
        print(f"Got None from partition: {part_ctx.partition_id}")
        return
    last_seen_at = datetime.now(timezone.utc)
    print(
        'Got event: "{}" from partition: "{}"'.format(
            evt.body_as_str(encoding="UTF-8"), part_ctx.partition_id
        )
    )
    # Persist progress in the blob checkpoint store.
    await part_ctx.update_checkpoint(evt)


async def pump():
    global last_seen_at
    credential = DefaultAzureCredential()
    store = BlobCheckpointStore(
        blob_account_url=STORAGE_URL,
        container_name=STORAGE_CONTAINER,
        credential=credential,
    )
    client = EventHubConsumerClient(
        fully_qualified_namespace=EH_NAMESPACE,
        eventhub_name=EH_NAME,
        consumer_group=EH_CONSUMER,
        checkpoint_store=store,
        credential=credential,
    )
    last_seen_at = datetime.now(timezone.utc)
    try:
        async with client:
            # "-1" starts from the beginning of any partition without a checkpoint.
            recv_task = asyncio.create_task(
                client.receive(on_event=handle_event, starting_position="-1")
            )
            # Poll until IDLE_WINDOW passes without a new event.
            while datetime.now(timezone.utc) - last_seen_at <= IDLE_WINDOW:
                await asyncio.sleep(1)
            # Stop receiving, then make sure the receive task is finished.
            await client.close()
            recv_task.cancel()
            try:
                await recv_task
            except asyncio.CancelledError:
                pass
    finally:
        await credential.close()


def main():
    asyncio.run(pump())


if __name__ == "__main__":
    main()

What’s actually wrong

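The root cause is the casing of the checkpoint strategy key. The metadata used checkPointStrategy, while the scaler expects checkpointStrategy. Trigger metadata keys are matched exactly, so the mis-cased key is most likely ignored and the scaler falls back to its default checkpoint lookup instead of blobMetadata. That lookup does not find the checkpoints the Python worker writes as blob metadata, and when no checkpoint is found, the scaler counts the events still retained in each partition as unprocessed. As long as the hub retains more events than the activation threshold, that count stays above it, so a new run is triggered as soon as the previous one completes. This also explains why the checkpoint store looks correctly updated from the code’s perspective while the orchestration keeps firing new runs.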
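To make the mechanism concrete, here is a deliberately simplified Python model of the scaler's decision. It is not KEDA's code: the function names are hypothetical, and the model assumes that (1) metadata keys are looked up case-sensitively, (2) the worker's checkpoints are found only under the blobMetadata strategy, and (3) when no checkpoint is found, every event still retained in the partition counts as unprocessed.

```python
from typing import Mapping, Optional


def resolved_checkpoint_strategy(metadata: Mapping[str, str]) -> str:
    # Case-sensitive lookup: "checkPointStrategy" is a different key, so it is ignored.
    return metadata.get("checkpointStrategy", "")


def unprocessed_events(beginning_seq: int, last_seq: int, checkpoint_seq: Optional[int]) -> int:
    if last_seq < 0:
        # Simplification: a negative last sequence number is treated as an empty partition.
        return 0
    if checkpoint_seq is None:
        # No checkpoint found: every retained event counts as unprocessed.
        return last_seq - beginning_seq + 1
    # Checkpoint found: only events after the checkpointed sequence number count.
    return max(last_seq - checkpoint_seq, 0)


def would_activate(
    metadata: Mapping[str, str],
    beginning_seq: int,
    last_seq: int,
    worker_checkpoint_seq: Optional[int],
    activation_threshold: int = 0,
) -> bool:
    # Assumption of this model: the worker's checkpoints are visible only under blobMetadata.
    strategy = resolved_checkpoint_strategy(metadata)
    checkpoint = worker_checkpoint_seq if strategy == "blobMetadata" else None
    return unprocessed_events(beginning_seq, last_seq, checkpoint) > activation_threshold
```

In this model, a partition holding sequence numbers 0 to 99 that the worker has fully checkpointed still reports 100 unprocessed events with checkPointStrategy, and 0 with checkpointStrategy, so only the mis-cased configuration keeps activating the job.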

An alternative that sometimes comes up is to pass max_wait_time to client.receive() to detect idle periods. With max_wait_time set, the SDK calls on_event with None when no event arrives within that interval (the handler above already tolerates a None event). That helps detect inactivity inside the process, but it does not make receive() return or the job exit on its own, so you still need explicit exit logic, and it does nothing to stop the trigger from rescheduling the job.
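If you do want in-process idle detection based on max_wait_time, the sketch below shows one way to add that exit logic. IdleTracker and receive_until_idle are hypothetical names. The code assumes a single consumer owns every partition (as with parallelism: 1 here) and that the client behaves like EventHubConsumerClient, offering get_partition_ids(), receive(on_event=..., max_wait_time=..., starting_position=...), and close(). It only ends the run; it does not change how the trigger schedules the job.

```python
import asyncio
import contextlib
from typing import Any, Iterable, Optional


class IdleTracker:
    """Hypothetical helper: signals stop once every partition reports idle.

    Relies on on_event being called with None when a partition had no event
    within max_wait_time.
    """

    def __init__(self, partition_ids: Iterable[str]) -> None:
        self._idle = {pid: False for pid in partition_ids}
        self.stop = asyncio.Event()

    async def on_event(self, partition_context: Any, event: Optional[Any]) -> None:
        pid = partition_context.partition_id
        if event is None:
            self._idle[pid] = True
        else:
            # A real event marks the partition busy again and is checkpointed.
            self._idle[pid] = False
            await partition_context.update_checkpoint(event)
        if self._idle and all(self._idle.values()):
            self.stop.set()


async def receive_until_idle(client: Any, max_wait_time: float = 30.0) -> None:
    """Hypothetical wrapper: receive until all partitions are idle, then stop."""
    tracker = IdleTracker(await client.get_partition_ids())
    recv_task = asyncio.create_task(
        client.receive(
            on_event=tracker.on_event,
            max_wait_time=max_wait_time,
            starting_position="-1",
        )
    )
    stop_task = asyncio.create_task(tracker.stop.wait())
    # Also return if receive() ends on its own (for example, with an error).
    await asyncio.wait({recv_task, stop_task}, return_when=asyncio.FIRST_COMPLETED)
    stop_task.cancel()
    await client.close()
    recv_task.cancel()
    # A real exception from receive() is re-raised; only cancellation is swallowed.
    with contextlib.suppress(asyncio.CancelledError):
        await recv_task
```

To use it, call await receive_until_idle(client) inside async with client: in place of the polling loop in pump(). The tracker is created inside the coroutine so that its asyncio.Event belongs to the running event loop.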

Fix: correct the metadata key

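Use the correct casing, checkpointStrategy, in the trigger metadata. The following configuration resolved the loop. Besides the key casing, note that it quotes the threshold values (KEDA trigger metadata values are strings) and sets activationUnprocessedEventThreshold to '0' instead of 1: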

metadata: {
  blobContainer: containerName
  checkpointStrategy: 'blobMetadata'
  consumerGroup: eventHubConsumerGroupName
  eventHubName: eventHubName
  connectionFromEnv: 'EVENT_HUB_CONNECTION_STRING'
  storageConnectionFromEnv: 'STORAGE_ACCOUNT_CONNECTION_STRING'
  activationUnprocessedEventThreshold: '0'
  unprocessedEventThreshold: '5'
}

A working example project is available here: https://github.com/ganhammar/azure-container-apps-job-with-event-hub-integration

Why this matters

Event-driven jobs depend heavily on precise trigger configuration. A single mis-cased key can negate your checkpoint strategy and create wasteful, confusing behavior: jobs relaunching without new events, noisy logs, and unnecessary compute churn. In tightly controlled production environments, such loops can impact costs and downstream systems, and they are easy to miss because the consumer code and checkpoint store appear healthy.

Takeaways

Always check that trigger metadata keys match the names the platform expects exactly, including case. If a job with Event Hubs and blobMetadata keeps re-running with no new messages, double-check the checkpointStrategy field. Consumer-side options like max_wait_time are useful for handling inactivity inside the process, but they won’t fix a misconfigured trigger. With the corrected metadata, the job respects checkpoints and only starts when there is real work to process.
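A cheap guard against this class of bug is to check metadata keys before deploying. The sketch below flags keys that match a known key only when case is ignored. find_miscased_keys is a hypothetical helper, and its key list is an assumption: it contains only the keys used in this article, not KEDA's full schema, so extend it from the KEDA Azure Event Hubs scaler documentation. It does not catch typos other than casing.

```python
from typing import Iterable, List, Mapping, Tuple

# Keys used in this article's trigger metadata; a partial list, not KEDA's full schema.
KNOWN_EVENTHUB_KEYS = frozenset({
    "blobContainer",
    "checkpointStrategy",
    "consumerGroup",
    "eventHubName",
    "connectionFromEnv",
    "storageConnectionFromEnv",
    "activationUnprocessedEventThreshold",
    "unprocessedEventThreshold",
})


def find_miscased_keys(
    metadata: Mapping[str, object],
    known: Iterable[str] = KNOWN_EVENTHUB_KEYS,
) -> List[Tuple[str, str]]:
    """Return (found_key, expected_key) pairs for keys that differ only by case."""
    known_set = set(known)
    by_lower = {key.lower(): key for key in known_set}
    problems = []
    for key in metadata:
        if key in known_set:
            continue  # exact match, nothing to report
        expected = by_lower.get(key.lower())
        if expected is not None:
            problems.append((key, expected))
    return problems


if __name__ == "__main__":
    # Hypothetical values mirroring the reproducing configuration.
    repro = {
        "blobContainer": "checkpoints",
        "checkPointStrategy": "blobMetadata",
        "consumerGroup": "cg",
    }
    for found, expected in find_miscased_keys(repro):
        print(f"{found!r} should be {expected!r}")
```

Run against the reproducing metadata, it reports checkPointStrategy and suggests checkpointStrategy; the corrected metadata produces no findings.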