2025, Sep 30 15:00

Parallelizing IO-bound loops in Python asyncio with create_task, gather, and a Semaphore to limit concurrency to four

Learn how to speed up IO-bound loops in Python asyncio using create_task, gather, and a Semaphore to cap concurrency at four in-flight network requests.

Parallelizing IO-bound loops in asyncio is often a low-effort, high-impact optimization. When a coroutine awaits each operation in a tight loop, it processes one item at a time by design; if the work is network-bound, this leaves a lot of throughput on the table. Below is a focused walkthrough on how to fetch metadata for multiple coordinates concurrently and how to cap concurrency at four in-flight requests.

Baseline: one-by-one awaits

The starting point processes one coordinate per iteration and saves the result to a dictionary. It uses aiohttp and a find_panorama_async call to fetch panorama metadata. The logic is straightforward, but the loop awaits every call before moving to the next.

import asyncio
from aiohttp import ClientSession
from tqdm.asyncio import tqdm
import pandas as pd
from streetlevel import streetview


geo_map = {
    1: {'country': 'AZE', 'longitude': 48.84328388521507, 'latitude': 39.75349231850633},
    2: {'country': 'AZE', 'longitude': 46.42568983461019, 'latitude': 41.74686028542068},
    3: {'country': 'AZE', 'longitude': 46.77391084838791, 'latitude': 41.05880534123746},
    4: {'country': 'AZE', 'longitude': 46.57032078287734, 'latitude': 39.312871420751485},
    5: {'country': 'AZE', 'longitude': 47.26319069021316, 'latitude': 41.28950907274436},
    6: {'country': 'AZE', 'longitude': 46.49956247696345, 'latitude': 40.96062005058899},
    7: {'country': 'AZE', 'longitude': 48.291171317357815, 'latitude': 38.939108897065445},
    8: {'country': 'AZE', 'longitude': 47.12081533506723, 'latitude': 39.681052295694656},
}


async def scan_points(geo_map):
    pano_map = {}

    async with ClientSession() as http:
        # one request at a time: each await completes before the next begins
        for ident, meta in tqdm(geo_map.items(), desc="Parsing coordinates", unit="coordinate"):
            snap = await streetview.find_panorama_async(meta['latitude'], meta['longitude'], http)

            if snap is not None:
                pano_map[ident] = {
                    'longitude': snap.lon,
                    'latitude': snap.lat,
                    'date': str(snap.date)
                }
            else:
                pano_map[ident] = {
                    'longitude': None,
                    'latitude': None,
                    'date': None
                }

    return pano_map


async def run_pipeline(geo_map):
    outcome = await scan_points(geo_map)

    # flatten the id -> fields mapping into a DataFrame and persist it
    frame = pd.DataFrame.from_dict(outcome, orient='index')
    frame.reset_index(inplace=True)
    frame.rename(columns={'index': 'uuid'}, inplace=True)
    frame.to_parquet('results.parquet', index=False)


asyncio.run(run_pipeline(geo_map))

What slows it down

await suspends the coroutine until the awaited operation completes. Used inside the loop, it prevents the next network call from starting until the current one finishes. This is correct but inherently sequential. To improve throughput, the idea is to start multiple coroutines first and await their results later.
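
To see the cost in isolation, here is a minimal, self-contained sketch that uses asyncio.sleep as a stand-in for a network call: four one-second operations take about four seconds when awaited in a loop, and about one second when overlapped with gather.

import asyncio
import time


async def fake_request(i):
    # stand-in for a network call that takes one second
    await asyncio.sleep(1)
    return i


async def sequential():
    # awaiting inside the loop: roughly 4 seconds total
    start = time.perf_counter()
    for i in range(4):
        await fake_request(i)
    print(f"sequential: {time.perf_counter() - start:.2f}s")


async def concurrent():
    # overlapping the waits: roughly 1 second total
    start = time.perf_counter()
    await asyncio.gather(*(fake_request(i) for i in range(4)))
    print(f"concurrent: {time.perf_counter() - start:.2f}s")


asyncio.run(sequential())
asyncio.run(concurrent())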

Launching many, awaiting later with create_task

One way to parallelize IO is to schedule many coroutines using create_task and await them afterward. This starts them immediately and lets the event loop interleave their progress.

async def scan_points_parallel_fire_and_wait(geo_map):
    results_map = {}

    async with ClientSession() as http:
        pending = []

        # kick off all tasks right away, keeping each task paired with its identifier
        for ident, meta in tqdm(geo_map.items(), desc="Parsing coordinates", unit="coordinate"):
            fut = asyncio.create_task(
                streetview.find_panorama_async(meta['latitude'], meta['longitude'], http)
            )
            pending.append((ident, fut))

        # later, await each task's result under its original key
        for ident, fut in pending:
            snap = await fut
            if snap is not None:
                results_map[ident] = {
                    'longitude': snap.lon,
                    'latitude': snap.lat,
                    'date': str(snap.date)
                }
            else:
                results_map[ident] = {
                    'longitude': None,
                    'latitude': None,
                    'date': None
                }

    return results_map

This approach radically increases the rate shown by tqdm. Note, however, that the bar wraps the scheduling loop, so the displayed rate reflects how quickly tasks are created, not the end-to-end runtime. Still, the difference is visible: one run went from 27.03 coordinates/s to a six-figure rate in the progress bar.
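
To get a number that does reflect end-to-end runtime, time the whole call rather than reading the bar. A small sketch, where timed is a hypothetical helper that wraps any of the scan variants:

import time


async def timed(scan_fn, geo_map):
    # wall-clock duration of the entire scan, independent of the tqdm rate
    start = time.perf_counter()
    result = await scan_fn(geo_map)
    print(f"fetched {len(result)} results in {time.perf_counter() - start:.2f}s")
    return result


# usage:
# asyncio.run(timed(scan_points_parallel_fire_and_wait, geo_map))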

Starting together with gather

You can also collect the coroutines into a list without create_task and start them all at once with asyncio.gather, which schedules them as tasks and returns their results in submission order.

async def scan_points_parallel_gather(geo_map):
    results_map = {}

    async with ClientSession() as http:
        batch = []

        # prepare all coroutines without starting them
        for ident, meta in tqdm(geo_map.items(), desc="Parsing coordinates", unit="coordinate"):
            coro = streetview.find_panorama_async(meta['latitude'], meta['longitude'], http)
            batch.append(coro)

        # run all at once; gather returns results in submission order,
        # so the original identifiers can be zipped back onto the results
        payloads = await asyncio.gather(*batch)

        for ident, snap in zip(geo_map, payloads):
            if snap is not None:
                results_map[ident] = {
                    'longitude': snap.lon,
                    'latitude': snap.lat,
                    'date': str(snap.date)
                }
            else:
                results_map[ident] = {
                    'longitude': None,
                    'latitude': None,
                    'date': None
                }

    return results_map

In another run, the reported rate jumped even higher. Again, the counter is not a total-runtime metric, but it does show how much idle time is avoided by overlapping IO.
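
One caveat with real network calls: by default, the first exception raised by any coroutine propagates out of asyncio.gather. Passing return_exceptions=True returns failures as values instead, so they can be filtered afterwards. A self-contained sketch with a simulated flaky lookup:

import asyncio


async def flaky(i):
    # simulate a lookup that fails for one input
    if i == 2:
        raise RuntimeError(f"lookup {i} failed")
    await asyncio.sleep(0.1)
    return i


async def main():
    # failures come back as exception objects instead of aborting the batch
    results = await asyncio.gather(*(flaky(i) for i in range(4)), return_exceptions=True)
    for i, res in enumerate(results):
        if isinstance(res, Exception):
            print(f"item {i}: failed ({res})")
        else:
            print(f"item {i}: ok ({res})")


asyncio.run(main())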

Limiting concurrency to four with Semaphore

Unbounded concurrency is not always desirable. To keep at most four requests in flight, wrap each coroutine in a small helper that acquires a semaphore before awaiting it. This gates execution so that no more than a fixed number of tasks run concurrently.

async def within_limit(lock, label, job):
    # hold the semaphore while running the wrapped coroutine; return the label
    # too, so each result stays paired with its identifier
    async with lock:
        return label, await job

async def scan_points_bounded(geo_map, max_parallel=4):
    results_map = {}

    async with ClientSession() as http:
        token = asyncio.Semaphore(max_parallel)
        batch = []

        for ident, meta in tqdm(geo_map.items(), desc="Parsing coordinates", unit="coordinate"):
            job = within_limit(
                token,
                ident,
                streetview.find_panorama_async(meta['latitude'], meta['longitude'], http)
            )
            batch.append(job)

        final = await asyncio.gather(*batch)

        for ident, snap in final:
            if snap is not None:
                results_map[ident] = {
                    'longitude': snap.lon,
                    'latitude': snap.lat,
                    'date': str(snap.date)
                }
            else:
                results_map[ident] = {
                    'longitude': None,
                    'latitude': None,
                    'date': None
                }

    return results_map

This pattern keeps concurrency under control and satisfies the original requirement of running at most four coordinate lookups in parallel. For visibility during experiments, an artificial sleep can be inserted, as in the sketch below, but it is not needed in real runs.
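
Here is a sketch of that experiment, with asyncio.sleep standing in for find_panorama_async and a print inside the wrapper: with a limit of four, the eight jobs start in two waves of four, which is visible in the timestamps.

import asyncio
import time


async def within_limit_verbose(lock, label, job):
    # same gate as within_limit, with timestamps to show the batching
    async with lock:
        print(f"{time.strftime('%X')} start {label}")
        result = await job
        print(f"{time.strftime('%X')} done  {label}")
        return label, result


async def main():
    token = asyncio.Semaphore(4)
    jobs = [within_limit_verbose(token, i, asyncio.sleep(1)) for i in range(8)]
    await asyncio.gather(*jobs)


asyncio.run(main())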

Ordering vs. completion

When ordering matters, asyncio.gather preserves the original order of the submitted coroutines. When you want to process results as soon as any of them finishes, asyncio.as_completed yields results in completion order instead. Choose the mode that aligns with how you want to assemble the output. For progress tracking, it is also possible to integrate gather with tqdm in an async-friendly way.
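
Both options are easy to try side by side. In the sketch below, asyncio.sleep again stands in for the network call; asyncio.as_completed yields awaitables in finish order, while tqdm.asyncio.tqdm.gather behaves like asyncio.gather but drives the progress bar by completions rather than by task creation.

import asyncio
import random
from tqdm.asyncio import tqdm


async def lookup(i):
    # stand-in for a network call with variable latency
    await asyncio.sleep(random.uniform(0.1, 1.0))
    return i


async def main():
    # completion order: handle each result as soon as it is ready
    for fut in asyncio.as_completed([lookup(i) for i in range(8)]):
        result = await fut
        print(f"finished: {result}")

    # submission order, with a progress bar that ticks as tasks complete
    results = await tqdm.gather(*[lookup(i) for i in range(8)])
    print(results)  # always [0, 1, ..., 7], regardless of finish order


asyncio.run(main())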

Why this matters

Await-in-loop patterns serialize IO and underutilize concurrent capacity. By launching many network-bound operations together, you remove idle gaps and cut wall-clock time. In practice, even a simple switch to create_task or gather makes the progress counter skyrocket, which is a clear signal that work is being overlapped rather than queued up. Adding a semaphore lets you keep external services happy and regulate load without rewriting the rest of the pipeline.

Takeaways

Avoid awaiting each IO call individually inside a loop. If you need parallel requests, either schedule coroutines with create_task and await them later, or collect them and start all at once with gather. When you need a hard cap on parallelism, wrap the calls with a Semaphore set to the desired width, such as four. If you care about result order, gather keeps it; if you care about fastest-first handling, as_completed is the right tool. These patterns compose cleanly with your existing session and DataFrame steps and deliver immediate, observable speed-ups.

The article is based on a question from StackOverflow by Daniel AG and an answer by furas.