2025, Nov 11 01:00
Throttle globally, sequence locally: a shared asyncio rate limiter for 10 req/s with aiohttp and dependent HTTP chains
Learn how to cap aiohttp calls at 10 req/s in Python asyncio: use a shared rate limiter to run parallel a→b→c chains, keep order within each chain, and avoid blocking the event loop.
When you fan out a matrix of dependent HTTP calls, the natural goal is to parallelize across categories while keeping strict a → b → c ordering inside each category. The catch is rate control: the server allows at most 10 requests per second, and each response takes ~0.15 s. Sleeping inside individual coroutines doesn't guarantee a global cap, and blocking sleeps defeat concurrency. The pattern that consistently works in asyncio is a shared rate limiter guarding every outbound request.
Problem setup
Suppose each category k has up to three steps, and later steps depend on earlier ones. If the calls were independent, you might schedule them with gaps:
gap_per_call = 0.1

async with aiohttp.ClientSession() as http:
    async with asyncio.TaskGroup() as group:
        for cat in [1, 2, 3]:
            group.create_task(grab_a_payload(cat, http))
            await asyncio.sleep(gap_per_call)
            group.create_task(grab_b_payload(cat, http))
            await asyncio.sleep(gap_per_call)
            group.create_task(grab_c_payload(cat, http))
            await asyncio.sleep(gap_per_call)
But a → b → c must be sequential per category. A natural refactor is to move the logic into one coroutine and schedule those per category:
gap_per_call = 0.1

async with aiohttp.ClientSession() as http:
    async with asyncio.TaskGroup() as group:
        for cat in [1, 2, 3]:
            group.create_task(run_abc_chain(cat, http))
            await asyncio.sleep(gap_per_call)
At this point the question is how to cap the overall rate at 10 requests per second while allowing each category’s chain to advance immediately after its own response is ready.
Why naive sleeps don’t solve it
Placing await asyncio.sleep before each request inside a chain does not throttle the whole program: the sleeps in different tasks overlap, so the aggregate rate can still exceed the limit. Switching to time.sleep does slow the send rate, but it blocks the event loop, so no other coroutine runs while one chain waits. Requests 1a, 2a, 3a all go out before the response to 1a is even processed, delaying every b and c step and pushing downstream work back by a large margin.
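To see the first failure mode concretely, here is a minimal sketch (naive_fetch is a made-up stand-in for a real request): thirty tasks each pause 0.1 s before "sending", but the pauses overlap, so every request fires at the same instant.

import asyncio, time

async def naive_fetch(i: int, t0: float):
    await asyncio.sleep(0.1)  # each task pauses on its own...
    # ...but the pauses run in parallel, so they add no global spacing.
    print(f"request {i} would fire at t={time.perf_counter() - t0:.2f}s")

async def main():
    t0 = time.perf_counter()
    async with asyncio.TaskGroup() as tg:
        for i in range(30):
            tg.create_task(naive_fetch(i, t0))

asyncio.run(main())
# Prints thirty lines, all at t≈0.10s: a 30-requests-at-once burst,
# three times the allowed rate, despite every task "sleeping first".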
Solution: one shared rate-limiter around every HTTP call
The reliable approach is a single global limiter that every HTTP call must acquire before sending. Each category’s a → b → c remains a plain await chain, so dependencies are respected. All chains run concurrently, but the limiter keeps the aggregate under 10 requests per second.
import asyncio, aiohttp
from aiolimiter import AsyncLimiter  # pip install aiolimiter

rate_gate = AsyncLimiter(10, 1)  # 10 requests every 1 s, shared by every task

async def pull_json(path: str, client: aiohttp.ClientSession):
    # Waits here if 10 calls already happened within the last second.
    async with rate_gate:
        async with client.get(path) as resp:
            resp.raise_for_status()
            return await resp.json()

async def run_chain(cat: int, client: aiohttp.ClientSession):
    data_a = await pull_json(f"/data/{cat}a", client)
    if want_b(data_a):  # your predicate
        data_b = await pull_json(f"/data/{cat}b", client)
        if want_c(data_a, data_b):
            data_c = await pull_json(f"/data/{cat}c", client)
            # ... process (data_a, data_b, data_c) ...

async def bootstrap():
    send_gap = 0.1  # optional pacing between task submissions
    # base_url (a placeholder here) lets the relative /data/... paths resolve.
    async with aiohttp.ClientSession(base_url="https://api.example.com") as client:
        async with asyncio.TaskGroup() as grp:
            for cat in (1, 2, 3):  # thousands of categories scale fine
                grp.create_task(run_chain(cat, client))
                await asyncio.sleep(send_gap)  # see the note on pacing below

asyncio.run(bootstrap())
The chain run_chain preserves the dependency order through ordinary awaits, so b and c only happen if and when a says they should. Since every pull_json call acquires the AsyncLimiter first, all categories combined stay below the 10 requests per second ceiling.
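If you want to convince yourself the ceiling holds, a throwaway probe along these lines can help; fake_call and its 0.15 s sleep are stand-ins for the real requests, not part of the article's code:

import asyncio, time
from aiolimiter import AsyncLimiter

rate_gate = AsyncLimiter(10, 1)
stamps: list[float] = []

async def fake_call():
    async with rate_gate:
        stamps.append(time.perf_counter())
        await asyncio.sleep(0.15)  # stand-in for the ~0.15 s server response

async def main():
    async with asyncio.TaskGroup() as tg:
        for _ in range(50):
            tg.create_task(fake_call())
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    # aiolimiter is a leaky bucket: it allows an initial burst up to
    # capacity (10), then spaces later acquisitions ~0.1 s apart.
    print(f"span {stamps[-1] - stamps[0]:.2f}s for 50 calls, "
          f"max gap {max(gaps):.3f}s")

asyncio.run(main())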
If you prefer no third-party library, you can start a background task that refills an asyncio.Semaphore(10) once per second; the overall idea remains the same.
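Here is a minimal sketch of that do-it-yourself variant, using a fixed one-second window; the names gate, refill_gate, and limited_get are illustrative, not from any library:

import asyncio
import aiohttp

MAX_PER_SEC = 10
gate = asyncio.Semaphore(MAX_PER_SEC)
_used = 0  # permits consumed in the current one-second window

async def refill_gate():
    # Background task: once per second, hand back every permit
    # consumed during the window that just ended.
    global _used
    while True:
        await asyncio.sleep(1)
        for _ in range(_used):
            gate.release()
        _used = 0

async def limited_get(path: str, client: aiohttp.ClientSession):
    global _used
    await gate.acquire()  # blocks once 10 permits are gone
    _used += 1            # safe: no await between acquire and increment
    async with client.get(path) as resp:
        resp.raise_for_status()
        return await resp.json()

# Somewhere in bootstrap(), before creating the chain tasks:
#     refill = asyncio.create_task(refill_gate())

Note that a fixed window is a coarser approximation than aiolimiter's leaky bucket: short bursts straddling a window boundary are possible.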
A note on pacing task submission
Creating every chain task in a tight loop puts all the a-requests at the front of the limiter's FIFO queue, so an early chain's b-request can end up waiting behind every other chain's a-request. Adding a small await asyncio.sleep immediately after scheduling each category task staggers the submissions; this plays well with FIFO queueing in asyncio and aiolimiter and keeps progress smooth across chains.
Why this matters
In workloads with tens of thousands of categories, letting each chain progress as soon as its own responses arrive shortens end-to-end latency for downstream processing. At the same time, a shared limiter prevents accidental overload and keeps you aligned with a 10 req/s contract. The combination unlocks the potential 90% runtime reduction from parallelism without sacrificing ordering guarantees.
Conclusion
Throttle globally, sequence locally. Wrap every HTTP call in a shared rate-limiter to enforce 10 requests per second across the whole program, keep a → b → c as straightforward awaits inside each category, and optionally introduce a tiny pause after submitting each task to smooth scheduling. With this structure, asyncio gives you parallel chains, strict per-chain ordering, and predictable request pacing.
The article is based on a question from StackOverflow by SapereAude and an answer by Dmitry543.