2025, Nov 25 09:00
Concurrent multipart uploads to Amazon S3: speed up large MP4 ingestion with asyncio, httpx, and aioboto3
Speed up large MP4 transfers to Amazon S3 with asynchronous Python: stream via httpx and use concurrent multipart uploads with aioboto3 for real speed gains.
Moving multi-hundred-megabyte MP4s from external APIs into Amazon S3 quickly is an I/O-bound problem. Network round-trips, HTTP streaming, and S3 uploads add up, so a purely sequential flow makes each file take minutes. Asynchronous programming helps, but only if you create enough useful concurrency. Below is a concrete walkthrough of what happened when uploads looked “asynchronous” yet still behaved serially, and how concurrent multipart uploads improved wall-clock time.
Where the slowdown comes from
The initial approach streamed each source video with httpx and uploaded it to S3 via aioboto3 using multipart upload, but the parts were sent one by one. That meant only one S3 part was ever in flight for a given file, so across files it often looked as if S3 were processing the objects sequentially. Concurrency existed at the task level, yet inside each task the work was serialized by awaiting every part upload before starting the next.
Problem pattern: sequential multipart upload inside an async flow
Here’s a minimal version of the pattern where parts are uploaded in strict sequence. The logic is unchanged; only the identifiers are renamed for clarity. It runs multiple files concurrently, yet each file’s parts are still sent one at a time.
import asyncio
import httpx
from botocore.exceptions import ClientError
import aioboto3
from typing import AsyncIterator
async def push_chunks_async(byte_stream: AsyncIterator[bytes],
                            s3_object_key: str) -> None:
    """
    Asynchronously uploads video files via aioboto3 using multipart upload,
    but parts are awaited sequentially.
    """
    aio_session = aioboto3.Session()
    async with aio_session.client('s3') as s3_cli:
        init_meta = await s3_cli.create_multipart_upload(Bucket=bucket_label, Key=s3_object_key)
        multi_id = init_meta['UploadId']
        assembled_parts = []
        part_no = 1
        try:
            async for piece in byte_stream:
                # Each part is awaited before the next begins, so only one
                # part is ever in flight for this object.
                part_reply = await s3_cli.upload_part(
                    Bucket=bucket_label,
                    Key=s3_object_key,
                    PartNumber=part_no,
                    UploadId=multi_id,
                    Body=piece
                )
                assembled_parts.append({
                    'ETag': part_reply['ETag'],
                    'PartNumber': part_no
                })
                part_no += 1
                print('Chunk done')
            await s3_cli.complete_multipart_upload(
                Bucket=bucket_label,
                Key=s3_object_key,
                UploadId=multi_id,
                MultipartUpload={'Parts': assembled_parts}
            )
            print('Upload Done')
        except Exception as err:
            await s3_cli.abort_multipart_upload(
                Bucket=bucket_label,
                Key=s3_object_key,
                UploadId=multi_id
            )
            print('An error occurred', err)


async def fetch_and_ship(source_url: str) -> None:
    async with httpx.AsyncClient(timeout=httpx.Timeout(20, read=10)) as net:
        async with net.stream("GET", source_url) as resp:
            if resp.status_code == 200:
                try:
                    key_path = f'{dir_tag}/{file_tag}'
                    # aiter_bytes yields the body in chunks of up to ~100 MB each.
                    await push_chunks_async(resp.aiter_bytes(chunk_size=100 * 1024 * 1024), key_path)
                except ClientError as e:
                    print('Upload Failed', e)
            else:
                print('Bad Response', resp.status_code)


async def driver() -> None:
    endpoints = [
        'https://api.example.com/somevideo.mp4',
        'https://api.example.com/somevideo.mp4',
        'https://api.example.com/somevideo.mp4'
    ]
    jobs = [fetch_and_ship(u) for u in endpoints]
    await asyncio.gather(*jobs)


if __name__ == '__main__':
    bucket_label = 'test-bucket'  # S3 bucket names must be lowercase
    dir_tag = 'Test'
    file_tag = 'Video.MP4'
    asyncio.run(driver())
The essence of the issue
Async I/O only accelerates things if you have multiple operations in-flight. In the code above, each part upload is awaited before the next part is started, so a single file progresses serially. That’s why it can look like S3 is “doing one at a time,” even though your outer orchestration uses asyncio.gather for multiple URLs. With large parts, the cost of awaiting each upload dominates and the event loop can’t overlap S3 work for the same object.
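To see the difference in isolation, here is a minimal sketch with asyncio.sleep standing in for a part upload; do_io, serial, and concurrent are illustrative names, not part of the real code.

import asyncio

async def do_io(i: int) -> int:
    # Stand-in for a single part upload.
    await asyncio.sleep(1)
    return i

async def serial() -> None:
    # Each await finishes before the next starts: roughly 5 seconds total.
    for i in range(5):
        await do_io(i)

async def concurrent() -> None:
    # All five operations run at the same time: roughly 1 second total.
    await asyncio.gather(*(do_io(i) for i in range(5)))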
For large objects, Amazon recommends multipart uploads when file sizes exceed 100 MB. That sets the stage for concurrency, but you still need to run multiple part uploads concurrently if you want to maximize throughput while the network is the limiting factor.
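The 100 MB figure also suggests a simple branch in the fetch path: small responses can go up with a single put_object call, while anything larger, or with no declared length, takes the multipart route. The upload_response helper and threshold constant below are illustrative rather than part of the original code, and they assume the source sends a Content-Length header.

MULTIPART_THRESHOLD = 100 * 1024 * 1024  # ~100 MB, in line with the AWS guidance above

async def upload_response(resp: httpx.Response, key: str) -> None:
    # Small bodies: one put_object call avoids multipart bookkeeping entirely.
    # Large or unknown size: stream parts through the multipart routine.
    declared = int(resp.headers.get('Content-Length', '0'))
    if 0 < declared < MULTIPART_THRESHOLD:
        body = await resp.aread()
        async with aioboto3.Session().client('s3') as s3c:
            await s3c.put_object(Bucket=bucket_label, Key=key, Body=body)
    else:
        # push_chunks_async here, or the concurrent version shown below.
        await push_chunks_async(resp.aiter_bytes(chunk_size=100 * 1024 * 1024), key)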
What actually helps: concurrent multipart part uploads
The improvement is to start each part upload as its own task and then await them all together. That way, parts of the same object are uploaded concurrently, and you keep the concurrency between different files too. Below is the adjusted upload routine, which creates one task per part and gathers them all before completing the multipart upload.
import asyncio
import aioboto3
from typing import AsyncIterator, Dict, Any
async def put_concurrent_parts(byte_source: AsyncIterator[bytes],
                               target_key: str) -> None:
    """
    Asynchronously uploads video files via aioboto3
    with concurrent multipart part uploads.
    """
    aio_ses = aioboto3.Session()
    async with aio_ses.client('s3') as s3c:
        init_result = await s3c.create_multipart_upload(Bucket=bucket_label, Key=target_key)
        upload_token = init_result['UploadId']
        pending = []

        async def ship_piece(seq_no: int, payload: bytes) -> Dict[str, Any]:
            # Upload one part and return the ETag/part-number pair that
            # complete_multipart_upload needs later.
            reply = await s3c.upload_part(
                Bucket=bucket_label,
                Key=target_key,
                PartNumber=seq_no,
                UploadId=upload_token,
                Body=payload
            )
            return {
                'ETag': reply['ETag'],
                'PartNumber': seq_no
            }

        try:
            part_no = 1
            async for blob in byte_source:
                # Schedule each part as its own task instead of awaiting it
                # immediately, so multiple parts are in flight at once.
                # Note: each scheduled part holds its payload in memory until
                # its upload finishes.
                job = asyncio.create_task(ship_piece(part_no, blob))
                pending.append(job)
                part_no += 1
            # gather preserves input order, so the parts list stays sorted by
            # PartNumber as complete_multipart_upload requires.
            collected = await asyncio.gather(*pending)
            await s3c.complete_multipart_upload(
                Bucket=bucket_label,
                Key=target_key,
                UploadId=upload_token,
                MultipartUpload={'Parts': collected}
            )
            print('Upload Done')
        except Exception as exc:
            await s3c.abort_multipart_upload(
                Bucket=bucket_label,
                Key=target_key,
                UploadId=upload_token
            )
            print('An error occurred', exc)
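The fetch side can stay the same as in the first listing and simply call the new routine. The sketch below assumes the same module-level bucket_label, dir_tag, and file_tag values and the same imports; fetch_and_ship_concurrent is just an illustrative name.

async def fetch_and_ship_concurrent(source_url: str) -> None:
    # Same streaming GET as before, now feeding the concurrent uploader.
    async with httpx.AsyncClient(timeout=httpx.Timeout(20, read=10)) as net:
        async with net.stream("GET", source_url) as resp:
            if resp.status_code == 200:
                key_path = f'{dir_tag}/{file_tag}'
                await put_concurrent_parts(
                    resp.aiter_bytes(chunk_size=100 * 1024 * 1024), key_path)
            else:
                print('Bad Response', resp.status_code)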
What to expect after enabling concurrency
With parts uploading concurrently, the artificial serialization is gone both within each file and across files. At that point, total upload time is governed by available network bandwidth and upstream latency. For multi-hundred-megabyte videos, the network path matters: running the script from a cloud instance typically offers far more uplink capacity than a home connection, and that is where the gains from concurrent multipart uploads become most noticeable.
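If you want to confirm the improvement rather than eyeball it, timing the driver before and after the change is enough; timed_run below is a generic helper, not part of the original code.

import asyncio
import time

def timed_run(coro_factory) -> float:
    # Run an async entry point and return elapsed wall-clock seconds.
    start = time.perf_counter()
    asyncio.run(coro_factory())
    return time.perf_counter() - start

# Example: print(f'{timed_run(driver):.1f} s')  # compare the sequential and concurrent versions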
Why this matters
Large-object ingestion pipelines live and die by I/O. If part uploads are serialized, your throughput is artificially capped no matter how “async” the rest of the code looks. Enabling concurrency at the multipart layer turns idle time into useful work and reduces per-file wall-clock time until the network becomes the limiting factor. That aligns with Amazon’s guidance to use multipart uploads for files over 100 MB and makes the overall behavior match the expectation of “multiple files being created at once and each one being loaded in” rather than appearing to process one at a time.
Takeaways
Stream data from the source, use multipart uploads for large files, and start multiple S3 part uploads concurrently to keep the event loop busy. Once you’ve done that, the remaining bottleneck is the network. If local uplink speed is the constraint, running the same code in a cloud environment is the practical lever to reduce end-to-end times.