https://pytroubles.com/en/posts/id2869-get-remote-video-duration-with-ffprobe-reliable-url-method-without-downloading-in-python

Get Remote Video Duration with ffprobe: Reliable URL Method Without Downloading in Python

Get Remote Video Duration with ffprobe by Reading the URL Directly (No Full Download)

Get duration from a remote video URL with ffprobe without downloading the file. Python examples. Avoid HTTP stream errors by letting ffprobe read the URL.

2026-01-03T21:00:12+03:00

Extracting the duration of a remote video sounds trivial if you already have a working pipeline for local files. But the moment you switch from disk I/O to an HTTP stream, things may fall apart with a blunt “Invalid data found when processing input” from ffprobe. Below is a concise walkthrough of the pitfall and a reliable way to get duration from a URL without downloading the whole file.

Problem statement

Locally, ffprobe can be invoked from Python and its XML output parsed to obtain the duration. For example:

import subprocess as sp
import xml.etree.ElementTree as et

def duration_from_path(file_path):
    # Keep stderr separate: ffprobe writes its banner and errors there,
    # and mixing them into stdout would break XML parsing.
    xml_text = sp.run([
        "ffprobe",
        "-i", file_path,
        "-show_format", "-output_format", "xml"
    ], stdout=sp.PIPE, stderr=sp.PIPE).stdout.decode()
    return et.fromstring(xml_text).find("format").get("duration")

def duration_from_stream(stream_obj):
    # stream_obj must expose fileno(); ffprobe reads from that descriptor.
    xml_text = sp.run([
        "ffprobe",
        "-i", "pipe:0",
        "-show_format", "-output_format", "xml"
    ], stdin=stream_obj, stdout=sp.PIPE, stderr=sp.PIPE).stdout.decode()
    return et.fromstring(xml_text).find("format").get("duration")

def duration_from_bytes(blob):
    # The whole blob is written to ffprobe's stdin by subprocess.
    xml_text = sp.run([
        "ffprobe",
        "-i", "pipe:0",
        "-show_format", "-output_format", "xml"
    ], input=blob, stdout=sp.PIPE, stderr=sp.PIPE).stdout.decode()
    return et.fromstring(xml_text).find("format").get("duration")

Fetching an HTTP response as a file-like stream is also straightforward:

import requests as http

def stream_from_url(address):
    # stream=True defers the body download; resp.raw is the underlying urllib3 response.
    resp = http.get(address, stream=True)
    return resp.raw

However, combining the two by passing resp.raw to ffprobe as stdin, as in duration_from_stream(stream_from_url(address)), fails with “Invalid data found when processing input”. Seeking is not the culprit: ffprobe still succeeds on a regular local file with seeking disabled. Reading the entire remote file into memory and then passing it via stdin also works:

def duration_via_download(address):
    data = stream_from_url(address).read()
    return duration_from_bytes(data)

But fully downloading large media is impractical.

What’s really going on

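Passing a raw HTTP stream to ffprobe produces invalid-input errors, while the same data, once fully read into memory, parses fine, so neither the media itself nor seeking is at fault. A likely explanation is how subprocess treats stdin: it does not call read() on the object but hands its fileno() descriptor to the child process. For resp.raw that descriptor is the underlying network socket, so ffprobe reads bytes straight off the connection, bypassing anything requests would decode for you (TLS on HTTPS URLs, chunked transfer encoding, compression). The reliable path is to let ffprobe fetch the URL itself instead of proxying the stream through Python.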
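
That subprocess works with OS-level descriptors rather than with read() can be checked without ffprobe or a network connection. The following is a minimal sketch, assuming CPython's standard subprocess module; accepts_as_stdin is a hypothetical helper, and the child process is just the Python interpreter standing in for ffprobe:

```python
import io
import os
import subprocess as sp
import sys


def accepts_as_stdin(obj):
    """Return True if subprocess can hand obj to a child process as stdin."""
    try:
        # The child is a trivial Python process; it never reads its stdin.
        sp.run([sys.executable, "-c", "pass"], stdin=obj, check=True)
        return True
    except io.UnsupportedOperation:
        # Raised by obj.fileno(): the object has no OS-level descriptor,
        # so there is nothing subprocess could pass to the child.
        return False


# An in-memory buffer has read() but no descriptor.
print(accepts_as_stdin(io.BytesIO(b"data")))

# A real file has a descriptor.
with open(os.devnull, "rb") as fh:
    print(accepts_as_stdin(fh))
```

The in-memory buffer is rejected even though it has a perfectly good read() method, while the real file is accepted. resp.raw falls into the second group because it exposes the descriptor of its socket, which is why the call does not fail up front and ffprobe ends up reading from the connection itself.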

Additionally, it’s worth validating that the URL points to raw media data rather than an HTML page. If you see HTML when inspecting the response body, it’s not the correct media endpoint.
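
A rough way to automate that check is to look at the Content-Type header before probing. This is only a sketch using the standard library: looks_like_media and url_serves_media are hypothetical names, the accepted types are an assumption you should adjust to your sources, and some servers reject HEAD requests or report a generic type.

```python
import urllib.request


def looks_like_media(content_type):
    """Heuristic: True when a Content-Type header suggests audio or video."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    main_type = content_type.split(";", 1)[0].strip().lower()
    if main_type.startswith(("video/", "audio/")):
        return True
    # Assumption: generic binary downloads are treated as possible media.
    return main_type == "application/octet-stream"


def url_serves_media(address, timeout=10):
    """Send a HEAD request and apply the heuristic to the response header."""
    request = urllib.request.Request(address, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return looks_like_media(resp.headers.get("Content-Type"))
```

A text/html response is the clearest sign that the address leads to a player or landing page rather than to the file.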

Solution: let ffprobe read the URL directly

ffprobe can accept a URL as input and return just the duration, without extra chatter:

ffprobe https://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4 \
    -v quiet \
    -show_entries format=duration \
    -output_format default=noprint_wrappers=1:nokey=1

Result:

596.474195

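-v quiet silences all log output, including the program banner, library info, and any error messages. -show_entries format=duration limits output to the duration field. -output_format default=noprint_wrappers=1:nokey=1 strips the section wrappers and the key name, returning only the value. If your ffprobe build does not recognize -output_format, the older aliases -of and -print_format take the same argument.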

If you also need other values (for example, size), use -show_entries format=duration,size. Each value is printed on its own line:

596.474195
158008374
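
When several entries are requested, relying on line order is brittle. One option is to drop nokey=1 so that the default writer prints key=value lines, and parse those by name. A minimal sketch; parse_entries is a hypothetical helper, and the sample text simply reuses the two values shown above:

```python
def parse_entries(text):
    """Turn 'key=value' lines from ffprobe's default writer into a dict."""
    entries = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip blank lines or anything without '='
            entries[key] = value
    return entries


# Output as produced with: -output_format default=noprint_wrappers=1
sample = "duration=596.474195\nsize=158008374\n"
info = parse_entries(sample)
print(info["duration"], info["size"])  # prints: 596.474195 158008374
```

The values stay strings here; convert them with float() or int() where you actually need numbers.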

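There is also -hide_banner, which suppresses only the startup banner (version, build configuration, library versions) and leaves other log messages, including errors, visible. Use it instead of -v quiet if you want to keep diagnostics.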

Python wrapper for the same approach

If you prefer to stay in Python, call ffprobe with the URL directly and read its stdout:

import subprocess as sp

def duration_from_url(address):
    # ffprobe fetches the URL itself; Python only reads the printed value.
    out = sp.run([
        "ffprobe", address,
        "-v", "quiet",
        "-show_entries", "format=duration",
        "-output_format", "default=noprint_wrappers=1:nokey=1"
    ], stdout=sp.PIPE, stderr=sp.DEVNULL).stdout.decode().strip()
    return out

This returns the duration in seconds as a string, without downloading the full file into your process. Because -v quiet also hides errors, an empty string means ffprobe could not read the input.
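
For use in a pipeline you may want a number rather than a string, plus a timeout so that a stalled server cannot hang the caller. The variant below is a sketch, not a drop-in replacement: parse_duration and probe_duration are hypothetical names, the 30-second default is arbitrary, -v error is used so that failures remain visible on stderr, and -of is the short alias of -output_format.

```python
import subprocess as sp


def parse_duration(text):
    """Convert ffprobe's printed duration to float seconds, or None."""
    try:
        return float(text.strip())
    except ValueError:
        # Covers empty output and "N/A", which ffprobe prints when the
        # container does not report a duration.
        return None


def probe_duration(address, timeout=30):
    """Run ffprobe on a URL; return seconds as float, or None on failure."""
    try:
        result = sp.run(
            ["ffprobe", "-v", "error",
             "-show_entries", "format=duration",
             "-of", "default=noprint_wrappers=1:nokey=1",
             address],
            stdout=sp.PIPE, stderr=sp.PIPE, timeout=timeout,
        )
    except (sp.TimeoutExpired, FileNotFoundError):
        # ffprobe took too long, or is not installed / not on PATH.
        return None
    if result.returncode != 0:
        return None
    return parse_duration(result.stdout.decode())
```

Returning None for every failure keeps the example short; in real code you would probably log result.stderr or raise, so that a missing ffprobe binary is not confused with an unreadable URL.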

Why this matters

In production systems such as transcoders, catalogers, or media validators, efficiency and robustness matter. Streaming remote media through your application just to probe metadata is fragile and wasteful. Offloading URL fetching to ffprobe avoids unnecessary memory usage and network handling complexity in your code and sidesteps issues stemming from piping raw HTTP streams.

Takeaways

Use ffprobe with the remote URL as input when you need duration or other container-level metadata from remote files. Keep the output minimal with -v quiet and -show_entries, and strip wrappers with -output_format default=noprint_wrappers=1:nokey=1. If multiple values are needed, list them in -show_entries format=..., and they will appear on separate lines. Finally, make sure the URL points to the media file itself, not an HTML page.
