2025, Nov 18 13:00

Prevent garbled Korean in yt-dlp on Windows: understand code page redirection and set --encoding or use the Python API

On Windows, redirection converts yt-dlp output to the console code page, losing Korean. Use --encoding (UTF-8 or cp949) or the Python API to preserve titles.

Unicode on Windows is still a minefield when your tools mix Korean and Latin text. A typical case: yt-dlp prints playlist titles correctly in the console, yet the moment you redirect that output to a file or capture it from Python, Hangul vanishes or degrades into placeholders. The key is understanding what Windows does during redirection and how to make yt-dlp speak the right encoding from the start.

Reproducing the issue

In Windows cmd, yt-dlp can list the titles of a YouTube playlist that mixes Korean and Latin characters. With the code page set to 949 the titles render correctly in the console, but not once the output is redirected.

yt-dlp --flat-playlist -i --print title PLySOINx0fqvYr6s8aGdqaK9j8_CAWcP5U

Redirect the very same output to a file and the Korean disappears:

yt-dlp --flat-playlist -i --print title PLySOINx0fqvYr6s8aGdqaK9j8_CAWcP5U > out.txt 2>&1

Capturing via Python shows the same behavior; characters are missing in the captured text even if they displayed in the console:

import os
import subprocess

charset = "cp949"
os.environ["PYTHONIOENCODING"] = charset

cmdline = "yt-dlp --flat-playlist -i --print title PLySOINx0fqvYr6s8aGdqaK9j8_CAWcP5U"
result = subprocess.run(cmdline.split(), capture_output=True, text=True, shell=False, encoding=charset)
print((result.stdout, result.stderr))

What actually happens

The redirection operator > is converting the text to the console's code page.

That conversion is precisely where characters get lost if the active code page cannot represent them. A minimal demonstration on wineconsole makes it clear:

Z:\home\lmc\tmp>chcp
Active code page: 437

Z:\home\lmc\tmp>echo 철갑혹성
철갑혹성

Z:\home\lmc\tmp>echo 철갑혹성 > k.out

Z:\home\lmc\tmp>type k.out
????

Z:\home\lmc\tmp>chcp 949
Active code page: 949

Z:\home\lmc\tmp>echo 철갑혹성 > k.out

Z:\home\lmc\tmp>type k.out
철갑혹성

When a code page cannot map certain characters, the result depends on how the conversion handles unrepresentable symbols. Not every cp949 character can be translated to cp437; Hangul in particular has no equivalent there. A conversion that substitutes a placeholder produces question marks, while one that ignores unmappable characters drops them outright. The second behavior is what makes Korean titles simply vanish from redirected or captured output.
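The two failure modes can be reproduced in isolation. The sketch below is only an illustration: it uses Python's built-in codecs and their "replace" and "ignore" error handlers as a stand-in for the conversion Windows performs, and it does not involve yt-dlp or the console at all.

```python
# Illustration only: Python codecs stand in for the console's conversion.
text = "철갑혹성"

# cp437 has no Hangul, so every character here is unmappable.
replaced = text.encode("cp437", errors="replace")  # substitute a placeholder
ignored = text.encode("cp437", errors="ignore")    # drop silently

print(replaced)  # b'????'
print(ignored)   # b''

# cp949 can represent the same text, so a round trip is lossless.
round_trip = text.encode("cp949").decode("cp949")
print(round_trip == text)  # True
```

The first result matches the ???? seen in the transcript above; the second matches titles whose Korean parts are missing without any placeholder.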

This behavior is not universal across platforms. On Linux Mint, writing to a file in the same scenario worked fine, and switching to another terminal such as kitty, alacritty or warp may change how the text is displayed. The crux of the Windows problem remains the redirection-time conversion to the console’s code page.

Fix: tell yt-dlp which encoding to use

The most reliable approach is to make yt-dlp emit text in a known encoding that can represent your characters, and then write that text directly without going through a lossy conversion step. Using yt-dlp’s own --encoding option removes the ambiguity. Once it is set, the titles display correctly in the terminal, in files, and when captured as a string; both UTF-8 and cp949 work.

One way is to have Python write the redirected output while forcing yt-dlp’s encoding:

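import subprocess

codec = "cp949"
# Pass arguments as a list: with shell=False, quotes around the codec would reach yt-dlp literally.
cmd = ["yt-dlp", "--encoding", codec, "--flat-playlist", "-i", "--print", "title",
       "PLySOINx0fqvYr6s8aGdqaK9j8_CAWcP5U"]

# The child writes raw bytes straight to the file, so no encoding= argument is needed here.
with open("out.txt", "wb") as outfile:
    subprocess.run(cmd, stdout=outfile)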

After that, with the console on code page 949, you can view the file and see the Korean titles correctly:

wineconsole 2>/dev/null
Microsoft Windows 10.0.2600

Z:\home\lmc\tmp>chcp 949
Active code page: 949

Z:\home\lmc\tmp>type out.txt
Vague (feat. Hey)
새벽 한 시
천 개의 태양
Wish
...
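The same option also covers the case of capturing the titles into a Python string rather than a file. The following is a minimal sketch under a few assumptions: build_command and capture_titles are hypothetical helper names, yt-dlp is expected to be on PATH, and UTF-8 is chosen as the default codec. The idea is to capture raw bytes and decode them with the same codec yt-dlp was told to emit.

```python
import subprocess

PLAYLIST_ID = "PLySOINx0fqvYr6s8aGdqaK9j8_CAWcP5U"  # playlist used in this article


def build_command(playlist_id, codec):
    """Return the yt-dlp argument list with the output encoding pinned."""
    return [
        "yt-dlp",
        "--encoding", codec,
        "--flat-playlist",
        "-i",
        "--print", "title",
        playlist_id,
    ]


def capture_titles(playlist_id, codec="utf-8"):
    """Run yt-dlp and decode its stdout with the codec it was told to emit."""
    result = subprocess.run(
        build_command(playlist_id, codec),
        capture_output=True,  # collect raw bytes; decoding is done explicitly below
    )
    return result.stdout.decode(codec).splitlines()
```

Calling capture_titles(PLAYLIST_ID) needs network access and returns one string per title. Decoding explicitly keeps the codec in one place, so the emitting side and the reading side cannot drift apart.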

An even cleaner route is to skip subprocess entirely and use yt-dlp as a Python module, which avoids console redirection and its implicit transcoding. The library exposes options that mirror the CLI and lets you control encoding directly.

import yt_dlp

src_url = "https://www.youtube.com/playlist?list=PLySOINx0fqvYr6s8aGdqaK9j8_CAWcP5U"
codec = "cp949"

options = {
    "extract_flat": True,     # list the entries without resolving each video
    "playlist_items": "1-5",  # only the first five entries
    "encoding": codec,
}

with yt_dlp.YoutubeDL(options) as loader:
    info = loader.extract_info(src_url, download=False)
    titles = [entry["title"] for entry in info["entries"]]

# Encode explicitly and write bytes, so no implicit conversion is involved.
with open("out.txt", "wb") as handle:
    handle.write("".join(f"{title}\n" for title in titles).encode(codec))
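Once the titles are ordinary Python strings, the file can also be written in text mode with an explicit encoding. The sketch below assumes two hypothetical helpers, write_titles and read_titles, and uses a few titles from the out.txt listing shown earlier as sample data; it checks that both UTF-8 and cp949 round-trip them without loss.

```python
import os
import tempfile


def write_titles(titles, path, codec="utf-8"):
    """Write one title per line, encoding explicitly instead of relying on the locale."""
    # newline="\n" keeps line endings identical on every platform.
    with open(path, "w", encoding=codec, newline="\n") as handle:
        for title in titles:
            handle.write(f"{title}\n")


def read_titles(path, codec="utf-8"):
    """Read the titles back with the same codec used for writing."""
    with open(path, "r", encoding=codec) as handle:
        return handle.read().splitlines()


# Sample titles taken from the playlist output shown earlier.
sample = ["Vague (feat. Hey)", "새벽 한 시", "천 개의 태양", "Wish"]

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.txt")
    for codec in ("utf-8", "cp949"):
        write_titles(sample, target, codec)
        restored = read_titles(target, codec)
        print(codec, restored == sample)
```

Passing the encoding to open() means the result no longer depends on the active code page or the system locale.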

Why this matters

Once output is redirected, Windows applies the console’s code page, and that conversion can be lossy for multilingual text. If the target code page cannot represent Korean, you either see replacement characters or the originals are dropped. That’s how you end up with printable titles in the console yet stripped titles in files or captured strings. Controlling the encoding at the source prevents this silent data loss and keeps automation workflows trustworthy.
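One way to make that loss visible instead of silent is to check titles against the target codec before writing them. This is a minimal sketch, assuming a hypothetical helper named unencodable_chars; it relies on Python's default strict error handling, which raises instead of replacing or dropping characters.

```python
def unencodable_chars(text, codec):
    """Return the characters of `text` that `codec` cannot represent."""
    missing = []
    for char in text:
        try:
            char.encode(codec)  # strict by default: raises instead of dropping
        except UnicodeEncodeError:
            missing.append(char)
    return missing


title = "천 개의 태양"
print(unencodable_chars(title, "cp949"))       # []
print(len(unencodable_chars(title, "cp437")))  # 5
```

An automation script could run such a check and fail loudly, or fall back to UTF-8, whenever the list is not empty.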

Summary and practical advice

If you are redirecting or programmatically capturing yt-dlp output on Windows and need to preserve Korean alongside Latin text, make yt-dlp emit a suitable encoding explicitly with the --encoding option, then write that output directly. Using the Python API sidesteps console redirection altogether and keeps the data path consistent. Where available, viewing or processing the text in UTF-8 or cp949 avoids the mapping gaps that lead to question marks or dropped characters. On Linux Mint the same operation wrote correctly to a file; terminals differ, but the principle remains: control the encoding end to end and you won’t lose characters.