2025, Dec 17 07:00
Fixing Python stdout on Windows: prevent ANSI files from redirected output and enforce UTF-8
Why a Unicode string that decodes correctly in Python still lands in an ANSI file when stdout is redirected on Windows. Force UTF-8 with -X utf8, PYTHONUTF8, or PYTHONIOENCODING, in scripts and CI pipelines alike.
Why does a Unicode string that decodes fine in Python end up as ANSI in a Windows command prompt when you redirect output to a file? The symptom is confusing: the bytes decode to the right character, yet Notepad++ reports the file as ANSI and the glyph can be wrong outside of Western locales. Let’s break down what actually happens with stdout on Windows and how to force consistent UTF-8 output.
Reproducing the issue
The sequence below decodes the UTF-8 bytes for the character Ö and redirects the output to a file:
>python --version
Python 3.13.0
>python -c "print(b'\xc3\x96'.decode('utf-8'))" > test.txt
Opening test.txt in Notepad++ shows the encoding as ANSI. Running a similar command in MSYS2 (Python 3.11.6) yields a UTF-8-encoded file instead. The discrepancy is environmental; it has nothing to do with the decode step itself.
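You can verify the difference without an editor by dumping the file's raw bytes. On a Western-locale Windows installation, the cmd run stores Ö as the single cp1252 byte 0xD6, followed by the \r\n that print produces in text mode; the exact bytes you see depend on your locale:
>python -c "print(open('test.txt', 'rb').read())"
b'\xd6\r\n'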
What’s actually happening
The core point is that decode produces a Unicode string. At that moment there is no file encoding involved at all. The following two one-liners are equivalent in effect:
python -c "print(b'\xc3\x96'.decode('utf-8'))" > test.txt
python -c "print('Ö')" > test.txt
Encoding only comes into play when print writes to stdout. On Windows, when stdout is redirected to a file, Python falls back to the locale encoding, which is the default “ANSI” code page of that localized Windows installation. For US and Western European systems, that means Windows-1252. That’s why the resulting file is marked as ANSI and not UTF-8. Different environments, like MSYS2, configure stdout differently and can default to UTF-8, hence the different outcome.
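You can check which encoding Python attached to the stream by inspecting sys.stdout.encoding. Printing the answer to stderr keeps it visible while stdout itself is redirected; on a Western-locale Windows system the redirected stream typically reports cp1252:
>python -c "import sys; print(sys.stdout.encoding, file=sys.stderr)" > NUL
cp1252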
How to get UTF-8 output
Enable UTF-8 Mode in Python so stdout uses UTF-8 even when redirected. You can turn it on per process with the -X utf8 switch:
python -X utf8 -c "print('Ö')" > test.txt
Alternatively, set the environment variable PYTHONUTF8=1 to enable UTF-8 Mode for Python. If you need fine-grained control specifically over redirected stdin/stdout/stderr, PYTHONIOENCODING lets you override those streams explicitly.
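In cmd.exe that looks like the following; PYTHONIOENCODING is shown set to an assumed utf-8 value:
>set PYTHONUTF8=1
>python -c "print('Ö')" > test.txt

>set PYTHONIOENCODING=utf-8
>python -c "print('Ö')" > test.txt

Either variable makes the redirected test.txt come out UTF-8-encoded.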
Another reliable option is to bypass the shell redirection path entirely and write to a file object whose encoding you control, so that print receives a stream you opened with UTF-8:
data_bytes = b'\xc3\x96'
text_unicode = data_bytes.decode('utf-8')  # bytes -> str; no file encoding involved yet
with open('test.txt', 'w', encoding='utf-8') as out_file:
    print(text_unicode, file=out_file)  # encoded as UTF-8 because the stream says so
This produces a UTF-8 file regardless of the console or OS redirection behavior, because the encoding is specified on the Python side.
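If you would rather keep printing to plain stdout, a related in-code option is to reconfigure the already-open stream before writing anything; io.TextIOWrapper.reconfigure has been available since Python 3.7. A minimal sketch:
import sys

# Re-wrap stdout with an explicit encoding before any output is written
sys.stdout.reconfigure(encoding='utf-8')
print('Ö')  # now written as UTF-8 even when redirected to a file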
Why this matters
Console pipelines and CI logs often redirect stdout to files. If the encoding silently falls back to an ANSI code page, non-ASCII characters can get corrupted or misinterpreted downstream. Predictable UTF-8 output prevents those mismatches across different shells and environments. If you package your code into an executable and need consistent behavior, you can embed the UTF-8 Mode switch by adding the 'X utf8' interpreter option to the .spec file, as sketched below.
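With PyInstaller, the usual producer of .spec files, interpreter options are passed to EXE as (option, None, 'OPTION') tuples. A hypothetical excerpt, assuming that documented convention and a made-up application name:
# In the .spec file: forward -X utf8 to the bundled interpreter
options = [('X utf8', None, 'OPTION')]

exe = EXE(
    pyz,
    a.scripts,
    options,        # interpreter options ride along as a positional argument
    name='myapp',   # hypothetical name
    console=True,
)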
Takeaways
Decoding bytes to Unicode in Python works exactly as expected; the surprise comes later when stdout is encoded. On Windows, a redirected stdout defaults to the localized “ANSI” encoding, commonly Windows-1252 in Western locales. Make the encoding explicit if you care about the bytes written: enable UTF-8 Mode with -X utf8 or PYTHONUTF8, use PYTHONIOENCODING for redirected I/O, or print into a file opened with encoding='utf-8'. Doing so keeps your outputs portable and your text intact.