2025, Dec 20 05:00

Normalize Python stdout newlines across platforms: fix CRLF vs LF with subprocess text=True or newline-safe stdout

Learn why Python prints CRLF on Windows, breaking tests that expect LF. See fixes using subprocess.run(text=True) or a TextIOWrapper to make stdout portable.

Turning an English–Latin word list into a Latin–English dictionary sounds straightforward until an OS sneaks in an invisible character and breaks your tests. The task itself is simple: read one or more text files where each line looks like “english - latin1, latin2, ...”, invert the mapping, deduplicate translations across files, and print the result. The twist is that the test harness captures stdout and compares it against an expected file using Unix newlines. On Windows, that means your perfectly valid output will differ by a carriage return.

The task, the expected output, and where things go sideways

Input files look like this:

apple - malum, pomum, popula
fruit - baca, bacca, popum
punishment - malum, multa

And the required output is:

baca - fruit
bacca - fruit
malum - apple, punishment
multa - punishment
pomum - apple
popula - apple
popum - fruit

Yet on Windows, a typical print with “\n” becomes “\r\n” on stdout, and a direct string comparison against a file containing “\n” fails. You can see it in a minimal case that reproduces the problem:

import sys

# Mirrors the real script's per-file loop; the file contents are irrelevant here.
for src_path in sys.argv[1:]:
    chunks = ['Hello, ', 'World!']
    # On Windows, the '\n' from join (and the one print appends) reaches stdout as '\r\n'.
    print('\n'.join(chunks))

In a test that reads stdout as bytes and decodes, the observed output contains “\r\n”, while the expected content uses just “\n”, leading to a mismatch.

Why it happens

On Windows, the default TextIOWrapper around sys.stdout translates “\n” to os.linesep on write, so every “\n” becomes “\r\n” whether stdout is the console or a pipe. The test in question launches the script via subprocess, captures stdout with a pipe, decodes the raw bytes by hand, and compares the result against a file read in text mode. Because the test never normalizes line endings before comparing, you end up with “\r\n” on one side and “\n” on the other, which makes it non-portable across operating systems.
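
For illustration, a harness along these lines reproduces the failure; the script and file names here are placeholders, not the actual test code:

import subprocess

# A sketch of the brittle pattern described above: capture raw bytes, decode
# by hand, and compare against a file read in text mode.
run_info = subprocess.run(
    ["python", "task3.py", "dict1.txt"],
    stdout=subprocess.PIPE,
)
actual_text = run_info.stdout.decode("utf-8").strip()   # keeps the "\r\n" written on Windows

with open("expected.txt", "r") as fh:   # text mode reads "\n" line endings
    expected_text = fh.read().strip()

# Fails on Windows: "\r\n" in actual_text vs "\n" in expected_text.
assert actual_text == expected_text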

The fix that belongs in the test

The robust solution is to let subprocess handle decoding and newlines. With text=True, subprocess.run returns stdout as a string with universal newline handling, so “\r\n” is translated to “\n” exactly as it is when the expected file is read in text mode. Manual decoding becomes unnecessary.

import os
import subprocess

run_info = subprocess.run(
    ["python", os.path.join(SOLUTION_FOLDER_PATH, "task3.py"), test_input_path],
    stdout=subprocess.PIPE,
    text=True,   # decode stdout and normalize line endings to '\n'
)
actual_text = run_info.stdout.strip()

with open(expected_path, "r") as fh:   # text mode also normalizes '\r\n' to '\n'
    expected_text = fh.read().strip()

assert actual_text == expected_text

This keeps the comparison stable across platforms and removes the brittle “bytes then decode” sequence.
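
If the harness must keep capturing raw bytes for some other reason, a lighter alternative (a sketch, with placeholder paths) is to normalize both sides before comparing, for example by splitting into lines:

import subprocess

run_info = subprocess.run(
    ["python", "task3.py", "dict1.txt"],
    stdout=subprocess.PIPE,
)
# splitlines() treats "\r\n" and "\n" alike, so the comparison ignores
# platform newline translation.
actual_lines = run_info.stdout.decode("utf-8").splitlines()

with open("expected.txt", "r") as fh:
    expected_lines = fh.read().splitlines()

assert actual_lines == expected_lines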

If you can’t change the test: neutralize newline translation in your script

When the test cannot be modified, you can adjust stdout in your script so that printing “\n” writes “\n” literally (not “\r\n”). Wrap sys.stdout with a TextIOWrapper that disables newline translation. Place this immediately after your imports.

import io
import sys

# newline='' turns off translation: a '\n' written to stdout stays '\n' on Windows.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, newline='')

Be aware that some IDEs redirect sys.stdout in a way that doesn’t expose .buffer or doesn’t use the standard TextIOWrapper. In a plain command-line Python on Windows, the approach above works as intended. In certain IDEs it may not.
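
If you want to stay safe in those environments, a defensive variant (my sketch, not part of the original fix) only re-wraps stdout when the binary buffer is actually exposed:

import io
import sys

# Some IDE consoles replace sys.stdout with an object that has no .buffer;
# only re-wrap when the real binary stream is available.
if hasattr(sys.stdout, "buffer"):
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, newline='')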

The dictionary script with the newline-safe stdout

The following snippet reads files passed via argv, inverts the English–Latin mapping to Latin–English, sorts the keys, and prints the lines joined by “\n”. Newline handling is stabilized at the stdout layer as shown above. The program logic remains the same.

import io
import sys

# Disable newline translation on stdout so '\n' is written literally, even on Windows.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, newline='')

for input_path in sys.argv[1:]:
    reversed_map = {}
    with open(input_path, 'r', encoding='utf-8') as handle:
        for raw in handle:
            if not raw.strip():
                continue  # skip blank lines
            # "english - latin1, latin2, ...": the English term is the first token,
            # the Latin translations are everything after the dash.
            eng_term = raw.split()[0]
            latin_parts = raw.strip().replace(',', '').split()[2:]
            for lt in latin_parts:
                reversed_map.setdefault(lt, []).append(eng_term)

    lines_out = []
    for lexeme, eng_list in sorted(reversed_map.items()):
        lines_out.append(lexeme + ' - ' + ', '.join(eng_list))

    print('\n'.join(lines_out))
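
As a quick self-check (assuming the script above is saved as task3.py and a sample word list as dict1.txt), you can capture the raw bytes and confirm that no carriage returns slip through, even on Windows:

import subprocess

raw = subprocess.run(
    ["python", "task3.py", "dict1.txt"],
    stdout=subprocess.PIPE,
).stdout
# With the newline='' wrapper in place, the captured bytes contain b"\n" only.
assert b"\r" not in raw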

Why this matters for your projects and your CI

Line endings are a classic portability pitfall. When stdout is compared byte-for-byte against a golden file, platform-specific newline translation will cause false negatives. Tweaking editor settings or massaging the strings inside the script before printing doesn’t help, because the translation happens at the I/O layer as the text is written. The right place to fix it is where decoding and newline normalization occur: in the test via text=True, or in the writer by disabling translation on stdout when you must match a specific convention.

When expected results are read from files, ensure both sides of the comparison are normalized consistently. If you control the harness, read the expected content in text mode and compare it to a text-mode capture from subprocess. If you only control the script, re-wrapping stdout as shown above is an effective workaround as long as your environment exposes sys.stdout.buffer.

Takeaways

Prefer text=True in subprocess.run and compare text to text; this keeps tests portable and removes the need for manual decode. If you can’t change the harness, re-wrap sys.stdout to prevent “\n” from becoming “\r\n”. Be cautious in IDEs that replace sys.stdout with custom objects. And when working with reference files, keep line endings consistent on both sides of any comparison to avoid failing tests for the wrong reason.