2025, Dec 28 01:00
Why sed Fails with PCRE-Style Regex on Windows and How to Quote URLs with Perl or GNU sed
Learn why sed on Windows fails with PCRE-style regex and how to quote space-separated URLs reliably using a Perl one-liner or GNU sed. Avoid POSIX pitfalls.
Quoting every whitespace-separated URL in a long line is a trivial task in Python or PCRE-style regex. Trying to do the same with sed on Windows, however, can go sideways if you expect the same regex features. Here’s a concise breakdown of what went wrong, why it happened, and how to fix it without wrestling with POSIX quirks.
Reproducing the issue
The goal is to wrap each URL in double quotes. The input is a single line of space-delimited URLs.
https://www.youtube.com/watch?app=desktop&v=Ot34P0yyQqI&t=984s https://www.youtube.com/watch?v=vviniZjvDQs https://www.youtube.com/watch?v=Ih7qgkyo_oo https://www.youtube.com/watch?v=X6UEDpwI3HI https://www.youtube.com/watch?v=nShgaRMNlLw https://www.youtube.com/watch?v=nd_jN-C_Juw https://www.youtube.com/watch?v=aOtqox2uB3Y
A sed attempt, in the same spirit as a Python regex that uses a non-greedy quantifier, might look like this:
sed 's/\(https.*?\)[:space:]/"\1"/g' urls.txt
Expected output:
"https://www.youtube.com/watch?app=desktop&v=Ot34P0yyQqI&t=984s" "https://www.youtube.com/watch?v=vviniZjvDQs" "https://www.youtube.com/watch?v=Ih7qgkyo_oo" "https://www.youtube.com/watch?v=X6UEDpwI3HI" "https://www.youtube.com/watch?v=nShgaRMNlLw" "https://www.youtube.com/watch?v=nd_jN-C_Juw" "https://www.youtube.com/watch?v=aOtqox2uB3Y"
Actual output:
"https://www.youtube.com/watch?"pp=desktop&v=Ot34P0yyQqI&t=984s https://www.youtube.com/watch?v=vviniZjvDQs https://www.youtube.com/watch?v=Ih7qgkyo_oo https://www.youtube.com/watch?v=X6UEDpwI3HI https://www.youtube.com/watch?v=nShgaRMNlLw https://www.youtube.com/watch?v=nd_jN-C_Juw https://www.youtube.com/watch?v=aOtqox2uB3Y
What’s actually going wrong
The environment is Microsoft Windows 11 with sed installed via winget (bmatzelle.Gow). sed uses POSIX regular expressions by default, not the PCRE/Python flavor. Two details from the pattern cause the breakage:
First, the non-greedy quantifier from PCRE/Python (.*?) does not exist in sed’s basic regex. There, .* is always greedy and the question mark that follows it is a literal character, so the capture group has to end at a literal ? in the line. The only ? that also satisfies the rest of the pattern is the one after watch in the first URL, which is why only https://www.youtube.com/watch? was quoted and the rest spilled outside the quotes in the actual output.
Second, [:space:] does not mean "whitespace" as written. In POSIX syntax, a character class such as [:space:] is only valid inside a bracket expression, so the correct form is [[:space:]]. On its own, [:space:] is parsed as an ordinary bracket expression that matches any one of the characters :, s, p, a, c, or e. That is why the a of app was consumed by the match and is missing from the actual output. Some implementations, including recent GNU sed releases, reject the bare form with an error instead.
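Both details can be reproduced in a POSIX shell with a small sketch. It makes two assumptions: the sample line is shortened to two of the URLs above, and the bare [:space:] is spelled out as the explicit set [aceps:], since newer sed builds may refuse the bare form outright. The variable names are illustrative only.

```shell
# Shortened sample line (an assumption for brevity; two of the URLs from the input).
line='https://www.youtube.com/watch?app=desktop&v=Ot34P0yyQqI https://www.youtube.com/watch?v=vviniZjvDQs'

# Bare [:space:] written as the set it denotes: ":", "s", "p", "a", "c", "e".
# In basic regex ".*" is greedy and "?" is a literal question mark.
broken=$(printf '%s\n' "$line" | sed 's/\(https.*?\)[aceps:]/"\1"/g')
printf '%s\n' "$broken"
# prints: "https://www.youtube.com/watch?"pp=desktop&v=Ot34P0yyQqI https://www.youtube.com/watch?v=vviniZjvDQs

# With the class nested inside a bracket expression, real whitespace matches.
fixed=$(printf 'a b\tc\n' | sed 's/[[:space:]]/_/g')
printf '%s\n' "$fixed"
# prints: a_b_c
```

The first command shows the same damage as the actual output: the quote lands after watch? and the a of app disappears. The second shows that [[:space:]] matches both the space and the tab.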
Practical fix without wrestling with POSIX regex
If you want the same behavior you get in Python or PCRE without re-thinking patterns for every tool, use Perl’s regex engine. On Windows this is also convenient through WSL.
You can install Perl inside Windows Subsystem for Linux like this: install WSL, run bash from a command prompt, then install Perl with apt.
wsl
sudo apt install perl
Now process the file with a one-liner that quotes each non-space token and follows it with a single space, so the output line ends with one trailing space:
perl -pe 's/(\S+)( |$)+/"$1" /g' urls.txt
Output:
"https://www.youtube.com/watch?app=desktop&v=Ot34P0yyQqI&t=984s" "https://www.youtube.com/watch?v=vviniZjvDQs" "https://www.youtube.com/watch?v=Ih7qgkyo_oo" "https://www.youtube.com/watch?v=X6UEDpwI3HI" "https://www.youtube.com/watch?v=nShgaRMNlLw" "https://www.youtube.com/watch?v=nd_jN-C_Juw" "https://www.youtube.com/watch?v=aOtqox2uB3Y"
If you need to overwrite the original file in-place:
perl -i -pe 's/(\S+)( |$)+/"$1" /g' urls.txt
During testing it’s safer to write to a new file and keep the original intact:
perl -pe 's/(\S+)( |$)+/"$1" /g' urls.txt > urls_updated.txt
Alternative if you are on GNU sed
If you do have GNU sed 4.7, a pattern that targets URLs specifically can quote them as well:
sed 's#\(https\?://[^ ]\+\)#"\1"#g' urls.txt
This avoids non-greedy quantifiers and matches a URL-like token without relying on PCRE features.
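The \? and \+ operators in that pattern are GNU extensions to basic regex, so other sed implementations may not accept them. A possible portable variant, sketched here under the assumption of a POSIX shell and a shortened two-URL sample line, uses the standard interval \{0,1\} and a repeated bracket expression instead, and & to refer to the whole match.

```shell
# Shortened sample line (an assumption for brevity; two of the URLs from the input).
line='https://www.youtube.com/watch?app=desktop&v=Ot34P0yyQqI https://www.youtube.com/watch?v=vviniZjvDQs'

# s\{0,1\}     : optional "s", the POSIX spelling of GNU's s\?
# [^ ][^ ]*    : one or more non-space characters, the POSIX spelling of [^ ]\+
# & in the replacement stands for the entire matched URL.
portable=$(printf '%s\n' "$line" | sed 's#https\{0,1\}://[^ ][^ ]*#"&"#g')
printf '%s\n' "$portable"
# prints: "https://www.youtube.com/watch?app=desktop&v=Ot34P0yyQqI" "https://www.youtube.com/watch?v=vviniZjvDQs"
```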
Why this matters
PCRE-style regular expressions behave largely the same across many tools and languages, so you don’t end up rewriting patterns for every environment. When your workflow lives on Windows and you need robust command-line text processing, having Perl available, natively or via WSL, saves time and prevents subtle regex mismatches like the one above.
Takeaways
Expecting PCRE/Python semantics in sed leads to surprises: non-greedy quantifiers don’t exist in POSIX basic regex, and POSIX character classes must be used in the correct bracket form. If you want the behavior you’re used to from Python, run a Perl one-liner; with WSL, that’s a minimal lift and gives you consistent results. If GNU sed is available, a carefully crafted POSIX pattern can work too, but it’s still a different regex dialect with different rules.