2025, Oct 04 17:00
Regex to Decrement Consecutive Newlines by One (Python), with Examples and Caveats
Normalize text by reducing each run of newlines by exactly one using regex lookahead in Python. Avoid collapse-to-one mistakes and handle edge cases safely.
Text normalization often looks trivial until a tiny rule changes the shape of the task. A common requirement is to reduce any sequence of newline characters by exactly one, not to collapse them entirely. That subtle “x newlines into x−1 newlines” constraint is easy to miss if you reach for a standard collapse-to-one approach.
Problem statement
Given a string with multiple newline runs, the goal is to subtract one newline from every run. For example, starting with:
"Anna lives in Latin America.\n\nShe loves the vibes from the cities\n and the good weather.\n\n\nAnna is great"
The expected result is:
"Anna lives in Latin America.\nShe loves the vibes from the cities and the good weather.\n\nAnna is great"
Naive attempt that changes the semantics
It’s tempting to simplify multiple newlines to a single one. That is not equivalent to shaving exactly one newline from each run. Consider this implementation:
import re
def shrink_lines(blob):
    blob = re.sub(r'\n{2,}', '\n', blob)
    return blob
sample = "Anna lives in Latin America.\n\nShe loves the vibes from the cities\n and the good weather.\n\n\nAnna is great"
result = shrink_lines(sample)
This produces:
"Anna lives in Latin America.\nShe loves the vibes from the cities\n and the good weather.\nAnna is great"
which is not the target. The triple newline before “Anna is great” collapses to a single newline, effectively removing two newlines instead of exactly one.
Why the mismatch happens
The pattern \n{2,} targets any sequence of two or more newlines and replaces it with a single newline. That means the longer the run, the more newlines get removed. In other words, it normalizes, but it does not decrement. The requirement, however, is to remove exactly one newline per consecutive run, including single newlines, so that every run becomes one newline shorter.
Solution: remove only the final newline in each run
The precise way to subtract one newline from each consecutive sequence is to match a newline that is not followed by another newline and delete it. A lookahead does exactly that:
import re
def trim_one_break(segment):
    return re.sub(r'\n(?!\n)', '', segment)
text_in = "Anna lives in Latin America.\n\nShe loves the vibes from the cities\n and the good weather.\n\n\nAnna is great"
text_out = trim_one_break(text_in)
This approach finds the last newline in every contiguous group and removes it. Single newlines are also reduced to zero, double newlines become single, and triples become doubles—precisely the x → x−1 behavior.
Important caveat
Decrementing by one works for contiguous newline runs only. If whitespace appears between newlines, they are not a single run anymore. For example, the sequence "This is line 1\n \nThis is line 2" becomes "This is line 1 This is line 2" after applying \n(?!\n). Each newline is treated independently because the space breaks the adjacency, so both are removed and the net effect is a drop by two.
Why this matters
Text pipelines often rely on predictable spacing semantics. A reduction rule like “turn every run of newlines into one less” preserves relative paragraph structure while tightening layout consistently across the document. Using a collapse-to-one approach silently changes that structure and may erase meaningful spacing.
Takeaways
If the goal is to reduce consecutive newline sequences by exactly one, match the final newline in each run and remove it with a lookahead. Keep in mind that any spaces or other characters between newline characters break the run and lead to a different outcome. Align the regex with the exact definition of a “sequence” in your data and review edge cases where whitespace may appear between line breaks.
The article is based on a question from StackOverflow by Alexis and an answer by Chris Maurer.