2025, Nov 21 15:00
Fix Python file I/O cursor bugs: stop re-seeking, normalize newlines, and count lines correctly
Learn how to fix Python file I/O cursor bugs: avoid seek(0) loops, handle CRLF vs LF newlines, use read-until logic, and get accurate line counts with EOFError.
Building a small helper over Python file I/O is a common way to learn about cursors, reads, and line boundaries. But there’s a subtle pitfall: if you re-seek to the beginning on every operation, you’ll keep rereading the same content and inflate your line counts. Below is a minimal reproduction of the problem and a fix that doesn’t introduce external modules or change the core idea.
Reproducing the issue
The example below writes two lines into a file and then tries to iterate line by line until it hits EOFError. The class keeps two file handles, one for writing and one for reading, and maintains a shared cursor.
def run():
f = QuickIO("numbers.txt")
f.put("1, -2, 5, 0, 19, -7, end\n5, 5, -1, -10, 9, end")
rows = 0
while(True):
try:
print("Line #"+str(rows+1))
f.jump_to_line(rows)
except EOFError:
break
rows += 1
print("The amount of lines are: "+str(rows + 1))
class QuickIO:
# members: path, out, inp, pos
def __init__(self, path):
self.path = path
self.pos = 0
self.out = open(path, "w")
self.inp = open(path, "r")
self.out.seek(self.pos)
self.inp.seek(self.pos)
def jump_to(self, char_idx):
self.pos = char_idx
self.out.seek(char_idx)
self.inp.seek(char_idx)
tmp = open(self.path, "r")
tmp.seek(char_idx)
if tmp.read(1) == "":
tmp.seek(char_idx - 1)
if tmp.read(1) == "":
tmp.close()
raise EOFError
tmp.close()
def jump_to_line(self, n):
self.jump_to(0)
for i in range(0, n):
print(repr(self.read_until()))
def put(self, s):
self.out.write(s)
self.out.flush()
self.jump_to(self.pos + len(s))
def read_until(self, token="\n"):
data = self.inp.read()
self.jump_to(self.pos)
end = 0
while len(data) > end:
if data[end:end+len(token)] == token:
break
if len(data[end:end+len(token)]) != len(token):
self.jump_to(self.pos + len(data))
return data
end += 1
self.jump_to(self.pos + end + len(token))
return data[0:end]
run()
What actually goes wrong
The repeated lines and wrong totals stem from resetting the file position. The method that is supposed to move to a specific line begins with a hard reset: jump_to(0). That means every iteration starts from the beginning and walks over the same content again. As a result, the loop never advances through the file the way you expect: the output repeats the first line while the counter drifts out of sync.
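You can reproduce the core mistake with nothing but a plain file object. This is a minimal sketch, assuming numbers.txt already holds the two lines written by the example above:

with open("numbers.txt", newline="\n") as fh:
    for _ in range(3):
        fh.seek(0)                    # the hard reset: back to the start every time
        print(repr(fh.readline()))    # prints the first line three times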
There’s also a platform-sensitive wrinkle: when you write text with newlines on Windows, the default text mode translates each \n into \r\n on disk, so the file is longer than the string you wrote and every offset computed from len(...) is off. If your logic assumes a plain \n delimiter and character-based positions, you can observe “skipped” lines or mismatched reads. Opening both reader and writer with newline="\n" disables the translation and keeps the on-disk content consistent with the delimiter you search for.
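If you want to see the translation happen, write the same text with and without newline="\n" and inspect the raw bytes. A small sketch; the file name newline_demo.txt is only illustrative:

# Default text mode: "\n" may become the platform line ending (\r\n on Windows).
with open("newline_demo.txt", "w") as fh:
    fh.write("a\nb\n")
with open("newline_demo.txt", "rb") as fh:
    print(fh.read())    # b'a\r\nb\r\n' on Windows, b'a\nb\n' elsewhere

# newline="\n" writes "\n" verbatim, so the bytes match the delimiter you search for.
with open("newline_demo.txt", "w", newline="\n") as fh:
    fh.write("a\nb\n")
with open("newline_demo.txt", "rb") as fh:
    print(fh.read())    # b'a\nb\n' on every platform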
Fixing the logic
The first change is to stop seeking to the start of the file on every line jump. Read lines relative to where the cursor currently is. The second change is to make read_until raise EOFError when nothing more can be read, advance the cursor when a delimiter is found, and otherwise move to the end. Finally, normalize line endings by setting newline="\n" on both the writer and the reader.
class QuickIO:
    # members: path, out, inp, pos
    def __init__(self, path):
        self.path = path
        self.pos = 0
        # newline="\n" turns off newline translation on both handles, so the
        # content on disk always matches the "\n" delimiter read_until scans for.
        self.out = open(path, "w", newline="\n")
        self.inp = open(path, "r", newline="\n")
        self.out.seek(self.pos)
        self.inp.seek(self.pos)
    def jump_to(self, char_idx):
        # Move the shared cursor and keep both handles in sync with it.
        self.pos = char_idx
        self.out.seek(char_idx)
        self.inp.seek(char_idx)
        # Peek with a temporary handle: jumping beyond the last character means EOF.
        tmp = open(self.path, "r")
        tmp.seek(char_idx)
        if tmp.read(1) == "":
            tmp.seek(char_idx - 1)
            if tmp.read(1) == "":
                tmp.close()
                raise EOFError
        tmp.close()
    def jump_to_line(self, n):
        # Skip n lines relative to the current cursor, then return the next one.
        for _ in range(n):
            self.read_until('\n')
        return self.read_until('\n')
    def put(self, s):
        self.out.write(s)
        self.out.flush()
        self.jump_to(self.pos + len(s))
    def read_until(self, token="\n"):
        data = self.inp.read()
        if data == "":
            raise EOFError          # nothing left after the current cursor
        self.jump_to(self.pos)      # read() moved inp to EOF; put it back
        end = 0
        while len(data) > end:
            if data[end:end+len(token)] == token:
                # Delimiter found: advance past it and return the segment before it.
                self.jump_to(self.pos + end + len(token))
                return data[0:end]
            end += 1
        # No delimiter left: consume the rest of the file.
        self.jump_to(self.pos + len(data))
        return data[0:end]
Drive it like this. Because put advances the shared cursor to the end of the text it just wrote, rewind once with jump_to(0) after writing, then ask for the “next” line on each iteration by passing 0:
def run():
    f = QuickIO("numbers.txt")
    f.put("1, -2, 5, 0, 19, -7, end\n5, 5, -1, -10, 9, end")
    f.jump_to(0)  # put left the cursor at the end of the written text; rewind once
    line_count = 0
    while True:
        try:
            print(f"Line #{line_count + 1}")
            print(repr(f.jump_to_line(0)))
        except EOFError:
            break
        line_count += 1
    print("The amount of lines are:", line_count)
run()
This yields the expected, non-duplicated result and the accurate line count.
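Assuming numbers.txt contains exactly the two lines written by put, the run should print something like:

Line #1
'1, -2, 5, 0, 19, -7, end'
Line #2
'5, 5, -1, -10, 9, end'
Line #3
The amount of lines are: 2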
Why this matters
File iteration correctness depends on one invariant: the read position must move forward deterministically. Any unconditional seek to the beginning in a per-line operation breaks that invariant and leads to rereads and inflated counters. Consistent newline handling is equally important; searching for \n while the file contains \r\n introduces off-by-one behavior and “missing” delimiters that are hard to spot.
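For comparison, here is what that invariant looks like when you lean on the standard file iterator instead of a hand-rolled cursor; a short sketch, separate from the QuickIO class:

count = 0
with open("numbers.txt", newline="\n") as fh:
    for _ in fh:        # iteration only moves forward; no line is visited twice
        count += 1
print("lines:", count)  # 2 for the example file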
Takeaways
Advance relative to the current cursor instead of re-seeking to zero, normalize newline behavior by opening both ends with newline="\n", and have your read-until logic either return the segment and advance past the delimiter or raise EOFError if nothing remains. If you later decide to streamline this further, you can also open the file once in r+ mode and rely on readline(), but the adjustments above are enough to make the current approach correct and predictable.
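For reference, here is a minimal sketch of that streamlined variant: a single handle opened in r+ mode, readline() for line boundaries, and EOFError when nothing remains. The method names mirror the driver above so it can be dropped in, but it is an illustrative rewrite, not the implementation shown earlier:

class QuickIO:
    def __init__(self, path):
        open(path, "w").close()                 # create/truncate so "r+" can open the file
        self.f = open(path, "r+", newline="\n")
    def jump_to(self, char_idx):
        self.f.seek(char_idx)                   # one shared cursor for reads and writes
    def put(self, s):
        self.f.write(s)
        self.f.flush()
    def jump_to_line(self, n):
        line = None
        for _ in range(n + 1):                  # skip n lines, keep the (n+1)-th
            line = self.f.readline()
            if line == "":                      # readline() returns "" only at end of file
                raise EOFError
        return line.rstrip("\n")

Running the same run() driver against this class produces the same two lines and the same count of 2.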