2025, Sep 24 03:00

How to Control a Child PTY in Python and Maintain a Queryable Virtual Terminal Screen Buffer

Learn how to run apps in a Python PTY, interpret ANSI terminal sequences, and build a virtual screen buffer you can query by row and column for testing.

Controlling an interactive program through a child PTY and querying the on-screen state at arbitrary positions sounds straightforward until you try to do it in Python. The goal is clear: start a subprocess inside a pseudo-terminal, read what it prints as a terminal would, send keystrokes back, and at any moment be able to ask, “what character is at row r, column c?” The tricky part is that terminals do not behave like simple streams of text. They are stateful devices driven by control sequences, and that changes everything.

Problem setup

Consider the desired flow simplified into a short snippet. The intent is to launch a program in a PTY, look at a specific screen cell, inject some input, and see that the screen cell updates accordingly.

# conceptual usage
session = create_pty()
session.launch(app_to_run)

session.peek_cell(2, 4)  # expects "@"
session.feed_keys("A")
session.peek_cell(2, 4)  # expects "!" after the input modifies the screen

What actually stands in the way

Nothing in the Python standard library does this. An operating system PTY provides a bidirectional byte stream and nothing more. It has no notion of a terminal type; that knowledge lives on the other side of the connection, in a physical terminal or a terminal emulator such as xterm, PuTTY, MacTerm, or the Windows console that hosts cmd and PowerShell. To keep track of what is currently “on the screen,” you must supply that missing piece yourself.

To get there, the child program must run with an appropriate terminal type in its environment, which on Unix-like systems means the TERM variable. Once that is set, its output still has to be interpreted according to that terminal type: terminal control sequences must be translated into cursor movement, character drawing, erasing, and scrolling, while an in-memory screen model mirrors what a real terminal would display. This is a non-trivial task.
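To make the idea concrete, here is a minimal sketch of such an interpreter. It is not a terminal emulator. It assumes an ANSI/VT100-style terminal type, handles only printable ASCII, carriage return, line feed, backspace, and three CSI sequences (cursor position, clear whole screen, erase to end of line), and it assumes every chunk it receives contains whole escape sequences. The names TinyScreen, feed, and glyph_at are illustrative and not part of any library.

```python
import re

# Matches a CSI sequence: ESC [ optional "?" numeric parameters final letter.
CSI = re.compile(rb"\x1b\[(\??)([0-9;]*)([A-Za-z])")


class TinyScreen:
    """Hypothetical screen model covering a tiny subset of ANSI behaviour."""

    def __init__(self, rows=24, cols=80):
        self.rows, self.cols = rows, cols
        self.grid = [[" "] * cols for _ in range(rows)]
        self.row = self.col = 0  # zero-based cursor position

    def _line_feed(self):
        if self.row == self.rows - 1:
            # On the last row the screen scrolls up by one line.
            self.grid.pop(0)
            self.grid.append([" "] * self.cols)
        else:
            self.row += 1

    def _put(self, ch):
        if self.col >= self.cols:  # wrap only when another character arrives
            self.col = 0
            self._line_feed()
        self.grid[self.row][self.col] = ch
        self.col += 1

    def _csi(self, params, final):
        nums = [int(p) if p else 0 for p in params.split(b";")]
        if final == b"H":  # cursor position; parameters are one-based
            row = max(nums[0], 1) - 1
            col = max(nums[1], 1) - 1 if len(nums) > 1 else 0
            self.row = min(row, self.rows - 1)
            self.col = min(col, self.cols - 1)
        elif final == b"J" and nums[0] == 2:  # clear the whole screen
            self.grid = [[" "] * self.cols for _ in range(self.rows)]
        elif final == b"K" and nums[0] == 0:  # erase from cursor to end of line
            for c in range(self.col, self.cols):
                self.grid[self.row][c] = " "
        # every other CSI sequence is silently ignored in this sketch

    def feed(self, data):
        i = 0
        while i < len(data):
            m = CSI.match(data, i)
            if m:
                if not m.group(1):  # skip private "?" sequences
                    self._csi(m.group(2), m.group(3))
                i = m.end()
                continue
            byte = data[i]
            i += 1
            if byte == 0x0D:
                self.col = 0
            elif byte == 0x0A:
                self._line_feed()
            elif byte == 0x08:
                self.col = max(0, self.col - 1)
            elif 0x20 <= byte < 0x7F:
                self._put(chr(byte))
            # other bytes (including a bare ESC) are dropped

    def glyph_at(self, r, c):
        return self.grid[r][c]


screen = TinyScreen()
screen.feed(b"\x1b[2J\x1b[3;5H@")  # clear, move to row 3 / column 5 (one-based), draw "@"
print(screen.glyph_at(2, 4))       # prints @
```

Even this toy version needs a cursor, a wrap rule, and a scroll rule. A faithful model additionally has to handle attributes, character sets, multi-byte text, scroll regions, and sequences split across reads.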

Solution outline

The practical approach has two parts. First, start the child process inside a PTY and make sure it sees the terminal type you intend. Second, capture its output and interpret it according to that terminal type, maintaining your own virtual screen buffer that you can query by coordinates. Writing to the PTY sends input to the child process, just like typing into a terminal window. You might want to look into the curses library when designing terminal interactions, but the core requirement remains: you need to interpret the subprogram’s output to track the screen state.

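import fcntl
import os
import pty
import termios


def run_in_pty(cmd, argv):
    master_fd, slave_fd = pty.openpty()
    pid = os.fork()
    if pid == 0:
        # child process
        env = os.environ.copy()
        # placeholder: replace with the terminal type your screen model interprets
        env["TERM"] = "set-proper-term"
        os.setsid()  # new session, so the child can acquire a controlling terminal
        fcntl.ioctl(slave_fd, termios.TIOCSCTTY, 0)  # make the PTY that terminal
        os.dup2(slave_fd, 0)
        os.dup2(slave_fd, 1)
        os.dup2(slave_fd, 2)
        os.close(master_fd)
        if slave_fd > 2:
            os.close(slave_fd)
        try:
            os.execvpe(cmd, argv, env)
        finally:
            # only reached if exec failed; never fall through into parent code
            os._exit(127)
    else:
        # parent process
        os.close(slave_fd)
        return pid, master_fd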


class DisplayModel:
    def __init__(self, rows=24, cols=80):
        self.rows = rows
        self.cols = cols
        self.grid = [[" "] * cols for _ in range(rows)]
        # a real implementation would also track cursor and scroll state

    def ingest(self, data_bytes):
        # Interpret data_bytes according to the terminal type.
        # Translate control sequences into cursor moves, writes and scrolling.
        # Intentionally left unimplemented; this is the hard part.
        pass

    def glyph_at(self, r, c):
        return self.grid[r][c]


def send_keys(fd, text):
    os.write(fd, text.encode())


# example wiring (incomplete; reading loop omitted):
# pid, pty_fd = run_in_pty("my_program", ["my_program"]) 
# view = DisplayModel()
# ... read() from pty_fd and call view.ingest(...) repeatedly ...
# send_keys(pty_fd, "A")
# cell = view.glyph_at(2, 4)
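The reading loop omitted above could look roughly like the following sketch. The helper name pump, its sink callback, and the idle_timeout heuristic are assumptions made for illustration. A quiet period only suggests that the child has finished drawing; it does not prove it.

```python
import errno
import os
import select


def pump(fd, sink, idle_timeout=0.2, chunk_size=4096):
    """Forward bytes from fd to sink until fd goes quiet or closes.

    Returns True if fd merely went quiet, False on end of stream.
    """
    while True:
        ready, _, _ = select.select([fd], [], [], idle_timeout)
        if not ready:
            return True  # no output for idle_timeout seconds
        try:
            chunk = os.read(fd, chunk_size)
        except OSError as exc:
            # Linux reports EIO on the PTY master once the child side is closed.
            if exc.errno != errno.EIO:
                raise
            return False
        if not chunk:
            return False  # ordinary end of file
        sink(chunk)
```

With the pieces from the article, a test step would then be send_keys(pty_fd, "A"), followed by pump(pty_fd, view.ingest), followed by view.glyph_at(2, 4).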

Why this matters

Automation, scraping, and testing of terminal applications often require asserting on-screen state after every keystroke. Without interpreting the output according to the chosen terminal type, any attempt to “read a specific character at row and column” will be unreliable. The underlying PTY does not keep a screen; it only carries bytes. If you need a screen, you must build it from those bytes as a terminal emulator would.

Conclusion

If your goal is to query a child PTY like a 2D grid with random access, plan for two responsibilities: make the child see an appropriate terminal type and implement a layer that interprets its output into a virtual screen. There is no one-call standard library feature for this, and the interpretation logic is the core complexity. If you work on terminal UIs, it can be helpful to explore the curses library for interaction patterns, but the essential requirement remains the same: you must translate terminal command codes into a faithful screen model before you can reliably answer questions like “what character is at row r, column c?”

The article is based on a question from StackOverflow by GenTel and an answer by J Earls.