2025, Dec 10 19:00

How to Fix OCR on Low-Contrast Game Panels: Channel Masking, Morphology, and Per-Row Tesseract

Turn noisy UI screenshots into clean text. Learn an OpenCV pipeline: channel masking, horizontal dilation, contour rows, then run Tesseract (PSM 6) for fast OCR.

OCR on noisy UI screenshots can be deceptively hard. Even when text looks readable to the human eye, a generic binarization step can leave Tesseract without the structure it needs. Below is a compact walkthrough that turns a low-contrast, colored game panel into clean, per-row text with Tesseract using a simple, repeatable pre-processing pipeline in OpenCV.

What goes wrong with a naïve threshold

The first instinct is usually to convert to grayscale, invert for contrast, and run an adaptive threshold before calling Tesseract. That’s reasonable, but in this case it adds complexity without improving the text signal. The background texture and color gradients survive thresholding, while characters fracture and lose their continuity, which makes OCR stumble.

Problem setup: the initial approach

The following minimal snippet shows this general pattern, which doesn't deliver usable OCR here: convert to grayscale, invert, and apply adaptive thresholding before sending the result to Tesseract.

import cv2
import pytesseract

# read frame from disk or screen capture
frame = cv2.imread('input.png')

# grayscale then invert
img_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
img_inv = cv2.bitwise_not(img_gray)

# optional fixed threshold for testing
# _, t_bin = cv2.threshold(img_inv, 95, 255, cv2.THRESH_BINARY)

# adaptive thresholding (255 = value assigned to pixels that pass)
bin_auto = cv2.adaptiveThreshold(
    img_inv,
    255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY,
    11,  # block size: local neighbourhood used to compute the threshold
    2    # constant subtracted from the weighted local mean
)

# OCR with language switch
text_fr = pytesseract.image_to_string(bin_auto, lang='fra')
text_en = pytesseract.image_to_string(bin_auto, lang='eng')
print(text_fr)
print(text_en)

The output remains noisy and inconsistent. The text is partly recognizable to a person, but the binarized image has gaps and artifacts that are hostile to Tesseract’s layout analysis.

Why it fails here

The thresholding step isn’t isolating stable foreground regions. A grayscale conversion collapses color cues that are actually useful, and adaptive thresholding emphasizes local contrast but also amplifies background textures and edges. As a result, lines aren’t cohesive, contours are fragmented, and the OCR engine faces an image that doesn’t resemble clean, dark text on a bright background.
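
One quick way to confirm this is to compare the color channels directly. The sketch below (assuming the same input.png) prints a rough per-channel contrast figure and writes the three channels side by side for visual inspection:

import cv2
import numpy as np

src = cv2.imread('input.png')

# rough contrast check: a channel with a higher standard deviation
# usually separates text from background more cleanly
for name, chan in zip('BGR', cv2.split(src)):
    print(name, 'std:', round(float(np.std(chan)), 1))

# write the three channels side by side for a visual comparison
cv2.imwrite('channels.png', np.hstack(cv2.split(src)))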

A focused pipeline that works

The fix is straightforward: reduce the problem to a clean mask from a single color channel, connect text horizontally with a narrow morphological dilation, detect each text row as a separate ROI, and OCR those rows individually. Adding a small border and a 2x resize stabilizes character shapes, while Tesseract’s page segmentation mode for a text block finishes the job.

import cv2
import numpy as np
import matplotlib.pyplot as plt
import pytesseract as tess

# path to the image
img_path = 'input.png'

# read source
src = cv2.imread(img_path)

# pick a stable channel (green channel)
chan_g = src[:, :, 1]

# simple mask instead of adaptive threshold
mask = np.zeros_like(chan_g)
mask[chan_g > 80] = 255

# connect characters horizontally: the off-centre pair of 1s extends the
# foreground one pixel per pass, so 10 iterations bridge gaps up to ~10 px
kernel = np.array([[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]], dtype=np.uint8)
mask_dilated = cv2.dilate(mask, kernel, iterations=10)

# detect external contours as row candidates
contours, _ = cv2.findContours(mask_dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# sort by height of bounding box and keep top rows
contours = sorted(
    contours,
    key=lambda c: cv2.boundingRect(c)[-1],
    reverse=True
)[:7]

# visualize and OCR per row
fig = plt.figure()
fig.subplots_adjust(hspace=0.1)

rows_by_y = {}
for i, c in enumerate(contours, 1):
    x, y, w, h = cv2.boundingRect(c)

    # isolate row on the green channel and invert
    row = cv2.bitwise_not(chan_g[y:y+h, x:x+w])

    # add padding to stabilize OCR
    row_pad = cv2.copyMakeBorder(row, 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=int(row[0, 0]))

    # upscale for better recognition
    row_up = cv2.resize(row_pad, None, fx=2.0, fy=2.0)

    # OCR the row as a block of text
    txt = tess.image_to_string(row_up, lang='fra', config='--psm 6 --oem 3').strip()
    rows_by_y[y] = txt

    # quick visual check
    ax = fig.add_subplot(len(contours), 1, i)  # stack row crops vertically
    ax.imshow(row_up, cmap='gray')
    ax.axis('off')

# order lines by vertical position and print
ordered_lines = dict(sorted(rows_by_y.items())).values()
print(*ordered_lines, sep='\n')

# display the per-row crops collected above
plt.show()

This pipeline yields consistent lines like:

345 Vitalité
75 Intelligence
35 Sagesse
1 Portée
15 Dommages Feu
13 Tacle
7 Retrait PM

The overall idea is simple. A per-channel mask isolates the visible text from background clutter more reliably than adaptive thresholding in this specific case. The narrow, horizontal dilation bridges gaps between characters so the row becomes one contiguous region. Contour-based ROI extraction turns a busy panel into several tight crops, which simplifies OCR significantly. A small border and 2x resize avoid edge clipping and provide more pixels per glyph. Finally, page segmentation mode 6 tells Tesseract to expect a single block of text, which matches a cropped row.
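
To see why the narrow kernel works, here is a minimal, self-contained sketch on synthetic data (not the article's panel): two blobs separated by an 8-pixel gap merge into a single external contour after the same dilation used above.

import cv2
import numpy as np

# synthetic row: two 'characters' with an 8-pixel gap between them
row = np.zeros((5, 30), dtype=np.uint8)
row[:, 2:10] = 255
row[:, 18:26] = 255

# same narrow horizontal kernel as in the pipeline
kernel = np.array([[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]], dtype=np.uint8)
merged = cv2.dilate(row, kernel, iterations=10)

# a single external contour confirms the gap was bridged
contours, _ = cv2.findContours(merged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours))  # 1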

Why this matters

OCR quality is often not about throwing heavier thresholds or more blur at the image. It’s about feeding the recognizer a deliberate structure: clean binary contrast, connected foreground, and semantically meaningful ROIs. That is what makes a character model robust. The same principle scales when you later add language models or a post-check with NLP to validate that the extracted words match an allowed vocabulary.
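
As a minimal sketch of such a post-check (the vocabulary below is just the stat names from the sample output), the standard library's difflib can snap a slightly mangled OCR token back to an allowed term:

import difflib

# allowed stat names, taken from the sample output above
VOCAB = ['Vitalité', 'Intelligence', 'Sagesse', 'Portée',
         'Dommages Feu', 'Tacle', 'Retrait PM']

def normalize(token, cutoff=0.7):
    """Return the closest allowed term, or None if nothing is close enough."""
    matches = difflib.get_close_matches(token, VOCAB, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(normalize('Vitalite'))  # -> 'Vitalité'  (missing accent)
print(normalize('Taole'))     # -> 'Tacle'     (one wrong character)
print(normalize('zzz'))       # -> None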

Practical notes and wrap-up

The exact channel, threshold, kernel shape, and the number of expected rows are parameters you may need to tune for your screen theme. Once dialed in, this approach is both fast and dependable. It's also easy to integrate into a loop for a simple on-screen agent that reads a panel, parses a handful of known tokens and numbers, and takes action. If downstream code needs to verify correctness, add a lightweight text check with NLP or a dictionary of allowed terms. That way, occasional OCR slips can be handled without resorting to brittle pixel comparisons for every possible number.
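
For the parsing step, a small pattern over each OCR'd line is usually enough. A sketch, assuming rows shaped like the sample output ('<number> <stat name>'):

import re

# each panel row looks like '<value> <stat name>', e.g. '345 Vitalité'
LINE_RE = re.compile(r'^\s*(\d+)\s+(.+?)\s*$')

def parse_row(line):
    """Return (value, stat) or None when the line doesn't match."""
    m = LINE_RE.match(line)
    return (int(m.group(1)), m.group(2)) if m else None

for line in ['345 Vitalité', '15 Dommages Feu', 'garbage']:
    print(parse_row(line))
# (345, 'Vitalité')
# (15, 'Dommages Feu')
# None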

The key takeaway is to focus on structure before recognition. Prefer a minimal mask over overfitting thresholds, connect text into meaningful regions, operate per row, and give Tesseract a problem it was built to solve.