2025, Oct 15 22:00

Get Character-Level Boxes from EasyOCR: A Practical Guide Using Connected Components and OpenCV

Extract character-level bounding boxes from EasyOCR with OpenCV: binarization, connected components, and centerline scanning for per-character coordinates.

Character-level coordinates are a common requirement when you need more than word-level OCR: fine-grained selection, alignment, or post-processing all depend on precise boxes per glyph. EasyOCR returns polygonal boxes for words or phrases and a confidence value, but not per-character boxes. Here is how to bridge that gap using image processing on top of EasyOCR’s output.

Problem

EasyOCR returns one entry per detected text region that includes a quadrilateral bounding box, the recognized string, and a confidence score. A typical item looks like this:

[
    [
        [60, 88],
        [639, 88],
        [639, 124],
        [60, 124]
    ],
    "Some phrase",
    0.6820449765391986
]
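Before segmenting characters, the quadrilateral must be reduced to an axis-aligned box by taking the min/max of the corner coordinates. A minimal sketch of that step, using the sample quad above:

```python
import numpy as np

# Quadrilateral from the EasyOCR result above: four [x, y] corners
quad = [[60, 88], [639, 88], [639, 124], [60, 124]]

qarr = np.array(quad, dtype=np.int32)
left,  top    = qarr[:, 0].min(), qarr[:, 1].min()  # top-left corner
right, bottom = qarr[:, 0].max(), qarr[:, 1].max()  # bottom-right corner

print(left, top, right, bottom)  # 60 88 639 124
```

For rotated text the quad is not axis-aligned and this min/max box is only an enclosing approximation; for the horizontal lines EasyOCR typically returns, it matches the region exactly.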

The question is whether it is possible to obtain character-level boxes directly, without manually computing positions for each character inside the phrase.

Why the issue occurs

EasyOCR does not natively provide character-level coordinates. You receive a bounding shape per phrase, not per glyph. To derive character boxes, you need to work within each detected region and segment the contents yourself. A practical approach is to binarize the cropped region, find connected components, and select those components that intersect the text line’s horizontal mid-line (or baseline). The horizontal width can be taken from each component, while the vertical bounds can be inherited from the original EasyOCR region.

Solution

The workflow builds directly on the EasyOCR output: crop the region, convert it to a binary image, collect connected components, scan across the horizontal centerline to pick components belonging to the text, and assemble character boxes from those hits. The code below was tested against a sample image (testocr.png); there is also a relevant discussion in the EasyOCR repository: github.com/JaidedAI/EasyOCR/issues/631.

import easyocr
import cv2
import numpy as np

# Run EasyOCR
ocr_engine = easyocr.Reader(['en'])
src_path   = 'testocr.png'
frame      = cv2.imread(src_path)
detections = ocr_engine.readtext(src_path)
annotated  = frame.copy()

# Iterate through OCR detections
for quad, snippet, score in detections:
    # 1) Crop the detected region
    qarr = np.array(quad, dtype=np.int32)
    left,  top    = qarr[:, 0].min(), qarr[:, 1].min()
    right, bottom = qarr[:, 0].max(), qarr[:, 1].max()
    window = frame[top:bottom, left:right]

    # 2) Binarize and extract connected components
    mono = cv2.cvtColor(window, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(mono, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n_tags, tagmap, props, centers = cv2.connectedComponentsWithStats(mask, connectivity=8)

    # 3) Collect component labels along the horizontal mid-line of the region
    #    (label 0 is the background and is excluded)
    midrow = window.shape[0] // 2
    touched = set(tagmap[midrow]) - {0}

    # 4) Build character boxes: width from components, height from original region
    for tag in touched:
        px, py, pw, ph, pa = props[tag]
        bounds = (px + left, top, pw, bottom - top)
        cv2.rectangle(annotated, (bounds[0], bounds[1]), (bounds[0] + bounds[2], bounds[1] + bounds[3]), (0, 255, 0), 1)

cv2.imwrite('output.png', annotated)

How it works

The detected phrase box from EasyOCR defines a stable vertical extent for the characters. Inside that region, binarization and connected component analysis reveal contiguous ink blobs. By sweeping the region’s horizontal centerline, you collect only those components that sit on the text line. The resulting per-component width defines each character’s horizontal extent, while the vertical span comes from the EasyOCR box. The rectangles drawn on a copy of the original image show the character-level boxes.
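If you also want to associate each box with a character of the recognized string, note that component labels come back in labeling order, not reading order, so the touched components must be sorted left-to-right first. The sketch below uses a hypothetical stats array (same (x, y, w, h, area) layout as cv2.connectedComponentsWithStats produces) and assumes the clean case where the number of components equals the number of characters, i.e. no glyphs were merged or split by the binarization:

```python
import numpy as np

# Hypothetical component stats in label order: row i holds (x, y, w, h, area)
# for label i; row 0 is the background.
props = np.array([[ 0, 0,  0,  0,   0],   # label 0 = background
                  [40, 0, 10, 20, 150],   # rightmost glyph
                  [10, 0, 12, 20, 160],   # leftmost glyph
                  [25, 0, 11, 20, 155]])  # middle glyph
touched = {1, 2, 3}
snippet = "cat"  # recognized string for the region

# Sort components left-to-right by their x coordinate before pairing
ordered = sorted(touched, key=lambda t: props[t][0])

# Pairing is only meaningful when the counts match
if len(ordered) == len(snippet):
    pairs = list(zip(snippet, ordered))
    for ch, tag in pairs:
        x, y, w, h, area = props[tag]
        print(ch, x, w)
```

When the counts do not match, a glyph was likely merged with a neighbor (touching characters) or split in two (e.g. the dot of an "i"), and a more careful matching strategy is needed.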

Why this matters

When a use case depends on character-level coordinates, relying solely on phrase-level boxes is not enough. Augmenting EasyOCR with connected component analysis yields a practical path to per-character geometry without modifying the OCR engine or relying on features it does not expose.

Takeaways

If you need character boxes from EasyOCR, use the phrase-level polygons as your anchor, binarize the interior, collect connected components, traverse the horizontal centerline to identify glyphs, and derive per-character rectangles. The approach above is a working baseline that you can adapt to your specific data and constraints.

The article is based on a question from StackOverflow by Ivan and an answer by André.