2025, Dec 04 05:00

Robust edge cleanup for binarized music and book scans in OpenCV with an exterior mask and pixel-by-pixel border trimming

Clean messy borders in binarized music or book scans using OpenCV. Use a flood-fill exterior mask and iterative, pixelwise cropping to remove artifacts safely.

Scanned music and book pages often arrive already binarized, but the edges are messy: streaks, speckles, and half-connected marks hugging the border. A simple flood fill from a corner after adding a solid frame helps, yet residual artifacts and occasional over-trimming still happen. Below is a practical way to make that edge cleanup deterministic and robust with OpenCV, while keeping musical notation and text intact.

Problem overview

The initial pipeline is straightforward: load a page in grayscale, pad it with a black frame, flood fill from the top-left corner with white, then crop to the tightest bounding box of black pixels. That removes black regions linked to the borders, but stubborn edge noise can remain and, in some cases, content near the corner gets eaten away.

Minimal example that shows the issue

The snippet below captures that basic approach. It thresholds implicitly by working on a binarized scan, pads the image, flood-fills the outer region to white, then computes the bounding box of the remaining black content. Some pages still keep speckles and ring-shaped marks along edges; others lose small content near the corners.

import os
import cv2
import numpy as np
base_dir = os.path.abspath(os.path.dirname(__file__))
page_names = ['1.webp', '2.webp', '3.webp', '4.webp']
for page_name in page_names:
    src_path = os.path.join(base_dir, page_name)
    dst_path = os.path.join(base_dir, page_name + '-clean.webp')
    page = cv2.imread(src_path, cv2.IMREAD_GRAYSCALE)
    page = cv2.copyMakeBorder(page, 50, 50, 50, 50, cv2.BORDER_CONSTANT)
    cv2.floodFill(page, None, (0, 0), 255)
    ys, xs = np.nonzero(page == 0)
    page = page[ys.min():ys.max() + 1, xs.min():xs.max() + 1]  # inclusive bounding box
    cv2.imwrite(dst_path, page)

What actually goes wrong and why

Artifacts hugging the border can survive because a single all-sides trim based on the current silhouette is too blunt. If black debris exists on multiple sides, the bounding box includes them all at once and keeps the unwanted margins. Conversely, when fine content or faint strokes touch the border, flood fill can turn that region into “background,” and the later bounding-box crop removes actual content. The result is inconsistent: some images retain border junk, some lose meaningful strokes.

A more reliable approach with OpenCV

A robust strategy is to explicitly model the exterior region and then iteratively crop one pixel at a time from the darkest border side until all four sides are clean. The flow is simple in spirit. Convert to three channels if needed, add a 1-pixel black frame to guarantee closure, flood fill from the top-left with a marker color (red), isolate that red exterior via inRange, invert to get a mask where interior content is white and exterior is black, then repeatedly measure the 1-pixel-wide strips on all four sides. At each step, remove exactly one pixel from the side whose border strip is darkest. Recompute and repeat until all sides are white. Finally, crop with the discovered coordinates; since they are measured on the padded image, cropping the padded copy shaves the 1-pixel frame off automatically. This mirrors an ImageMagick-like behavior but keeps everything in Python/OpenCV.

import cv2
import numpy as np
# src = cv2.imread('page1.webp')
# src = cv2.imread('page2.webp')
src = cv2.imread('page3.webp')
if src is None:
    raise FileNotFoundError('page3.webp')
if src.ndim == 2:
    src = cv2.merge([src, src, src])  # ensure three channels for the color marker
# A 1-pixel black frame guarantees the exterior is a single closed region.
padded = cv2.copyMakeBorder(src, 1, 1, 1, 1, borderType=cv2.BORDER_CONSTANT, value=(0, 0, 0))
# Flood fill the exterior from the top-left corner with a red marker.
ff = padded.copy()
red_color = (0, 0, 255)
cv2.floodFill(ff, None, (0, 0), red_color, (0, 0, 0), (0, 0, 0), flags=8)
# Red pixels are the exterior; invert so exterior reads black, interior white.
edge_mask = 255 - cv2.inRange(ff, red_color, red_color)
# Crop window tracked in padded/mask coordinates.
t, l = 0, 0
b, r = edge_mask.shape
step = 0
while b - t > 1 and r - l > 1:
    # Mean intensity of the four 1-pixel border strips of the current window.
    means = {
        'top': np.mean(edge_mask[t, l:r]),
        'left': np.mean(edge_mask[t:b, l]),
        'bottom': np.mean(edge_mask[b - 1, l:r]),
        'right': np.mean(edge_mask[t:b, r - 1]),
    }
    dark = {side: m for side, m in means.items() if m < 255}
    if not dark:
        break  # all four strips are pure white: the interior is isolated
    side = min(dark, key=dark.get)  # shave exactly one pixel off the darkest side
    if side == 'top':
        t += 1
    elif side == 'left':
        l += 1
    elif side == 'bottom':
        b -= 1
    else:
        r -= 1
    step += 1
    print("increment=", step, " side=", side, " mean=", dark[side])
# t, l, b, r are padded-image coordinates, so crop the padded copy; the
# 1-pixel frame is always shaved off because it reads as black in the mask.
cropped = padded[t:b, l:r]
print("top:", t, "bottom:", b, "left:", l, "right:", r)
print("height:", cropped.shape[0])
print("width:", cropped.shape[1])
# cv2.imwrite('page1_cropped.png', cropped)
# cv2.imwrite('page2_cropped.png', cropped)
cv2.imwrite('page3_cropped.png', cropped)
cv2.imshow("mask", edge_mask)
cv2.imshow("cropped", cropped)
cv2.waitKey(0)
cv2.destroyAllWindows()

How the fix addresses the edge cases

The key is to treat the exterior explicitly, then decide—pixel by pixel—which border to shave off based on measured darkness. The flood-filled red region marks the outside. After converting it into a binary mask and inverting, the border strips are evaluated numerically. By always removing exactly one pixel from the darkest side and re-evaluating, the crop adapts to uneven debris around the page. Iteration stops only when all four sides are fully white in the mask, meaning the interior content is isolated.
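A toy mask makes the per-side decision tangible (the 5×5 values are invented): the side with the lowest mean is shaved first, no matter how many sides carry debris.

```python
import numpy as np

# Hypothetical 5x5 interior mask: 255 = clean/interior, 0 = exterior debris.
mask = np.full((5, 5), 255, dtype=np.uint8)
mask[:, 0] = 0   # heavy debris along the whole left edge
mask[0, 2] = 0   # a single speck on the top edge

means = {
    'top': np.mean(mask[0, :]),
    'left': np.mean(mask[:, 0]),
    'bottom': np.mean(mask[-1, :]),
    'right': np.mean(mask[:, -1]),
}
darkest = min(means, key=means.get)
print(means)    # left is darkest (0.0); right is already clean (255.0)
print(darkest)  # → left
```

The right side already averages 255, so it is never trimmed; the left side goes first, and the top speck is handled on a later iteration once the left column is gone.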

Why this matters

At scale, deterministic rules keep results consistent. When processing millions of pages, even a small rate of over-trimming or under-trimming compounds. A measurable per-border decision based on the mask reduces both lingering speckles and accidental loss of content at the edges, while remaining automatic and reproducible across diverse pages. Other avenues exist, such as row/column pixel-count heuristics or semantic segmentation, but the approach above is purely algorithmic and operates directly on binarized scans using standard OpenCV primitives.
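For contrast, the row/column pixel-count heuristic mentioned above can be sketched in a few lines (the function name and the 0.5 density threshold are my assumptions, not part of the pipeline above): strip any leading or trailing row or column whose black-pixel density exceeds the threshold.

```python
import numpy as np

def trim_dense_edges(img, max_black_frac=0.5):
    """Drop leading/trailing rows and columns whose black-pixel
    density exceeds max_black_frac (a tunable, assumed threshold)."""
    black = img == 0
    rows = black.mean(axis=1)  # fraction of black pixels per row
    cols = black.mean(axis=0)  # fraction of black pixels per column
    t, b = 0, img.shape[0]
    while t < b and rows[t] > max_black_frac:
        t += 1
    while b > t and rows[b - 1] > max_black_frac:
        b -= 1
    l, r = 0, img.shape[1]
    while l < r and cols[l] > max_black_frac:
        l += 1
    while r > l and cols[r - 1] > max_black_frac:
        r -= 1
    return img[t:b, l:r]

page = np.full((6, 6), 255, dtype=np.uint8)
page[0, :] = 0  # a solid black scanner edge
out = trim_dense_edges(page)
print(out.shape)  # → (5, 6)
```

This is simpler but blunter: a genuinely dense row of content can cross the threshold and be cut, which is one reason the mask-based exterior model is the safer default.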

Practical wrap-up

If your scans are already thresholded and you see mixed results from a single-pass flood fill plus bounding box, switch to an exterior mask and iterate. Pad with a minimal black frame, flood fill from a corner with a marker color, derive a binary mask of the exterior, invert to get interior, then progressively crop from the darkest side until all borders are clean. This small change turns an all-at-once trim into a controlled, data-driven process that better preserves notation and text while removing stubborn edge artifacts.