2025, Nov 05 17:00
Reliable Stickerless Rubik's Cube Face Detection with YOLOv11 Segmentation and OpenCV Checks
Learn how to detect stickerless Rubik's cube faces reliably: use YOLOv11 segmentation, then validate with OpenCV shape and HSV color checks for robust results
Detecting stickerless Rubik’s cube faces reliably with classical image processing is harder than it looks. Edges are weak or missing between adjacent same-colored cubies, and simple color heuristics are easily confused by backgrounds. If you’ve tried Canny-based contour pipelines and hit a wall, this guide shows a practical path that works: move from edges to semantic segmentation and add lightweight geometric and color checks.
Problem setup: why the classic pipeline breaks
Edge-based approaches thrive on high-contrast sticker borders. On stickerless cubes, neighboring tiles can share the same hue, so boundaries are faint or absent altogether. Backgrounds often intrude with similar saturation and brightness, producing either merged contours or outright misses. Even seemingly straightforward tricks fall apart: white tiles carry no saturation, and a wooden floor’s saturation can rival the cube’s, so simple H/S-based masking cannot cleanly isolate the face.
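To see the failure concretely, here is a minimal sketch of the naive H/S masking idea; the saturation threshold is an illustrative assumption, not a tuned constant:
import cv2 as cv
import numpy as np

img = cv.imread('cube.png')
hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)

# Naive idea: "cube tiles are saturated, the background is not".
# This fails twice: white tiles have near-zero saturation and drop out,
# while a warm wooden floor can clear the same threshold and stay in.
_, sat_mask = cv.threshold(hsv[:, :, 1], 80, 255, cv.THRESH_BINARY)
cv.imwrite('sat_mask.png', sat_mask)  # inspect: holes on white tiles, blobs on the floor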
Baseline attempt that struggles in practice
The following OpenCV pipeline denoises, blurs, extracts edges, dilates, and attempts to find square-like contours. This kind of code works for stickered faces on dark backgrounds but is unreliable for stickerless faces.
import cv2 as cv
import numpy as np
from google.colab.patches import cv2_imshow

img_src = cv.imread('cube.png')
gray_img = cv.cvtColor(img_src, cv.COLOR_BGR2GRAY)

# Denoise and blur to suppress texture before edge extraction
denoised_img = cv.fastNlMeansDenoising(gray_img, None, 20, 7, 7)
smeared_img = cv.blur(denoised_img, (3, 3))

# Extract edges, then thicken them so nearby fragments close into contours
edges_img = cv.Canny(smeared_img, 30, 60, apertureSize=3)
thick_edges = cv.dilate(edges_img, cv.getStructuringElement(cv.MORPH_RECT, (9, 9)))

cnts, _ = cv.findContours(thick_edges, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
boxes = []
for c in cnts:
    approx_poly = cv.approxPolyDP(c, 0.1 * cv.arcLength(c, True), True)
    if len(approx_poly) == 4:  # keep only quadrilateral candidates
        x, y, w, h = cv.boundingRect(approx_poly)
        aspect = float(w) / h
        area = cv.contourArea(approx_poly)
        # Accept roughly square boxes within an expected tile-size range
        if 0.8 <= aspect <= 1.2 and 30 <= w <= 80 and area >= 900:
            boxes.append({"x": x, "y": y, "w": w, "h": h})

vis = img_src.copy()
for b in boxes:
    x, y, w, h = b["x"], b["y"], b["w"], b["h"]
    cv.rectangle(vis, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2_imshow(vis)
In many real scenes this fails to separate tiles, and adjacent equal-colored cubies often get merged into a single contour.
Root causes
The core issue is that the decision boundary is not in edges but in semantics. You’re trying to identify a specific object and its face layout under varied lighting and backgrounds. Edge gradients, grayscale thresholds, or single-channel heuristics are not stable signals for that task. As observed in practice, white tiles carry no saturation and the background can exhibit competing saturation and luminance, so a clean segmentation based purely on color channels is not feasible. The approach needs to interpret shapes and regions as a whole.
The practical fix: throw AI at it. It’s good at that. Not chatbots, but semantic segmentation models.
Working solution: YOLOv11 segmentation with light post-processing
Train a YOLOv11 segmentation model on cube faces, then run simple checks to validate shape and color consistency. This approach avoids brittle edges and leverages learned masks. Prepare your dataset in the YOLOv11 Instance Segmentation format and create a data.yaml:
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 1
names: ['Cube']
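Each training image needs a matching label file in the YOLO segmentation format: one line per instance, starting with the class index followed by the normalized x/y coordinates of the polygon outlining the face. A hypothetical label line for one cube face (values are illustrative):
0 0.312 0.241 0.688 0.236 0.694 0.615 0.309 0.622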
Install ultralytics and start training:
!pip install ultralytics
from ultralytics import YOLO

# Start from a pretrained YOLO11 segmentation checkpoint;
# 'best.pt' is what training produces, not what you start from
seg_net = YOLO('yolo11n-seg.pt')
seg_net.train(data='./data/data.yaml', epochs=100, batch=64, device='cuda')
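Training writes checkpoints under the Ultralytics runs directory; the exact path depends on your run name. Loading the best weights for inference might then look like this (the path below assumes the default output layout):
# Assumed default Ultralytics output path; adjust to your actual run directory
seg_net = YOLO('runs/segment/train/weights/best.pt')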
After inference, validate the candidate region. The checks below ensure the region is approximately square, mostly filled, and that each tile is relatively homogeneous in a target color range. The color homogeneity test works in HSV using predefined color_ranges.
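The code below assumes a global color_ranges dict mapping each tile color to an HSV (lower, upper) bound pair. The exact numbers depend on your cube and lighting; the values here are illustrative placeholders. Note that OpenCV hue runs 0-179, so red wraps around and often needs two ranges in practice:
# Illustrative HSV bounds; tune these against your own footage
color_ranges = {
    'white':  ((0, 0, 160), (179, 60, 255)),
    'yellow': ((20, 80, 80), (35, 255, 255)),
    'green':  ((40, 80, 80), (85, 255, 255)),
    'blue':   ((90, 80, 80), (130, 255, 255)),
    'orange': ((8, 80, 80), (20, 255, 255)),
    'red':    ((0, 80, 80), (8, 255, 255)),
}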
import cv2
import numpy as np
from ultralytics import YOLO

def looks_square(tile, tol=0.2):
    # Accept crops whose aspect ratio is within tol of 1:1
    h, w = tile.shape[:2]
    r1, r2 = h / w, w / h
    if r1 < 1 - tol or r1 > 1 + tol:
        return False
    if r2 < 1 - tol or r2 > 1 + tol:
        return False
    return True

def mostly_filled(tile, fill_thresh=0.85):
    # Fraction of non-zero values across all channels of the masked crop
    h, w, c = tile.shape
    total = h * w * c
    filled = np.sum(tile > 0)
    return filled / total > fill_thresh

def passes_color_uniformity(tile, color, min_ratio):
    # color_ranges maps each color tag to an HSV (lower, upper) pair
    if color not in color_ranges:
        return False
    hh, ww = tile.shape[:2]
    hsv = cv2.cvtColor(tile, cv2.COLOR_BGR2HSV)
    lower, upper = color_ranges[color]
    mask = cv2.inRange(hsv, np.array(lower), np.array(upper))
    return (np.count_nonzero(mask) / (hh * ww)) > min_ratio

def run_seg_inference(net: YOLO, frame):
    return net(frame, verbose=False)

def extract_face_grid(outputs, n, homogeneity_thres=0.6):
    for pred in outputs:
        orig = pred.orig_img
        H, W, _ = orig.shape
        if pred.masks is None:
            continue
        for mk in pred.masks.data:
            mask_np = (mk.cpu().numpy() * 255).astype(np.uint8)
            # YOLO masks may come back at model resolution; resize to the frame
            if mask_np.shape[0] != H or mask_np.shape[1] != W:
                mask_np = cv2.resize(mask_np, (W, H), interpolation=cv2.INTER_NEAREST)
            # simplify_mask (defined elsewhere) smooths the mask polygon
            # and returns the cleaned mask plus its bounding rect
            mask_np, rect = simplify_mask(mask_np, eps=0.005)
            masked = cv2.bitwise_and(orig, orig, mask=mask_np)
            x1, y1, ww, hh = rect
            x2, y2 = x1 + ww, y1 + hh
            x1, y1 = max(0, x1), max(0, y1)
            x2, y2 = min(W, x2), min(H, y2)
            crop = masked[y1:y2, x1:x2]
            if not looks_square(crop):
                continue
            if not mostly_filled(crop):
                continue
            tags, uniform = infer_colors_grid(crop, n, color_detection_model)
            # Require that enough grid cells pass the per-tile uniformity check
            if sum(sum(row) for row in uniform) < homogeneity_thres * len(uniform) * len(uniform[0]):
                continue
            return tags, crop, mask_np, rect
    return None, None, None, None

def infer_colors_grid(patch, n, color_detection_model):
    # Split the face crop into an n x n grid and tag each cell's color
    h, w, _ = patch.shape
    hh, ww = h // n, w // n
    tag_grid = [['' for _ in range(n)] for __ in range(n)]
    uniform_grid = [[False for _ in range(n)] for __ in range(n)]
    for i in range(n):
        for j in range(n):
            cell = patch[i * hh:(i + 1) * hh, j * ww:(j + 1) * ww]
            # get_median_color and find_best_matching_color_legacy are the
            # caller's existing color-matching helpers (defined elsewhere)
            tag_grid[i][j] = find_best_matching_color_legacy(
                get_median_color(cell), tpe='bgr')
            uniform_grid[i][j] = passes_color_uniformity(cell, tag_grid[i][j], 0.5)
    return tag_grid, uniform_grid
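The code above leans on a few helpers defined elsewhere in the project: simplify_mask, get_median_color, and find_best_matching_color_legacy. Their real implementations are not part of the source; the sketches below are plausible stand-ins under stated assumptions (polygon smoothing via approxPolyDP, median sampling over unmasked pixels, and nearest-reference matching in BGR):
def simplify_mask(mask, eps=0.005):
    # Assumed behavior: keep the largest contour, smooth its polygon,
    # re-rasterize it, and return the cleaned mask plus its bounding rect
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not cnts:
        return mask, (0, 0, mask.shape[1], mask.shape[0])
    largest = max(cnts, key=cv2.contourArea)
    poly = cv2.approxPolyDP(largest, eps * cv2.arcLength(largest, True), True)
    clean = np.zeros_like(mask)
    cv2.fillPoly(clean, [poly], 255)
    return clean, cv2.boundingRect(poly)

def get_median_color(cell):
    # Assumed behavior: median BGR over the cell's non-black (unmasked) pixels
    pixels = cell.reshape(-1, 3)
    pixels = pixels[np.any(pixels > 0, axis=1)]
    if len(pixels) == 0:
        return np.zeros(3)
    return np.median(pixels, axis=0)

def find_best_matching_color_legacy(color, tpe='bgr'):
    # Assumed behavior: nearest reference color by Euclidean distance in BGR;
    # the reference values are illustrative, not calibrated
    refs = {
        'white': (255, 255, 255), 'yellow': (0, 215, 255),
        'green': (72, 155, 0), 'blue': (172, 72, 13),
        'orange': (0, 88, 255), 'red': (40, 20, 180),
    }
    return min(refs, key=lambda k: np.linalg.norm(np.array(refs[k]) - np.asarray(color)))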
Run it on a frame and retrieve the face grid and the corresponding crop and mask:
results_out = run_seg_inference(seg_net, current_frame)
face_grid, face_crop, face_mask, face_rect = extract_face_grid(results_out, n=grid_n, homogeneity_thres=0.6)
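If a face was found, a quick visual sanity check helps before wiring the grid into downstream logic; this sketch reuses cv2_imshow from the Colab baseline:
if face_grid is not None:
    x, y, w, h = face_rect
    annotated = current_frame.copy()
    cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 255, 0), 2)
    for row in face_grid:
        print(row)  # e.g. ['red', 'white', 'green'] for a 3x3 face
    cv2_imshow(annotated)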
This setup uses the segmentation model to produce a clean object mask, then applies tight checks that match the geometry and color layout of a cube face. Where edge detection merged tiles or missed faces entirely, the learned segmentation handles the heavy lifting. Thanks to this, the downstream logic reduces to straightforward validation and grid sampling.
Why this matters
Stickerless faces don’t guarantee strong edges or simple thresholds. A segmentation model trained on your scenes is robust to background variability and lighting, while remaining efficient to run. The added shape and color homogeneity checks keep the pipeline deterministic and interpretable, so you can trust what gets passed on to your color-reading logic.
Takeaways
If edge-based contours fail on stickerless faces, stop fighting the image. Use a semantic segmentation model to isolate the cube reliably, then validate the detection with simple geometric constraints and per-tile homogeneity checks. Sample colors per grid cell using a median and map them with your existing color matcher. This combination is resilient in the presence of weak boundaries, similar backgrounds, and varying capture conditions, and it keeps the final code clear and maintainable.
This article is based on a StackOverflow question and self-answer by Tripaloski.