2025, Nov 05 17:00
Reliable Stickerless Rubik's Cube Face Detection with YOLOv11 Segmentation and OpenCV Checks
Learn how to detect stickerless Rubik's cube faces reliably: use YOLOv11 segmentation, then validate with OpenCV shape and HSV color checks for robust results
Detecting stickerless Rubik’s cube faces reliably with classical image processing is harder than it looks. Edges are weak or missing between adjacent same-colored cubies, and simple color heuristics are easily confused by backgrounds. If you’ve tried Canny-based contour pipelines and hit a wall, this guide shows a practical path that works: move from edges to semantic segmentation and add lightweight geometric and color checks.
Problem setup: why the classic pipeline breaks
Edge-based approaches thrive on high-contrast sticker borders. On stickerless cubes, neighboring tiles can share the same hue, so boundaries are faint or absent altogether. Backgrounds often intrude with similar saturation and brightness, producing either merged contours or outright misses. Even seemingly straightforward tricks fall apart: white tiles carry no saturation, and a wooden floor’s saturation can rival the cube’s, so simple H/S-based masking cannot cleanly isolate the face.
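To see the failure concretely, here is a minimal sketch of the naive H/S masking idea; the saturation threshold is an illustrative assumption, not a tuned constant:
import cv2 as cv
import numpy as np

img = cv.imread('cube.png')
hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)

# Naive idea: "cube tiles are saturated, the background is not".
# This fails twice: white tiles have near-zero saturation and drop out,
# while a warm wooden floor can clear the same threshold and stay in.
_, sat_mask = cv.threshold(hsv[:, :, 1], 80, 255, cv.THRESH_BINARY)
cv.imwrite('sat_mask.png', sat_mask)  # inspect: holes on white tiles, blobs on the floor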
Baseline attempt that struggles in practice
The following OpenCV pipeline denoises, blurs, extracts edges, dilates, and attempts to find square-like contours. This kind of code works for stickered faces on dark backgrounds but is unreliable for stickerless faces.
import cv2 as cv
import numpy as np
from google.colab.patches import cv2_imshow

img_src = cv.imread('cube.png')
gray_img = cv.cvtColor(img_src, cv.COLOR_BGR2GRAY)

# Denoise and blur to suppress texture before edge extraction
denoised_img = cv.fastNlMeansDenoising(gray_img, None, 20, 7, 7)
smeared_img = cv.blur(denoised_img, (3, 3))

# Extract edges, then thicken them so nearby fragments close into contours
edges_img = cv.Canny(smeared_img, 30, 60, apertureSize=3)
thick_edges = cv.dilate(edges_img, cv.getStructuringElement(cv.MORPH_RECT, (9, 9)))

cnts, _ = cv.findContours(thick_edges, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
boxes = []
for c in cnts:
    approx_poly = cv.approxPolyDP(c, 0.1 * cv.arcLength(c, True), True)
    if len(approx_poly) == 4:  # keep only quadrilateral candidates
        x, y, w, h = cv.boundingRect(approx_poly)
        aspect = float(w) / h
        area = cv.contourArea(approx_poly)
        # Accept roughly square boxes within an expected tile-size range
        if 0.8 <= aspect <= 1.2 and 30 <= w <= 80 and area >= 900:
            boxes.append({"x": x, "y": y, "w": w, "h": h})

vis = img_src.copy()
for b in boxes:
    x, y, w, h = b["x"], b["y"], b["w"], b["h"]
    cv.rectangle(vis, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2_imshow(vis)
In many real scenes this fails to separate tiles, and adjacent equal-colored cubies often get merged into a single contour.
Root causes
The core issue is that the decision boundary is not in edges but in semantics. You’re trying to identify a specific object and its face layout under varied lighting and backgrounds. Edge gradients, grayscale thresholds, or single-channel heuristics are not stable signals for that task. As observed in practice, white tiles carry no saturation and the background can exhibit competing saturation and luminance, so a clean segmentation based purely on color channels is not feasible. The approach needs to interpret shapes and regions as a whole.
The practical fix: throw AI at it. It’s good at that. Not chatbots, but semantic segmentation models.
Working solution: YOLOv11 segmentation with light post-processing
Train a YOLOv11 segmentation model on cube faces, then run simple checks to validate shape and color consistency. This approach avoids brittle edges and leverages learned masks. Prepare your dataset in the YOLOv11 Instance Segmentation format and create a data.yaml:
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 1
names: ['Cube']
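Each training image needs a matching label file in the YOLO segmentation format: one line per instance, starting with the class index followed by the normalized x/y coordinates of the polygon outlining the face. A hypothetical label line for one cube face (values are illustrative):
0 0.312 0.241 0.688 0.236 0.694 0.615 0.309 0.622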
Install ultralytics and start training:
!pip install ultralytics
from ultralytics import YOLO

# Start from a pretrained YOLO11 segmentation checkpoint;
# 'best.pt' is what training produces, not what you start from
seg_net = YOLO('yolo11n-seg.pt')
seg_net.train(data='./data/data.yaml', epochs=100, batch=64, device='cuda')
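Training writes checkpoints under the Ultralytics runs directory; the exact path depends on your run name. Loading the best weights for inference might then look like this (the path below assumes the default output layout):
# Assumed default Ultralytics output path; adjust to your actual run directory
seg_net = YOLO('runs/segment/train/weights/best.pt')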
After inference, validate the candidate region. The checks below ensure the region is approximately square, mostly filled, and that each tile is relatively homogeneous in a target color range. The color homogeneity test works in HSV using predefined color_ranges.
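The code below assumes a global color_ranges dict mapping each tile color to an HSV (lower, upper) bound pair. The exact numbers depend on your cube and lighting; the values here are illustrative placeholders. Note that OpenCV hue runs 0-179, so red wraps around and often needs two ranges in practice:
# Illustrative HSV bounds; tune these against your own footage
color_ranges = {
    'white':  ((0, 0, 160), (179, 60, 255)),
    'yellow': ((20, 80, 80), (35, 255, 255)),
    'green':  ((40, 80, 80), (85, 255, 255)),
    'blue':   ((90, 80, 80), (130, 255, 255)),
    'orange': ((8, 80, 80), (20, 255, 255)),
    'red':    ((0, 80, 80), (8, 255, 255)),
}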
import cv2
import numpy as np
from ultralytics import YOLO

def looks_square(tile, tol=0.2):
    # Accept crops whose aspect ratio is within tol of 1:1
    h, w = tile.shape[:2]
    r1, r2 = h / w, w / h
    if r1 < 1 - tol or r1 > 1 + tol:
        return False
    if r2 < 1 - tol or r2 > 1 + tol:
        return False
    return True

def mostly_filled(tile, fill_thresh=0.85):
    # Fraction of non-zero values across all channels of the masked crop
    h, w, c = tile.shape
    total = h * w * c
    filled = np.sum(tile > 0)
    return filled / total > fill_thresh

def passes_color_uniformity(tile, color, min_ratio):
    # color_ranges maps each color tag to an HSV (lower, upper) pair
    if color not in color_ranges:
        return False
    hh, ww = tile.shape[:2]
    hsv = cv2.cvtColor(tile, cv2.COLOR_BGR2HSV)
    lower, upper = color_ranges[color]
    mask = cv2.inRange(hsv, np.array(lower), np.array(upper))
    return (np.count_nonzero(mask) / (hh * ww)) > min_ratio

def run_seg_inference(net: YOLO, frame):
    return net(frame, verbose=False)

def extract_face_grid(outputs, n, homogeneity_thres=0.6):
    for pred in outputs:
        orig = pred.orig_img
        H, W, _ = orig.shape
        if pred.masks is None:
            continue
        for mk in pred.masks.data:
            mask_np = (mk.cpu().numpy() * 255).astype(np.uint8)
            # YOLO masks may come back at model resolution; resize to the frame
            if mask_np.shape[0] != H or mask_np.shape[1] != W:
                mask_np = cv2.resize(mask_np, (W, H), interpolation=cv2.INTER_NEAREST)
            # simplify_mask (defined elsewhere) smooths the mask polygon
            # and returns the cleaned mask plus its bounding rect
            mask_np, rect = simplify_mask(mask_np, eps=0.005)
            masked = cv2.bitwise_and(orig, orig, mask=mask_np)
            x1, y1, ww, hh = rect
            x2, y2 = x1 + ww, y1 + hh
            x1, y1 = max(0, x1), max(0, y1)
            x2, y2 = min(W, x2), min(H, y2)
            crop = masked[y1:y2, x1:x2]
            if not looks_square(crop):
                continue
            if not mostly_filled(crop):
                continue
            tags, uniform = infer_colors_grid(crop, n, color_detection_model)
            # Require that enough grid cells pass the per-tile uniformity check
            if sum(sum(row) for row in uniform) < homogeneity_thres * len(uniform) * len(uniform[0]):
                continue
            return tags, crop, mask_np, rect
    return None, None, None, None

def infer_colors_grid(patch, n, color_detection_model):
    # Split the face crop into an n x n grid and tag each cell's color
    h, w, _ = patch.shape
    hh, ww = h // n, w // n
    tag_grid = [['' for _ in range(n)] for __ in range(n)]
    uniform_grid = [[False for _ in range(n)] for __ in range(n)]
    for i in range(n):
        for j in range(n):
            cell = patch[i * hh:(i + 1) * hh, j * ww:(j + 1) * ww]
            # get_median_color and find_best_matching_color_legacy are the
            # caller's existing color-matching helpers (defined elsewhere)
            tag_grid[i][j] = find_best_matching_color_legacy(
                get_median_color(cell), tpe='bgr')
            uniform_grid[i][j] = passes_color_uniformity(cell, tag_grid[i][j], 0.5)
    return tag_grid, uniform_grid
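The code above leans on a few helpers defined elsewhere in the project: simplify_mask, get_median_color, and find_best_matching_color_legacy. Their real implementations are not part of the source; the sketches below are plausible stand-ins under stated assumptions (polygon smoothing via approxPolyDP, median sampling over unmasked pixels, and nearest-reference matching in BGR):
def simplify_mask(mask, eps=0.005):
    # Assumed behavior: keep the largest contour, smooth its polygon,
    # re-rasterize it, and return the cleaned mask plus its bounding rect
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not cnts:
        return mask, (0, 0, mask.shape[1], mask.shape[0])
    largest = max(cnts, key=cv2.contourArea)
    poly = cv2.approxPolyDP(largest, eps * cv2.arcLength(largest, True), True)
    clean = np.zeros_like(mask)
    cv2.fillPoly(clean, [poly], 255)
    return clean, cv2.boundingRect(poly)

def get_median_color(cell):
    # Assumed behavior: median BGR over the cell's non-black (unmasked) pixels
    pixels = cell.reshape(-1, 3)
    pixels = pixels[np.any(pixels > 0, axis=1)]
    if len(pixels) == 0:
        return np.zeros(3)
    return np.median(pixels, axis=0)

def find_best_matching_color_legacy(color, tpe='bgr'):
    # Assumed behavior: nearest reference color by Euclidean distance in BGR;
    # the reference values are illustrative, not calibrated
    refs = {
        'white': (255, 255, 255), 'yellow': (0, 215, 255),
        'green': (72, 155, 0), 'blue': (172, 72, 13),
        'orange': (0, 88, 255), 'red': (40, 20, 180),
    }
    return min(refs, key=lambda k: np.linalg.norm(np.array(refs[k]) - np.asarray(color)))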
Run it on a frame and retrieve the face grid and the corresponding crop and mask:
results_out = run_seg_inference(seg_net, current_frame)
face_grid, face_crop, face_mask, face_rect = extract_face_grid(results_out, n=grid_n, homogeneity_thres=0.6)
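If a face was found, a quick visual sanity check helps before wiring the grid into downstream logic; this sketch reuses cv2_imshow from the Colab baseline:
if face_grid is not None:
    x, y, w, h = face_rect
    annotated = current_frame.copy()
    cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 255, 0), 2)
    for row in face_grid:
        print(row)  # e.g. ['red', 'white', 'green'] for a 3x3 face
    cv2_imshow(annotated)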
This setup uses the segmentation model to produce a clean object mask, then applies tight checks that match the geometry and color layout of a cube face. Where edge detection merged tiles or missed faces entirely, the learned segmentation handles the heavy lifting. Thanks to this, the downstream logic reduces to straightforward validation and grid sampling.
Why this matters
Stickerless faces don’t guarantee strong edges or simple thresholds. A segmentation model trained on your scenes is robust to background variability and lighting, while remaining efficient to run. The added shape and color homogeneity checks keep the pipeline deterministic and interpretable, so you can trust what gets passed on to your color-reading logic.
Takeaways
If edge-based contours fail on stickerless faces, stop fighting the image. Use a semantic segmentation model to isolate the cube reliably, then validate the detection with simple geometric constraints and per-tile homogeneity checks. Sample colors per grid cell using a median and map them with your existing color matcher. This combination is resilient in the presence of weak boundaries, similar backgrounds, and varying capture conditions, and it keeps the final code clear and maintainable.
This article is based on a StackOverflow question and self-answer by Tripaloski.