2025, Oct 15 19:00
Faster NYT Spelling Bee Scoring: Precompute and Aggregate Unique Bitmasks to Find the Optimal Seven-Letter Set
Optimize NYT Spelling Bee scoring by aggregating words into unique bitmasks, precomputing scores, and enumerating all 7-letter sets for fast results.
Maximizing the NYT Spelling Bee score sounds straightforward: pick seven letters, count all valid words that use only those letters, weight them by the game’s scoring rules, and find the best set. The trap is in how you traverse and score the search space. A naive search across masks combined with repeatedly rescoring the full word list will grind to a halt long before you see an optimal answer.
Baseline approach and where it hurts
The baseline below filters the word list to words with at most seven distinct letters, represents each word as a 26-bit mask, and explores letter sets as masks in a uniform-cost search. The scoring function scans every word for each mask it evaluates. The logic is correct, but performance collapses because the scoring repeats massive amounts of identical work.
from string import ascii_lowercase
from heapq import heappush, heappop
ALPH = ascii_lowercase  # Σ
with open("words.txt") as fh:
    raw_words = fh.read().splitlines()
# Keep words that use no more than 7 distinct letters
pool = list(filter(lambda w: len(set(w)) <= 7, raw_words))
# Turn each word into a 26-bit mask
word_masks = [0] * len(pool)
for idx in range(len(pool)):
    for ch in pool[idx]:
        word_masks[idx] |= 1 << (ord(ch) - ord('a'))
# Game scoring: 4-letter words are 1 point, longer words score length
def score_of(w):
    return 1 if len(w) == 4 else len(w)
# Compute score for a given letter-set mask by scanning all words
def tally_for(msk):
    total = 0
    for i in range(len(pool)):
        if (msk | word_masks[i]) == msk:
            total += score_of(pool[i])
    return -total  # negative for min-heap usage
# Dijkstra-like exploration over masks, removing one letter per step
root_mask = (1 << len(ALPH)) - 1
root_entry = (tally_for(root_mask), root_mask)
frontier = [root_entry]
seen = set()
while True:
    val, msk = heappop(frontier)
    print(val, "".join(ALPH[i] if msk & (1 << i) else ' ' for i in range(26)))
    if msk.bit_count() == 7:
        break
    for i in range(len(ALPH)):
        if msk & (1 << i):
            nxt = msk & ~(1 << i)
            if nxt in seen:
                continue
            heappush(frontier, (tally_for(nxt), nxt))
            seen.add(nxt)
The core bottleneck is the per-mask scoring. Each call to the scoring function walks the entire word list and recomputes per-word scores, even though those scores never change. The search itself also runs into large plateaus where many masks have the same score, driving up the number of states explored.
The insight: aggregate by unique bitmask
Two observations unlock the performance. First, a word’s score is mask-independent, so compute it once and reuse it. Second, many words share the same letter-set bitmask. If you compress the word list into unique bitmasks and sum the scores of words per bitmask, the scoring loop over a letter-set mask can run over those unique masks instead of all words.
Consider abalones and seasonable. Both map to the same bitmask for {a, b, e, l, n, o, s}. Instead of scoring these words separately on every pass, fold them into a single entry keyed by that bitmask with the combined score. Using a common word list (wamerican), this reduces the data from 40940 words to 20818 unique bitmasks. That halving alone cuts the scoring work dramatically, and it also makes a simple full enumeration of all 26 choose 7 letter sets feasible.
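To make the folding concrete, a small check (using a hypothetical word_to_mask helper that mirrors the mask-building loop above) confirms both words collapse to a single key:
def word_to_mask(w):
    # Mirror of the mask-building loop: one bit per distinct letter
    m = 0
    for ch in w:
        m |= 1 << (ord(ch) - ord('a'))
    return m
assert word_to_mask("abalones") == word_to_mask("seasonable")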
With this aggregation, a Python implementation can evaluate the scoring function 1000 times in about 1.4 seconds on the full 26-letter mask on one machine, which is fast enough to brute-force the best 7-letter set.
Refactored solution: precompute and compress
The code below builds a mapping from bitmask to the total score of all words that share that mask. The scoring function then iterates over that mapping. The program logic is unchanged; only the data shape and scoring loop are optimized.
from string import ascii_lowercase
import timeit
ALPH = ascii_lowercase
with open("words.txt") as fh:
    raw = fh.read().splitlines()
# Only keep words that use at most 7 unique letters
filtered = list(filter(lambda w: len(set(w)) <= 7, raw))
# Convert each word to a 26-bit mask
bm_list = [0] * len(filtered)
for i in range(len(filtered)):
    for ch in filtered[i]:
        bm_list[i] |= 1 << (ord(ch) - ord('a'))
# Game scoring: 4-letter words get 1, otherwise use length
def points(w):
    return 1 if len(w) == 4 else len(w)
# Aggregate: bitmask -> sum of scores of all words matching that bitmask
bm_to_score = {}
for i in range(len(filtered)):
    bm_to_score.setdefault(bm_list[i], 0)
    bm_to_score[bm_list[i]] += points(filtered[i])
print(f"bm_count={len(bm_to_score)}, words_filtered={len(filtered)}, words_total={len(raw)}")
# Score a candidate letter-set mask by summing over compatible bitmasks
def total_for(msk):
    s = 0
    for bm, acc in bm_to_score.items():
        if (msk | bm) == msk:
            s += acc
    return -s
# Example timing: score the full alphabet mask many times
print(timeit.timeit("total_for(0b11111111111111111111111111)", number=1000, globals=globals()))
This change alone makes it practical to iterate through all 657800 seven-letter masks and pick the maximum-scoring one. The same structure also accommodates later tweaks for the center letter requirement and pangram bonus; those adjustments can be layered on top of the preaggregation without changing the overall approach.
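The enumeration itself is a short loop. As a minimal sketch (not part of the original answer), itertools.combinations over the 26 letter indices can generate every candidate mask and feed it to total_for:
from itertools import combinations
best_score, best_mask = 0, 0
for combo in combinations(range(26), 7):
    candidate = 0
    for i in combo:
        candidate |= 1 << i
    score = -total_for(candidate)  # total_for returns a negated sum
    if score > best_score:
        best_score, best_mask = score, candidate
print(best_score, "".join(ALPH[i] for i in range(26) if best_mask & (1 << i)))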
Why this matters
Scoring every word for every mask scales linearly with the number of words, which is disastrous when done repeatedly during search. Collapsing words into unique bitmasks shrinks the data, eliminates redundant scoring, and turns the problem into fast bitwise checks over a set roughly half the size of the original word list. In practice, this makes direct enumeration of all seven-letter combinations viable, without resorting to path-dependent search strategies that wander through vast score plateaus.
An additional practical note from C++ experiments: a flat int array of size 2^26 for the bitmask-to-score table can be much faster than an unordered_map, even though the array is far larger. Perhaps the hash table has poor cache behavior while the contiguous array enjoys very good cache locality, or the hashing itself simply takes time. This wasn't profiled, but it fits the general observation that predictable contiguous memory can win despite a higher nominal size.
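As a rough Python analogue of that C++ note (an illustration only, not the original experiment), the dictionary could be swapped for a flat table indexed directly by bitmask:
from array import array
# Roughly 256 MB: one 4-byte slot for each possible 26-bit mask
flat_table = array('i', [0]) * (1 << 26)
for bm, acc in bm_to_score.items():
    flat_table[bm] = acc  # lookups become plain array indexing, no hashing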
Takeaways
Precompute everything that doesn’t depend on the candidate mask and remove redundancy by aggregating identical bitmasks. With that in place, scoring a mask becomes a tight loop over unique masks, and evaluating all 26 choose 7 letter sets is straightforward. If needed, incorporate the center letter rule and pangram bonus into the same framework. The result is an efficient, data-driven pipeline that avoids expensive graph search and surfaces the optimal letter set.
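A hedged sketch of how those rules could be layered on, assuming a second mapping (bm_to_count, counting how many words share each bitmask) and the standard +7 pangram bonus:
from collections import Counter
bm_to_count = Counter(bm_list)  # how many words share each unique bitmask
def total_with_rules(letters_mask, center_bit):
    # e.g. total_with_rules(candidate_mask, 1 << (ord('e') - ord('a')))
    s = 0
    for bm, acc in bm_to_score.items():
        if (letters_mask | bm) != letters_mask:
            continue  # word needs a letter outside the set
        if not (bm & center_bit):
            continue  # word lacks the required center letter
        s += acc
        if bm == letters_mask:
            s += 7 * bm_to_count[bm]  # every word with this mask is a pangram
    return s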
Conclusion
If an algorithm spends most of its time recalculating the same facts, step back and change the data. For Spelling Bee scoring, the right data shape is “bitmask -> total score”, built once and reused everywhere. That turns an intractable search over masks into a simple, fast pass across compressed structures—exactly what you need to find the optimal seven-letter set.
The article is based on a question from StackOverflow by qwr and an answer by SquareFinder.