Puzzle Data Generator

Trillions of Puzzles. Zero Repeats.

Combinatorics, Meet Content Strategy

The theoretical and Monte Carlo–sampled estimates are enormous.

Theoretical limits

The biggest factor affecting puzzle count is whether topics can be reused across puzzles. The impact is profound. To illustrate, consider a simple case: 12 possible topics, 4 per puzzle. Although there are C(12,4) = 495 ways to combine them, most combinations reuse too many topics from earlier puzzles:

[Interactive figure: all 495 combinations, of which 9 survive under the selected constraint. Allowable puzzles: 9.]

Modeling acceptable 4-topic puzzles, given 12 available topics.
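The 12-topic model can be reproduced with a short sketch. It enumerates all 495 combinations and keeps a combination only if it shares at most `max_shared` topics with every puzzle already kept. The function name `pack_combinations`, the `max_shared` parameter, and the lexicographic scan order are assumptions made for illustration, not the method behind the figure; a greedy scan always returns a valid set, but not necessarily the largest one.

```python
from itertools import combinations

def pack_combinations(n_topics, per_puzzle, max_shared):
    """Greedily keep combinations that overlap each kept puzzle in at most max_shared topics."""
    kept = []
    for combo in combinations(range(n_topics), per_puzzle):
        candidate = frozenset(combo)
        # Accept only if no already-kept puzzle shares too many topics.
        if all(len(candidate & other) <= max_shared for other in kept):
            kept.append(candidate)
    return kept

all_combos = list(combinations(range(12), 4))
print(len(all_combos))  # 495

# Assumed sharing levels to compare: 0, 1, or 2 topics in common.
for max_shared in (0, 1, 2):
    print(max_shared, len(pack_combinations(12, 4, max_shared)))
```

Tightening `max_shared` shrinks the surviving set sharply, which is the effect the figure illustrates.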


Actual puzzles draw from thousands of words spanning far more topic groups. Each topic can be represented by any of its member words — a cold-weather topic might display “chilly,” “frosty,” “icy,” or dozens of others.

Constraints on puzzle count
| Constraint | Depends on | Typical |
| --- | --- | --- |
| Vocabulary | Familiarity + letter limit | 6–30K |
| Word groups | Topic granularity | 4–10K |
| Topics / puzzle | Game mechanic | 4–15 |
| Connection type | Associations > Categories | |

The number of topic groups is the binding constraint. A 10-topic puzzle that includes “chilly,” “adore,” and “sword” consumes pools of ~50 cold words, ~50 loving words, and ~50 weaponry words. Topics vary in size, but a 400K vocabulary contains roughly 8,000 clusters of ~50 semantically close words. For comparison, Roget’s Thesaurus subdivided the vocabulary of educated 1850s Victorians into ~1,000 categories; the Library of Congress uses 220 high-level subclasses.

Here’s the math extending the “no repeated triples” constraint shown above. With 10 topics per puzzle, any two puzzles may share up to two topics — pairs are allowed, triples are not.

A puzzle made with 10 topics has 10 × 9 × 8 ordered triples. Since order doesn’t matter — {A, B, C} is the same triple as {C, A, B} — we divide by 3! since there are 6 ways to arrange them: 10 × 9 × 8 / (3 × 2 × 1) = 120 unordered triples per puzzle.

If each exact triple can only be used once — if puzzle A contains the triple {cold, airplane, weaponry}, no other puzzle can contain that same triple — every puzzle permanently removes 120 triples from the available pool.
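As a rough sketch of that rule, a puzzle can be expanded into its set of unordered triples, and two puzzles conflict when those sets intersect. The helper names `triples` and `conflicts` and the filler topic names (`a0`, `b0`, and so on) are hypothetical, chosen only for this example.

```python
from itertools import combinations

def triples(topics):
    """Return every unordered 3-topic subset of one puzzle."""
    return {frozenset(t) for t in combinations(sorted(topics), 3)}

def conflicts(puzzle_a, puzzle_b):
    """True if the two puzzles have at least one triple in common."""
    return not triples(puzzle_a).isdisjoint(triples(puzzle_b))

# Hypothetical 10-topic puzzles built around the triple from the text.
shared_three = {"cold", "airplane", "weaponry"}
puzzle_a = shared_three | {f"a{i}" for i in range(7)}
puzzle_b = shared_three | {f"b{i}" for i in range(7)}           # repeats the triple
puzzle_c = {"cold", "airplane"} | {f"c{i}" for i in range(8)}   # shares only a pair

print(len(triples(puzzle_a)))         # 120
print(conflicts(puzzle_a, puzzle_b))  # True
print(conflicts(puzzle_a, puzzle_c))  # False
```

Sharing a triple is the same as sharing three or more topics, so the check could also be written as a plain set intersection on the topics themselves.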

$$
\frac{\text{total triples}}{120} = \text{maximum puzzles}
$$

The total triples from 8,000 topics:

$$
\frac{8000!}{3! \times 7997!}
= \frac{8000 \times 7999 \times 7998 \times 7997!}{(3 \times 2 \times 1) \times 7997!}
= \frac{8000 \times 7999 \times 7998}{3 \times 2 \times 1}
\approx 85 \text{ billion triples}
$$

85 billion / 120 = about 711 million puzzles — from allowing just two shared topics out of ten. The same structure holds at every sharing level, and the ceiling grows by orders of magnitude each time:

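[Figure: maximum puzzle count by sharing level, log scale from 10^2 to 10^17. Calculations shown for 8,000 topics and 10 topics per puzzle.]

| Sharing level | Calculation | Maximum puzzles (approx.) |
| --- | --- | --- |
| Zero overlap | 8000 ÷ 10 | 800 |
| Sharing 1 topic | (8000 × 7999) ÷ (10 × 9) | 711,000 |
| 2 shared | (8000 × 7999 × 7998) ÷ (10 × 9 × 8) | 711M |
| 4 shared | (8000 × … × 7996) ÷ (10 × … × 6) | 10^15 |

Log scale. Each bar cluster: puzzle sizes 3–13. Each sharing level adds one factor to numerator and denominator; the factorial divisors always cancel (shown above).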
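The ceilings at each sharing level follow one pattern, sketched here under the assumption that the bound is simply the number of available topic subsets of size `max_shared + 1` divided by the number of such subsets each puzzle consumes. The function name `puzzle_ceiling` is hypothetical.

```python
from math import comb

def puzzle_ceiling(n_topics, per_puzzle, max_shared):
    """Upper bound on puzzle count when no subset of max_shared + 1 topics may repeat."""
    k = max_shared + 1
    # Available k-subsets divided by the k-subsets one puzzle uses up.
    return comb(n_topics, k) // comb(per_puzzle, k)

print(f"{comb(8000, 3):,}")  # 85,301,336,000 triples

for shared in (0, 1, 2, 4):
    print(shared, f"{puzzle_ceiling(8000, 10, shared):,}")
```

With 8,000 topics and 10 per puzzle this gives 800, 711,022, and 710,844,466 for zero, one, and two shared topics, and a value on the order of 10^15 for four.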

Word collisions

[Figure: One word, four domains. “Coffee” connects to breakfast, agriculture, chemistry, and café culture.]

In practice, words aren’t limited to one group. “Coffee” belongs to breakfast, tropical agriculture, caffeine chemistry, and café culture. “Tropical fruit” and “things at a market” both contain mango and banana, so they can’t coexist in the same puzzle. Shared words block certain topic combinations — sometimes eliminating a few alternatives, sometimes a large swath. This brings real numbers below the theoretical ceiling.
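A minimal sketch of the collision check, assuming each topic is stored as a set of member words. The `topic_words` mapping and the function names are hypothetical, and apart from mango, banana, chilly, frosty, and icy the sample words are made up for the example.

```python
from itertools import combinations

# Hypothetical topic-to-words mapping; real topics hold roughly 50 words each.
topic_words = {
    "tropical fruit": {"mango", "banana", "papaya", "guava"},
    "things at a market": {"mango", "banana", "stall", "scale"},
    "cold weather": {"chilly", "frosty", "icy"},
}

def can_coexist(topic_a, topic_b, words=topic_words):
    """Two topics can share a puzzle only if they have no word in common."""
    return words[topic_a].isdisjoint(words[topic_b])

def is_valid_puzzle(topics, words=topic_words):
    """A puzzle is valid when every pair of its topics can coexist."""
    return all(can_coexist(a, b, words) for a, b in combinations(topics, 2))

print(can_coexist("tropical fruit", "things at a market"))   # False
print(is_valid_puzzle(["tropical fruit", "cold weather"]))   # True
```

This version blocks a pair on any shared word; a looser rule, such as tolerating a shared word that is never displayed, would remove fewer combinations.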

But the key point: for typical mobile puzzle mechanics, hundreds of millions of valid topic combinations exist, and each topic in a combination can be represented by any of ~50 different words, yielding trillions of unique-feeling puzzles.

That’s the theory. What about practice?

Empirical measurement

To get real numbers, we generated 100,000 random valid puzzles at each of 16 vocabulary settings (four word-length limits × four vocabulary depths), then ran a greedy packing algorithm to find how many could coexist under the 2-shared constraint.
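The sample-then-pack procedure might look roughly like the sketch below. It is not the generator's actual code: topics are plain integers, validity checks such as word collisions are omitted, and the names `sample_puzzles` and `greedy_pack`, the seed, and the small sizes (2,000 samples, 200 topics) are assumptions chosen so the example runs quickly.

```python
import random
from itertools import combinations

def sample_puzzles(n_samples, n_topics, per_puzzle, rng):
    """Draw random puzzles as sorted tuples of distinct topic ids."""
    return [tuple(sorted(rng.sample(range(n_topics), per_puzzle)))
            for _ in range(n_samples)]

def greedy_pack(puzzles, max_shared=2):
    """Keep each puzzle whose (max_shared + 1)-topic subsets are all still unused."""
    used = set()
    kept = []
    for puzzle in puzzles:
        # Topic ids are sorted, so each subset has one canonical tuple form.
        subsets = list(combinations(puzzle, max_shared + 1))
        if any(s in used for s in subsets):
            continue  # would repeat a triple already claimed by a kept puzzle
        used.update(subsets)
        kept.append(puzzle)
    return kept

rng = random.Random(0)  # fixed seed so the run is repeatable
candidates = sample_puzzles(2_000, 200, 10, rng)
kept = greedy_pack(candidates)
print(len(candidates), len(kept))
```

Because the scan accepts puzzles in the order they were sampled, the count it reports is a lower bound on how many of the sampled puzzles could coexist.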

[Figure: confirmed puzzle counts from 100K random samples, 2-shared constraint (category mode), by maximum word length and vocabulary depth.]

| Max word length | 6K vocab | 12K vocab | 30K vocab | Full vocab |
| --- | --- | --- | --- | --- |
| ≤5 letters | 8,752 | 18,094 | 28,353 | 69,755 |
| ≤7 letters | 38,695 | 62,227 | 74,661 | 91,849 |
| ≤9 letters | 51,949 | 74,660 | 85,567 | 94,993 |
| Any length | 57,797 | 79,885 | 89,752 | 97,127 |

Each bar shows how many of the 100,000 sampled puzzles survived — how many could coexist without any triple of topics repeating. The colors represent vocabulary depth: “6K” limits every word to the 6,000 most common — cutting the vocabulary by 98%, killing niche categories like “Renaissance painters” but keeping everyday ones like “body parts” and “weather.”

The shortest bar (five-letter words from the 6K vocabulary, the tightest setting tested) still confirms 8,752 puzzles in which no three topics repeat. Relax the limit to seven letters and the count jumps to 38,695. These are lower bounds from a single sample of 100,000: greedy packing is not guaranteed to find the largest compatible set, and the sampling explores only a fraction of the combinatorial space.

Associations vs. categories

Another factor affecting puzzle diversity is the fundamental type of connection between words.

Category mode uses child-of-parent groups: “types of cheese” → cheddar, brie, gouda. The Monte Carlo sampling above is based on this smaller, more constrained world of categories.

Association mode draws from headwords and their free associations: “ocean” → wave, salt, deep, tide. These connections are more subtle — the link only clicks once you decipher the headword — and far more abundant. Over 45,000 headwords survive even the tightest filters. At every setting, association-based puzzles are effectively unlimited.