Puzzle Data Generator

Combinatorics, Meet Content Strategy

Both the theoretical and the Monte Carlo–sampled estimates of how many distinct puzzles are possible are enormous.

Theoretical limits

The biggest factor affecting puzzle count is whether topics can be reused across puzzles. The impact is profound. To illustrate, consider a simple case: 12 possible topics, 4 per puzzle. Although there are C(12,4) = 495 ways to combine them, most combinations reuse too many topics from earlier puzzles:

[Interactive figure: of all 495 combinations, 9 survive under the constraint selected in the widget. Allowable puzzles: 9.]

Modeling acceptable 4-topic puzzles, given 12 available topics.
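The 12-topic model can be approximated with a short script. This is a minimal sketch, not the code behind the figure: the function name `greedy_filter`, the lexicographic scan order, and the `max_shared` parameter are assumptions made for illustration. It keeps a combination only if it shares at most `max_shared` topics with every puzzle already kept.

```python
from itertools import combinations
from math import comb


def greedy_filter(n_topics, per_puzzle, max_shared):
    """Scan all combinations in lexicographic order (an assumed order) and
    keep a puzzle only if it shares at most `max_shared` topics with every
    puzzle kept so far."""
    kept = []
    for candidate in combinations(range(n_topics), per_puzzle):
        cand = set(candidate)
        if all(len(cand & other) <= max_shared for other in kept):
            kept.append(cand)
    return kept


print(comb(12, 4))  # 495 combinations in total
for max_shared in range(4):
    survivors = greedy_filter(12, 4, max_shared)
    print(f"max shared topics = {max_shared}: {len(survivors)} puzzles kept")
```

A greedy scan depends on the order in which candidates are visited, so its count is a lower bound on what a cleverer search could pack, not necessarily the best achievable number.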

Actual puzzles draw from thousands of words spanning far more topic groups. Each topic can be represented by any of its member words — a cold-weather topic might display “chilly,” “frosty,” “icy,” or dozens of others.

Constraints on puzzle count
Constraint	Depends on	Typical
Vocabulary	Familiarity + letter limit	6–30K
Word groups	Topic granularity	4–10K
Topics / puzzle	Game mechanic	4–15
Connection type	Associations > Categories

The number of topic groups is the binding constraint. A 10-topic puzzle that includes “chilly,” “adore,” and “sword” consumes pools of ~50 cold words, ~50 loving words, and ~50 weaponry words. Topics vary in size, but a 400K vocabulary contains roughly 8,000 clusters of ~50 semantically close words. For comparison, Roget’s Thesaurus subdivided the vocabulary of educated 1850s Victorians into ~1,000 categories; the Library of Congress uses 220 high-level subclasses.

Here’s the math extending the “no repeated triples” constraint shown above. With 10 topics per puzzle, any two puzzles may share up to two topics — pairs are allowed, triples are not.

A puzzle made with 10 topics has 10 × 9 × 8 ordered triples. Since order doesn’t matter — {A, B, C} is the same triple as {C, A, B} — we divide by 3! since there are 6 ways to arrange them: 10 × 9 × 8 / (3 × 2 × 1) = 120 unordered triples per puzzle.

If each exact triple can only be used once — if puzzle A contains the triple {cold, airplane, weaponry}, no other puzzle can contain that same triple — every puzzle permanently removes 120 triples from the available pool.

\[
\frac{\text{total triples}}{120} = \text{maximum puzzles}
\]

The total triples from 8,000 topics:

\[
\frac{8000!}{3! \times 7997!}
= \frac{8000 \times 7999 \times 7998 \times 7997!}{(3 \times 2 \times 1) \times 7997!}
= \frac{8000 \times 7999 \times 7998}{3 \times 2 \times 1}
\approx 85 \text{ billion triples}
\]

85 billion / 120 = about 711 million puzzles — from allowing just two shared topics out of ten. The same structure holds at every sharing level, and the ceiling grows by orders of magnitude each time:

Sharing level	Ceiling formula	Maximum puzzles
Zero overlap	8000 ÷ 10	= 800
Sharing 1 topic	(8000 × 7999) ÷ (10 × 9)	= 711,000
2 shared	(8000 × 7999 × 7998) ÷ (10 × 9 × 8)	= 711M
4 shared	(8000 × … × 7996) ÷ (10 × … × 6)	= 10¹⁵

Log scale. Each bar cluster: puzzle sizes 3–13. Each sharing level adds one factor to numerator and denominator; the factorial divisors always cancel (shown above).
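The ceilings for each sharing level can be reproduced with a few lines of Python. This is a sketch under the article's stated assumptions (8,000 topics, 10 topics per puzzle); the function name `puzzle_ceiling` is hypothetical. If two puzzles may share at most s topics, no subset of s + 1 topics may repeat, so the bound is C(topics, s + 1) divided by C(puzzle size, s + 1).

```python
from math import comb


def puzzle_ceiling(n_topics, per_puzzle, max_shared):
    """Upper bound on the number of puzzles when no subset of
    (max_shared + 1) topics may appear in more than one puzzle."""
    t = max_shared + 1
    return comb(n_topics, t) // comb(per_puzzle, t)


for shared in (0, 1, 2, 4):
    print(f"{shared} shared: {puzzle_ceiling(8000, 10, shared):,}")
```

These are counting bounds only: they say how many puzzles the pool of subsets could support at most, not that a packing of that size actually exists.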

Word collisions

One word, four domains

In practice, words aren’t limited to one group. “Coffee” belongs to breakfast, tropical agriculture, caffeine chemistry, and café culture. “Tropical fruit” and “things at a market” both contain mango and banana, so they can’t coexist in the same puzzle. Shared words block certain topic combinations — sometimes eliminating a few alternatives, sometimes a large swath. This brings real numbers below the theoretical ceiling.
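A collision check of this kind might look like the sketch below. The topic names and word lists are toy data invented for illustration, and the rule that any shared word blocks a pairing is an assumption; the real generator may use a looser or stricter test.

```python
from itertools import combinations

# Hypothetical toy data: each topic maps to a small set of member words.
TOPICS = {
    "tropical fruit": {"mango", "banana", "papaya", "guava"},
    "things at a market": {"mango", "banana", "stall", "scale"},
    "breakfast": {"coffee", "toast", "cereal"},
    "cafe culture": {"coffee", "barista", "espresso"},
    "weaponry": {"sword", "dagger", "spear"},
}


def collides(topic_a, topic_b):
    """Two topics collide if they have at least one word in common."""
    return not TOPICS[topic_a].isdisjoint(TOPICS[topic_b])


def is_valid_puzzle(topic_names):
    """A puzzle is valid only if no two of its topics collide."""
    return not any(collides(a, b) for a, b in combinations(topic_names, 2))


compatible_pairs = [
    pair for pair in combinations(TOPICS, 2) if not collides(*pair)
]
print(len(compatible_pairs), "of", len(TOPICS) * (len(TOPICS) - 1) // 2,
      "topic pairs can coexist")
```

Every blocked pair also rules out all larger combinations that contain it, which is why a handful of shared words can remove a large swath of candidates.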

But the key point: for typical mobile puzzle mechanics, hundreds of millions of valid topic combinations exist, and each topic in a combination can be represented by any of ~50 different words, yielding trillions of unique-feeling puzzles.

That’s the theory. What about practice?

Empirical measurement

To get real numbers, we generated 100,000 random valid puzzles at each of 16 vocabulary settings (four word-length limits × four vocabulary depths), then ran a greedy packing algorithm to find how many could coexist under the 2-shared constraint.

Confirmed puzzle counts from 100K random samples, 2-shared constraint (category mode)

Each bar shows how many of the 100,000 sampled puzzles survived — how many could coexist without any triple of topics repeating. The colors represent vocabulary depth: “6K” limits every word to the 6,000 most common — cutting the vocabulary by 98%, killing niche categories like “Renaissance painters” but keeping everyday ones like “body parts” and “weather.”

The shortest bar (five-letter common words only, the tightest possible setting) still confirms over 8,000 puzzles where no three topics repeat. Relax to seven-letter words and the count jumps to 38,000. These are lower bounds from a single sample of 100,000: the greedy algorithm is not guaranteed to find the largest compatible set, and the sampling explores only a fraction of the combinatorial space.
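The sample-then-pack measurement can be sketched as follows. This is an illustrative reconstruction, not the code used for the chart: the function names, the seed, and the small parameters (60 topics, 2,000 samples) are assumptions, and the sketch samples topic sets uniformly at random without the word-level validity checks a real generator would apply.

```python
import random
from itertools import combinations


def sample_puzzles(n_topics, per_puzzle, n_samples, seed=0):
    """Draw random puzzles as sorted tuples of topic ids (assumed uniform)."""
    rng = random.Random(seed)
    return [
        tuple(sorted(rng.sample(range(n_topics), per_puzzle)))
        for _ in range(n_samples)
    ]


def greedy_pack(puzzles, subset_size=3):
    """Keep each puzzle, in sample order, only if none of its topic triples
    has already been used by a puzzle kept earlier."""
    used = set()
    kept = []
    for puzzle in puzzles:
        triples = list(combinations(puzzle, subset_size))
        if any(t in used for t in triples):
            continue  # would repeat a triple, so this puzzle is rejected
        used.update(triples)  # a 10-topic puzzle consumes 120 triples
        kept.append(puzzle)
    return kept


puzzles = sample_puzzles(n_topics=60, per_puzzle=10, n_samples=2000)
kept = greedy_pack(puzzles)
print(len(kept), "of", len(puzzles), "sampled puzzles can coexist")
```

Because puzzles are accepted in the order they were sampled, a different order or a smarter selection could fit more of them, which is why counts from this kind of procedure are best read as lower bounds.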

Associations vs. categories

Another factor affecting puzzle diversity is the fundamental type of connection between words.

Category mode uses child-of-parent groups: “types of cheese” → cheddar, brie, gouda. The Monte Carlo sampling above is based on this smaller, more constrained world of categories.

Association mode draws from headwords and their free associations: “ocean” → wave, salt, deep, tide. These connections are more subtle — the link only clicks once you decipher the headword — and far more abundant. Over 45,000 headwords survive even the tightest filters. At every setting, association-based puzzles are effectively unlimited.