A puzzle generation system built on a decade of curation, 70+ reference sources, and 130 million AI inferences.
Linguabase automatically generates word association puzzles where every grouping holds together under player scrutiny. Topics within each level are strictly mutually exclusive—no word plausibly belongs to two categories in the same puzzle. Topics across levels don’t repeat. Every word is drawn from a vocabulary calibrated for both native and non-native English speakers.
As each puzzle is assembled, the system clears the wordspace around each topic so nothing overlaps. It tracks which topics and associations have been used across the full level set. It also draws on less-traveled corners of English, so level 2,000 doesn't feel like a remix of level 20.
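To make "clearing the wordspace" concrete, here is a minimal greedy sketch. It is not Linguabase's actual algorithm. It assumes each candidate topic comes with a set of every word that could plausibly belong to it. A topic is accepted only if none of the words already shown fit it, and only words that no already-chosen topic could claim are shown for it. A set of used topics enforces no repeats across levels. The function name `build_level` and its parameters are hypothetical.

```python
def build_level(candidates, used_topics, groups, group_size):
    """Greedily pick `groups` topics whose shown words can't fit another chosen topic.

    candidates: topic -> set of words that plausibly belong to it (its wordspace).
    used_topics: topics placed in earlier levels; updated in place on success.
    Returns topic -> shown words, or None if the candidates can't fill the level.
    """
    chosen = {}
    for topic, pool in candidates.items():
        if topic in used_topics:
            continue  # topics don't repeat across levels
        # No word already shown in another group may plausibly fit this topic.
        if any(w in pool for shown in chosen.values() for w in shown):
            continue
        # Clear the wordspace: keep only words no already-chosen topic could claim.
        free = sorted(w for w in pool if all(w not in candidates[t] for t in chosen))
        if len(free) < group_size:
            continue
        chosen[topic] = free[:group_size]
        if len(chosen) == groups:
            used_topics.update(chosen)  # remember these topics for later levels
            return chosen
    return None


candidates = {
    "fruit": {"apple", "banana", "cherry", "lemon", "orange"},
    "colors": {"orange", "red", "blue", "green", "lemon"},
    "tech": {"apple", "python", "java"},
    "snakes": {"python", "cobra", "viper", "mamba"},
}
used = set()
print(build_level(candidates, used, groups=2, group_size=3))
```

Here "orange" and "lemon" are kept out of the colors group because they also fit fruit. A second call with the same `used` set returns `None`: fruit and colors are spent, and tech and snakes collide on "python". Greedy selection is simple but can miss combinations a backtracking search would find.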
Puzzles are generated bespoke for your game's mechanics. Order each puzzle with a category or two more than you need, and your team picks the strongest groupings, curating from abundance rather than building from scratch.
Already building levels in-house or with LLMs? Linguabase also audits existing puzzles—flagging words that may be unfamiliar to your audience, catching places where a word could fit more than one group, and suggesting expansions. Audit your most-complained-about levels first and see whether the data catches what your players are flagging.
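The two audit checks described above can be sketched as follows, assuming you already have an association pool per topic and a familiarity rank per word (lower rank meaning more familiar). This is an illustration under those assumptions, not Linguabase's audit implementation. The names `audit_puzzle`, `pools`, `familiarity_rank`, and `max_rank` are hypothetical, and the rank numbers in the example are made-up placeholders.

```python
def audit_puzzle(groups, pools, familiarity_rank, max_rank):
    """Flag cross-group ambiguity and likely-unfamiliar words in an existing puzzle.

    groups: topic -> list of words shown in that group.
    pools: topic -> set of words that plausibly belong to the topic.
    familiarity_rank: word -> rank (1 = most familiar); unknown words count as rare.
    """
    ambiguous = {}
    for topic, words in groups.items():
        for w in words:
            # Other groups in this puzzle whose topic the word could also fit.
            others = sorted(t for t in groups if t != topic and w in pools.get(t, set()))
            if others:
                ambiguous[w] = others
    unfamiliar = sorted(
        w for words in groups.values() for w in words
        if familiarity_rank.get(w, float("inf")) > max_rank
    )
    return {"ambiguous": ambiguous, "unfamiliar": unfamiliar}


groups = {"fruit": ["apple", "kumquat", "lemon"], "yellow things": ["banana", "sun", "taxi"]}
pools = {
    "fruit": {"apple", "kumquat", "lemon", "banana"},
    "yellow things": {"banana", "sun", "taxi", "lemon"},
}
ranks = {"apple": 500, "lemon": 2000, "kumquat": 40000, "banana": 900, "sun": 300, "taxi": 3000}
print(audit_puzzle(groups, pools, ranks, max_rank=20000))
```

In this toy puzzle, "lemon" and "banana" are flagged because each fits the other group, and "kumquat" is flagged as falling past the familiarity cutoff.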
The puzzle generation runs on a structured data foundation with three parts: words, links, and meanings. From a raw pool of over 2 million terms—including technical words, proper nouns, and noise—we’ve surfaced a curated set of 400,000 that work well for games. The vocabulary selection and familiarity ranking are purpose-built for word game development.
Roughly half are single words. The other half are words with spaces—200,000 multi-word expressions like “night sky,” “comfort food,” “hold it together,” and “old wives’ tale.” These aren’t just word combinations—they name concepts, and they carry more weight than their parts. Traditional dictionaries cover about 3% of them. Including them doubles the pool of ideas available for puzzles, and they’re the vocabulary that makes levels feel natural rather than clinical.
400K terms ranked by familiarity—from everyday vocabulary to crossword-worthy rarities. Includes 200K multi-word expressions. Filterable by letter count or difficulty for your game and audience.
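Filtering by letter count and difficulty might look like this on the game side. The sketch assumes the vocabulary arrives as (term, familiarity rank) pairs and that letter count ignores spaces and punctuation, so multi-word expressions can fit grid-style mechanics. Both the data shape and the helper names `letter_count` and `filter_vocab` are assumptions for illustration, and the ranks below are placeholders, not real Linguabase values.

```python
def letter_count(term: str) -> int:
    # Count letters only: "night sky" -> 8, "old wives' tale" -> 12.
    return sum(ch.isalpha() for ch in term)


def filter_vocab(entries, min_letters, max_letters, max_rank, single_words_only=False):
    """Keep terms within a letter-count range and at or below a familiarity rank.

    entries: iterable of (term, rank) pairs, where rank 1 is the most familiar.
    """
    return [
        term for term, rank in entries
        if min_letters <= letter_count(term) <= max_letters
        and rank <= max_rank
        and not (single_words_only and " " in term)
    ]


vocab = [("night sky", 12000), ("comfort food", 15000), ("elephant", 3000),
         ("proboscidean", 250000), ("cat", 200)]
print(filter_vocab(vocab, 5, 8, max_rank=50000))
```

With these placeholder ranks, the call keeps "night sky" and "elephant"; passing `single_words_only=True` drops the multi-word expression.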
Content filters at two severity levels: a hard-block list of offensive words, and a soft-block list of words carrying unwanted innuendo.
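A two-tier filter like this maps naturally onto a three-way verdict. In the minimal sketch below, hard-blocked words are always rejected, while soft-blocked words are rejected unless a game explicitly opts in. The opt-in flag, the case-folded matching, and every name here are assumptions, and the list entries are placeholders rather than real blocked words.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    SOFT_BLOCK = "soft_block"  # unwanted innuendo
    HARD_BLOCK = "hard_block"  # offensive


def screen(term, hard_block, soft_block):
    """Classify a term against the two block lists (case-insensitive)."""
    key = term.casefold()
    if key in hard_block:
        return Verdict.HARD_BLOCK
    if key in soft_block:
        return Verdict.SOFT_BLOCK
    return Verdict.ALLOW


def usable(term, hard_block, soft_block, allow_innuendo=False):
    """Hard blocks always fail; soft blocks pass only if the game opts in."""
    verdict = screen(term, hard_block, soft_block)
    return verdict is Verdict.ALLOW or (verdict is Verdict.SOFT_BLOCK and allow_innuendo)


HARD = {"hard-example"}  # placeholder entries
SOFT = {"soft-example"}
print([usable(w, HARD, SOFT) for w in ["velvet", "Soft-Example", "hard-example"]])
```

Keeping the tiers separate lets a game aimed at adults relax the soft list without ever touching the hard list.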
Over 100 million weighted relationships, including both categorical associations (soft → gentle, fuzzy, velvet) and associative relations (bunny → soft, fuzzy, rodent).
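One way to hold weighted, typed relationships in memory is an adjacency list keyed by source word, with each link carrying a weight and a relation kind. This sketch uses the examples above, but the weights are invented placeholders and the `Link` record, `build_graph`, and `top_related` are hypothetical, not Linguabase's schema or API.

```python
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    target: str
    weight: float
    kind: str  # "categorical" or "associative"


def build_graph(edges):
    """edges: iterable of (source, target, weight, kind) tuples."""
    graph = defaultdict(list)
    for source, target, weight, kind in edges:
        graph[source].append(Link(target, weight, kind))
    return graph


def top_related(graph, word, k=3, kind=None):
    """Return up to k targets for `word`, strongest first, optionally by relation kind."""
    links = [link for link in graph.get(word, []) if kind is None or link.kind == kind]
    links.sort(key=lambda link: link.weight, reverse=True)
    return [link.target for link in links[:k]]


edges = [
    ("soft", "gentle", 0.9, "categorical"),
    ("soft", "fuzzy", 0.8, "categorical"),
    ("soft", "velvet", 0.7, "categorical"),
    ("bunny", "soft", 0.85, "associative"),
    ("bunny", "fuzzy", 0.8, "associative"),
    ("bunny", "rodent", 0.6, "associative"),
]
graph = build_graph(edges)
print(top_related(graph, "soft", k=2))
```

Because `top_related` uses `graph.get`, looking up an unknown word returns an empty list instead of silently inserting an empty entry into the `defaultdict`.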
Every word broken down into senses and facets, with ~100 related words per entry on average. This is the layer that powers mutual exclusivity. Word families connect related forms (morphology): run → runs, running, runner, runway.
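A per-word entry with sense-level pools and a word family could be modeled roughly as below. The structure is an assumption for illustration. The sense names come from the "elephant" example on this page, but which words sit under which sense is an illustrative guess, not Linguabase data. `family_clashes` shows one hypothetical way a game might use word families, by spotting morphological relatives placed in the same puzzle.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    word: str
    senses: dict[str, set[str]] = field(default_factory=dict)  # sense/facet -> related words
    family: set[str] = field(default_factory=set)  # morphological relatives

    def pool(self, *senses: str) -> set[str]:
        """Union of related words for the requested senses only."""
        out: set[str] = set()
        for sense in senses:
            out |= self.senses.get(sense, set())
        return out


def family_clashes(words, entries):
    """Ordered pairs (w, other) where `other` is in w's word family (hypothetical use)."""
    clashes = []
    for w in words:
        entry = entries.get(w)
        if entry is None:
            continue
        clashes.extend((w, other) for other in words if other != w and other in entry.family)
    return clashes


elephant = Entry("elephant", senses={
    "symbolism": {"Republican", "Ganesh"},  # illustrative assignments
    "circus": {"Dumbo", "mahout"},
    "anatomy": {"trunk", "tusk"},
})
run = Entry("run", family={"runs", "running", "runner", "runway"})
print(sorted(elephant.pool("symbolism")))
print(family_clashes(["run", "runner", "sprint"], {"run": run}))
```

Drawing from a single sense pool lets a game build a "symbolism" group without pulling in circus or anatomy words for the same headword.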
400K definitions as ~55-word readable paragraphs covering all senses. Short clues for gameplay—1 to 5 words, multiple angles per term.
1.5 million usage examples from literature and journalism. Metadata that makes each word usable in a game, not just present in a list.
All of this exists so that every level rewards the player’s vocabulary and lateral thinking—so that a native English speaker at level 500 still encounters categories that feel fair, interesting, and worth solving.
Approximately 100 related words per entry on average—a core set of top associations, plus sense-level pools organized by meaning:
Notice what’s here that an LLM rarely surfaces: Republican, Ganesh, Dumbo, mahout, proboscidean. Nine distinct senses of “elephant”—anatomy, behavior, circus, heraldry, ivory, megafauna, size, symbolism, wildlife. LLMs typically cluster on 2–3.
Beyond core associations, the data provides pools organized by meaning—so your game can draw from specific facets:
The 400K-term vocabulary is larger than the Scrabble dictionary, Merriam-Webster Collegiate, or Collins because it includes familiar multi-word expressions that other dictionaries omit. It is roughly the size of Webster's Third Unabridged, minus the obscure and technical words that aren't fun.