What LLMs Miss

LLMs sample in proportion to how often patterns appear in their training data, so they’ll give you “key → unlock” a thousand times before “key → reef.”

We asked a frontier LLM for word associations and compared the results to Linguabase. The pattern is consistent: LLMs give you the statistically dominant meanings and miss the rest.

Word | Linguabase only | Both | LLM only
key | reef, atoll, cay, clef, transpose, tumbler, cryptography, fob, keystone | unlock, cipher, door, password, piano, code, solution | lock, keyboard, essential, metal, chain, scale, crucial
bridge | bidding, trump, slam, contract, luthier, nose, Wheatstone, cantilever, viaduct | crossing, span, arch, suspension, river, dental, overpass | connect, gap, cable, pier, structure, link
window | browser, tab, popup, opportunity, mullion, oculus, clerestory, fenestration | sill, pane, casement, screen, curtain, frame, view, glass | shade, ledge, transparent, light, ventilation, drapes
nephew | nepotism, avuncular, nibling, godson, godfather, doting, prodigal, namesake | niece, uncle, aunt, cousin, brother, sister, relative, kin | relation, generation, son, kinship, bond
penguin | Linux, Happy Feet, Morgan Freeman, rookery, porpoising, countershading, crèche | cold, flightless, krill, waddle, emperor, tuxedo, Antarctic | ice, swim, arctic, ocean, pebble
tornado | Joplin, Moore, Wizard of Oz, storm chaser, mesocyclone, mobile home, alley | twister, funnel, supercell, cyclone, vortex, storm, Dorothy | rotating, destruction, severe, Midwest, siren
giraffe | ossicones, blood pressure, Geoffrey, okapi, reticulated, Serengeti, ruminant | acacia, spots, neck, tall, tongue, savanna, Africa, calf | legs, pattern, horns, graceful

LLM tested: Claude Opus 4.5, January 2026. Prompt: “list words related to [word], comma delimited.”
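The three columns amount to simple set operations on two word lists. Here is a minimal Python sketch of one way to reproduce such a comparison, assuming the LLM reply is a single comma-delimited string, as the prompt requests. `parse_reply` and `compare` are hypothetical helpers, and the data is reconstructed from the key row of the table (Linguabase = "Linguabase only" plus "Both"; LLM reply = "Both" plus "LLM only"), so it re-derives that row rather than querying anything.

```python
def parse_reply(reply: str) -> set[str]:
    # Split a comma-delimited reply into lowercase, non-empty words
    return {w.strip().lower() for w in reply.split(",") if w.strip()}


def compare(linguabase: set[str], llm: set[str]) -> tuple[set[str], set[str], set[str]]:
    # Returns (Linguabase only, Both, LLM only), ignoring case
    lb = {w.lower() for w in linguabase}
    lm = {w.lower() for w in llm}
    return lb - lm, lb & lm, lm - lb


# The key row: Linguabase = "Linguabase only" + "Both" cells,
# LLM reply = "Both" + "LLM only" cells
linguabase_key = {
    "reef", "atoll", "cay", "clef", "transpose", "tumbler", "cryptography",
    "fob", "keystone", "unlock", "cipher", "door", "password", "piano",
    "code", "solution",
}
llm_reply = ("unlock, cipher, door, password, piano, code, solution, "
             "lock, keyboard, essential, metal, chain, scale, crucial")

only_lb, both, only_llm = compare(linguabase_key, parse_reply(llm_reply))
print(sorted(only_llm))
# ['chain', 'crucial', 'essential', 'keyboard', 'lock', 'metal', 'scale']
```

Lowercasing both sides matters for rows like penguin, where proper nouns such as Linux could otherwise fail to match.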

Which do you prefer: the leftmost column (unique to Linguabase) or the rightmost (unique to the LLM)?

As game designers and wordplay lovers, we find even the best LLM output flat. And over time it starts to feel repetitious: an insidious sameness that makes level 49 feel eerily like levels 36 and 85.

What’s Missing from the LLM Responses?

Good word games enchant and retain players when they include the kind of richness found in the Linguabase column: non-obvious facets, technical depth, cultural touchstones, etymological connections, sensory and experiential associations. And they feel better when there’s less generic filler.

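The Linguabase column (left) is more interesting because it spans more of these dimensions. For key alone, it reaches geography (reef, atoll, cay), music (clef, transpose), lock mechanics (tumbler), and architecture (keystone).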

Why This Happens

LLMs are trained on text where some meanings appear far more often than others. In the training data, “key” in security contexts very likely outnumbers “key” meaning a low island, so the model gravitates to what’s statistically dominant: unlock, lock, door, password, access.

This isn’t a prompting problem. You can ask for “diverse” or “unusual” associations and the LLM will try — but it’s still sampling from the same skewed distribution.
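To make the skew concrete, here is a toy Python model. The probabilities are invented for illustration, not measured from any model, and a request for “unusual” associations is loosely modeled as a higher sampling temperature (an assumption: prompting and temperature are different mechanisms). `with_temperature` and `chance_seen` are hypothetical helpers.

```python
import math

# Hypothetical association probabilities for "key" (illustrative, not measured)
KEY_PROBS = {
    "unlock": 0.30, "lock": 0.25, "door": 0.15, "password": 0.12,
    "piano": 0.08, "cipher": 0.06, "clef": 0.03, "reef": 0.01,
}


def with_temperature(probs: dict[str, float], t: float) -> dict[str, float]:
    # Softmax over log-probabilities divided by t: t > 1 flattens the
    # distribution, but the ranking of words never changes
    logits = {w: math.log(p) / t for w, p in probs.items()}
    top = max(logits.values())
    weights = {w: math.exp(l - top) for w, l in logits.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def chance_seen(p: float, n: int) -> float:
    # Probability a word shows up at least once in n independent samples
    return 1 - (1 - p) ** n


for t in (1.0, 2.0):
    p_reef = with_temperature(KEY_PROBS, t)["reef"]
    print(f"t={t}: P(reef)={p_reef:.3f}, seen in 20 draws: {chance_seen(p_reef, 20):.2f}")
```

Raising `t` gives reef a larger share, but dividing log-probabilities by a positive constant preserves their order, so reef still ranks last. The request flattens the same skewed distribution rather than replacing it.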

Validators, Not Generators

You can use an LLM to check whether “key → reef” is a valid association. It’ll say yes. But it won’t generate that association reliably, because reef isn’t in the high-probability zone when the prompt is “what’s related to key?”

That’s the core insight: LLMs are good validators, bad generators—at scale, for this task.
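A toy Python model of the asymmetry, under the illustrative assumption that a model's knowledge of “key” can be summarized as one probability per word. `generate` and `validate` are hypothetical helpers: generation keeps only the top-k words, while validation checks a single proposed word against a low floor. Real LLM validation is a yes/no judgment, not a threshold on one number, so treat this as an analogy.

```python
# Hypothetical association probabilities for "key" (illustrative, not measured)
KEY_PROBS = {
    "unlock": 0.30, "lock": 0.25, "door": 0.15, "password": 0.12,
    "piano": 0.08, "cipher": 0.06, "clef": 0.03, "reef": 0.01,
}


def generate(probs: dict[str, float], k: int) -> list[str]:
    # Generation surfaces only the k most probable associations
    return sorted(probs, key=probs.get, reverse=True)[:k]


def validate(probs: dict[str, float], word: str, floor: float = 0.005) -> bool:
    # Validation asks about one proposed word: is it plausible at all?
    return probs.get(word, 0.0) >= floor


print(generate(KEY_PROBS, 5))        # ['unlock', 'lock', 'door', 'password', 'piano']
print(validate(KEY_PROBS, "reef"))   # True
```

The knowledge that reef belongs with key is present in both cases; only the question changes whether it surfaces.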

What About Definitions?

Definitions are the layer where LLMs come closest to being a substitute. You could brute-force 400K API calls to generate definitions in your style. But there are catches:

For associations and vocabulary rankings, there’s no LLM shortcut. For definitions, there’s a hard way that might work—but you’d still need the word list and difficulty rankings to start from.
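A minimal Python sketch of that brute-force route, with hypothetical names throughout: `define` stands in for whatever API call you would make (no particular provider or SDK is assumed), and the rate of 10 calls per second is an illustrative assumption, not a measured limit. Note that the loop's first input is the word list itself, which the LLM does not supply.

```python
from collections.abc import Callable, Iterable


def brute_force_definitions(
    words: Iterable[str],
    define: Callable[[str], str],
    done: dict[str, str],
) -> dict[str, str]:
    # One call per word not already defined; `done` doubles as a checkpoint
    # so an interrupted run can resume without repeating calls
    for word in words:
        if word not in done:
            done[word] = define(word)
    return done


def estimated_hours(n_calls: int, calls_per_second: float) -> float:
    # Wall-clock time for n_calls at a sustained request rate
    return n_calls / calls_per_second / 3600


def fake_define(word: str) -> str:
    # Hypothetical stand-in for a real API call prompting for a definition
    return f"(definition of {word})"


defs = brute_force_definitions(["key", "reef", "key"], fake_define, {})
print(len(defs))                                # 2: the duplicate is not re-requested
print(round(estimated_hours(400_000, 10), 1))   # 11.1 hours at an assumed 10 calls/s
```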

How We Built It

We spent a decade building the graph the hard way:

Built from 1.5 million words and 100 million connections. Shipped as a curated 400K-word graph with ~40M connections—every plausible word a player would use, without noise.
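Those figures work out to about 100 connections per word in the shipped graph (40M ÷ 400K), if each connection is counted under a single headword. As an illustration only, here is one way a large association graph can be cut down to a curated vocabulary using an adjacency list: keep a word's connections only when both ends are in the kept set. The `prune` helper and the toy data are hypothetical, and this is not a description of how Linguabase was actually curated.

```python
Graph = dict[str, set[str]]


def add_edge(graph: Graph, a: str, b: str) -> None:
    # Undirected association: record it on both words
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def prune(graph: Graph, keep: set[str]) -> Graph:
    # Keep only kept words, and only connections between two kept words
    return {w: nbrs & keep for w, nbrs in graph.items() if w in keep}


def edge_count(graph: Graph) -> int:
    # Each undirected edge is stored twice, once per endpoint
    return sum(len(nbrs) for nbrs in graph.values()) // 2


full: Graph = {}
for a, b in [("key", "reef"), ("key", "clef"), ("reef", "atoll"), ("key", "qwxz")]:
    add_edge(full, a, b)

curated = prune(full, {"key", "reef", "clef", "atoll"})
print(edge_count(full), edge_count(curated))  # 4 3: the noise word "qwxz" is gone
```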