The complete vocabulary stack for word game publishers.
400K words · 400K definitions · ~40M connections
The Data Behind Word Games
Whether your game rewards spelling, solving, or associative leaps, it runs on lexical data. Word lists that won’t embarrass you. Definitions players actually understand. Semantic relationships that power hints, clues, and connections without giving the answer away morphologically.
Linguabase gives you the full vocabulary stack—structured, consistent, and built for game logic. One integration. Every word your players will ever need:
Vocabulary — 400K words with difficulty scores. Customizable in several ways: tuned to different difficulty levels, filtered for vulgar and offensive words, or stripped of proper nouns.
Definitions — Readable 2–3 sentence blurbs (~50–60 words) that cover all the facets of meaning. Not overly technical dictionary-style writing or fragments.
Related words — For BOAT, you get associations like “sail,” “vessel,” and “oar”: hint words that don’t share a root or morphology with the answer.
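A game could enforce the "no shared morphology" rule for hints with a small filter over the related-words list. The sketch below is illustrative only: `shares_stem`, `safe_hints`, the `family` argument, and the shared-prefix heuristic are assumptions made for this example, not part of the Linguabase API or a description of how its data is actually filtered.

```python
from collections.abc import Iterable


def shares_stem(a: str, b: str, min_prefix: int = 4) -> bool:
    """Crude morphology check: one word contains the other, or both
    start with the same `min_prefix` letters."""
    a, b = a.lower(), b.lower()
    if a in b or b in a:
        return True
    return (
        len(a) >= min_prefix
        and len(b) >= min_prefix
        and a[:min_prefix] == b[:min_prefix]
    )


def safe_hints(answer: str, related: Iterable[str], family: Iterable[str] = ()) -> list[str]:
    """Keep related words that do not give the answer away by spelling."""
    blocked = {w.lower() for w in family}  # e.g. entries from a word-family list
    return [
        w for w in related
        if w.lower() not in blocked and not shares_stem(answer, w)
    ]


hints = safe_hints(
    "boat",
    ["sail", "boating", "vessel", "sailboat", "oar"],
    family={"boats", "boating"},
)
print(hints)  # ['sail', 'vessel', 'oar']
```

Here "boating" is dropped because it is in the supplied family, and "sailboat" is dropped because it contains the answer. A production filter would likely rely on curated word-family data rather than a prefix heuristic.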
Six Data Layers
Linguabase gives you six data layers. License what you need:
Vocabulary
400K words with difficulty scores—from everyday vocabulary to crossword-worthy rarities.
Definitions
400K readable paragraphs in flowing sentences. Useful as in-game clues or help text.
Content Filters
Two word lists you control: a hard-block list of purely offensive words and a soft-block list of words carrying unwanted innuendo.
Word Associations
~40 related words per entry, weighted by relationship strength. Each word decomposed into facets of meaning with related words for each.
Word Families
Morphological groupings: run → runs, running, ran, runner, runway, outrun.
Usage Examples
1.46M quotations written by humans with intent. Common words from famous literature; uncommon words from Wikipedia and open-access sources.
Delivered as files you can embed (TSV, SQLite, JSON) or query via API.
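As a rough idea of what embedding the files might look like, the sketch below reads a vocabulary TSV and applies a difficulty cap plus the two block lists. The column names (`word`, `difficulty`), the score scale, the sample rows, and the placeholder block-list entries are all hypothetical; the actual Linguabase file layout may differ.

```python
import csv
import io

# Hypothetical TSV layout: real column names and score scale may differ.
SAMPLE_TSV = (
    "word\tdifficulty\n"
    "cat\t1\n"
    "quixotic\t8\n"
    "damn\t3\n"
    "moist\t2\n"
    "zephyr\t7\n"
)

HARD_BLOCK = {"damn"}   # placeholder for the hard-block list
SOFT_BLOCK = {"moist"}  # placeholder for the soft-block list


def load_playable_words(tsv_text, max_difficulty, allow_soft_blocked=False):
    """Return words at or below `max_difficulty` that pass the block lists."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    words = []
    for row in reader:
        word = row["word"]
        if word in HARD_BLOCK:
            continue  # never playable
        if word in SOFT_BLOCK and not allow_soft_blocked:
            continue  # excluded unless the publisher opts in
        if int(row["difficulty"]) <= max_difficulty:
            words.append(word)
    return words


easy = load_playable_words(SAMPLE_TSV, max_difficulty=5)
everything = load_playable_words(SAMPLE_TSV, max_difficulty=10, allow_soft_blocked=True)
print(easy)        # ['cat']
print(everything)  # ['cat', 'quixotic', 'moist', 'zephyr']
```

Keeping the hard-block check unconditional and the soft-block check behind a flag mirrors the idea that one list is always excluded while the other is a publisher's choice.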
Why Not Just Use an LLM?
You could prompt an LLM for word associations or definitions. Superficially, it looks fine. But problems lurk under the surface:
Capitalization confusion — LLMs conflate “china” (porcelain) with “China” (country), “march” (walk) with “March” (month). This contaminates thousands of words in both associations and definitions.
Sense imbalance — “Elephant” has 9 distinct senses in Linguabase (anatomy, behavior, circus, heraldry, ivory, megafauna, size, symbolism, wildlife). LLMs typically cluster on just 2–3, missing associations like Republican, Ganesh, Dumbo, mahout, and proboscidean.
We use LLMs for validation, not generation: they’re better at confirming relationships than discovering them.
Semantic Games: An Open Design Space
Word games have been stuck on spelling for a hundred years. Scrabble, Wordle, Wordscapes—great games, but the mechanic is always “arrange letters into words.” Meaning is irrelevant.
We think there’s a wide-open design space for games that navigate meaning instead of letters. Taboo is a popular party game. The New York Times has a hit categorization puzzler. We built In Other Words, a pathfinding game where players navigate drifting word clouds to build bridges between concepts.
We analyzed the Linguabase graph: 76% of English word pairs connect in 7 hops or fewer. Average path length is 6.43 steps. Meaning-space is more navigable than people realize. “Sugar” to “peace” feels impossible, but the path exists: sugar → sweet → pleasant → calm → peace.
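Hop counts like these are what a breadth-first search over an association graph produces. The sketch below is a minimal illustration on a toy graph: the `GRAPH` contents and the `shortest_path` helper are invented for this example and are not Linguabase data or its analysis code.

```python
from collections import deque

# Toy association graph for illustration only; not actual Linguabase data.
GRAPH = {
    "sugar": ["sweet", "candy"],
    "sweet": ["pleasant", "dessert"],
    "candy": ["dessert"],
    "dessert": [],
    "pleasant": ["calm"],
    "calm": ["peace"],
    "peace": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return a fewest-hops path, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, []):
            if neighbor in parents:
                continue
            parents[neighbor] = word
            if neighbor == goal:
                # Walk parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


path = shortest_path(GRAPH, "sugar", "peace")
print(" -> ".join(path), f"({len(path) - 1} hops)")
# sugar -> sweet -> pleasant -> calm -> peace (4 hops)
```

Because breadth-first search explores the graph one hop at a time, the first path that reaches the goal is guaranteed to use the fewest hops, which is the quantity the statistics above describe.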
Whether you’re building the next Wordle or exploring semantic mechanics nobody’s tried yet, Linguabase provides the data layer.