Custom Puzzle Data

You define your game mechanics. We generate data that works for those mechanics.

You: Define Your Game (mechanics, rules, content needs)
We: Build Your Bundle (compact, documented, ready to ship)
Ship: All Your Levels (hundreds or thousands)

Every word game is a data problem wearing a different costume.

Scrabble needs to answer one question: “Is this a valid word?” A flat list of terms suffices. Wordle clones need a shorter list of familiar five-letter words—and someone has to decide whether SLAVE or ABORT might trend on Twitter for the wrong reasons. NYT Connections needs four categories with carefully calibrated decoys—words that look like they belong together but don’t. Semantic pathfinding games need pre-computed routes through meaning-space, because you can’t brute-force hundreds of millions of permutations in real time.

Below are four examples of puzzle formats we’ve built from our licensable data—but we design data for YOUR game mechanics, not just these. Tell us how your game works, and we’ll generate the data structures it needs.

Anagram Puzzles

For letter-tile games, anagram challenges, and Wordle-style spelling games. We provide a set of letter tiles, target words with optional semantic clues, and a complete list of valid anagrams and other letter combinations—everything a player might legitimately form from the available letters. Build your spelling game with puzzle data from Linguabase.

Puzzle Generation

LETTER SET
7 letters (configurable)
VOCABULARY SCAN
for each word: does it use only these letters?
SPELLING RULES
can a letter repeat consecutively? DEEP needs 2 E’s to alternate; LEVEL needs 2 L’s
TARGET SELECTION
from valid pool: length variety, edit distance ≥3, no shared stems, no containment
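The generation steps above can be sketched in a few lines of Python. This is a minimal illustration, not Linguabase's actual pipeline: the function names, the toy vocabulary, and the reading of the spelling rule (a doubled letter is allowed only when the tile set holds two copies of it) are all assumptions. The "no shared stems" check is left out because it needs a real stemmer, and "length variety" is approximated by taking the longest words first.

```python
def uses_only(word: str, tiles: str) -> bool:
    """Vocabulary scan: every letter of the word is one of the tile letters."""
    return set(word) <= set(tiles)


def doubles_ok(word: str, tiles: str) -> bool:
    """Assumed spelling rule: a doubled letter (EE) needs two copies of that tile."""
    return all(a != b or tiles.count(a) >= 2 for a, b in zip(word, word[1:]))


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def pick_targets(pool, count, min_distance=3):
    """Greedy target selection: longest first, skipping near-duplicates and containment."""
    chosen = []
    for word in sorted(pool, key=lambda w: (-len(w), w)):
        if len(chosen) == count:
            break
        if all(
            edit_distance(word, c) >= min_distance and word not in c and c not in word
            for c in chosen
        ):
            chosen.append(word)
    return chosen


TILES = "LISTENS"
# Toy vocabulary for illustration only.
VOCAB = ["LISTENS", "LISTEN", "SILENT", "TITLES", "NESTS", "ISLES",
         "LINEN", "LESS", "STEEL", "DEEP", "CASTLE"]

valid = [w for w in VOCAB if uses_only(w, TILES) and doubles_ok(w, TILES)]
targets = pick_targets(valid, count=3)
print(valid)
print(targets)
```

Under these assumptions LESS survives (two S tiles) while STEEL does not (one E tile), and LISTEN is skipped as a target because it sits inside LISTENS.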

Clue Generation

TARGET WORD
e.g. CASTLE
SEMANTIC LOOKUP
pull related words from 40M-connection graph
MORPHOLOGICAL FILTER
reject same-stem relatives (CASTLES, CASTLED)
ETYMOLOGICAL FILTER
reject same-root relatives (CHÂTEAU shares a Latin root with CASTLE)
SENSE DISTRIBUTION
spread across meanings if polysemous
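A rough sketch of the clue filters above, assuming the semantic lookup has already returned related words tagged by sense. The related-word list, the sense labels, the etymology table, and the prefix-based stem check are hypothetical stand-ins; a production pipeline would use a proper stemmer and real etymological data.

```python
from itertools import zip_longest

# Hypothetical lookup result: (related word, sense of the target it belongs to).
RELATED = [
    ("castles", "building"), ("fortress", "building"), ("chateau", "building"),
    ("moat", "building"), ("rook", "chess"), ("castled", "chess"), ("king", "chess"),
]
# Hypothetical etymology table: words that share a root with the target.
SAME_ROOT = {"castle": {"chateau"}}


def same_stem(word: str, target: str) -> bool:
    """Crude stand-in for a stemmer: prefix match after dropping a final E."""
    a, b = word.rstrip("e"), target.rstrip("e")
    return a.startswith(b) or b.startswith(a)


def pick_clues(target, related, same_root, count=3):
    """Apply both filters, then spread the surviving clues across senses."""
    by_sense = {}
    for word, sense in related:
        if same_stem(word, target) or word in same_root.get(target, set()):
            continue  # morphological or etymological relative: rejected
        by_sense.setdefault(sense, []).append(word)
    # Round-robin over senses so no single meaning supplies every clue.
    interleaved = [w for group in zip_longest(*by_sense.values()) for w in group if w]
    return interleaved[:count]


print(pick_clues("castle", RELATED, SAME_ROOT))
```

The round-robin step is one simple way to read "spread across meanings"; weighting senses by frequency would be an equally reasonable choice.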
Sample Data (Hard)
Level 847 of 2,000
Letter Tiles
L I S T E N S
Target Words
LISTENS hear, audio, ears
TITLES book, movie, name
STINTS period, job, limit
LINEN fabric, sheets, cloth
NESTS home, build, tree
ISLES islands, tropical, water
Also Valid
ELITES ENLIST ENSILE INLETS ISLETS LISTEN SILENT TINSEL STIES TILES LINES LIENS INLET SITES

Pathfinding Puzzles

For semantic navigation games where players traverse meaning-space—connecting concepts through chains of associations. We provide validated origin-target pairs solvable in exactly 3 moves (4 hops through meaning-space), with multiple solution paths, precalculated hints, and backward “convergence” clues that tell players when they’re getting close. Build your exploration game with puzzle data from Linguabase.

Puzzle Generation

ORIGIN POOL
819 curated origin words
BRUTE-FORCE ENUMERATION
enumerate ALL 4-hop paths to every reachable candidate target
PATH COUNT FILTER
keep 16–43 paths (3–4 advancing choices per hop)
CHOKE POINT REJECTION
reject if a single first-hop word accounts for >60% of routes
SHORTCUT FILTER
reject if 2-hop or 3-hop paths exist (too close semantically)
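The filters above can be illustrated on a toy graph. This is a sketch under stated assumptions: the adjacency-dict representation, the function names, and the tiny graph (built from the sample paths shown further down) are all hypothetical, and the real graph is far larger. The thresholds are the ones quoted above and are exposed as parameters.

```python
from collections import Counter


def paths_of_length(graph, origin, target, hops):
    """All loop-free paths from origin to target that use exactly `hops` edges."""
    found = []

    def walk(path):
        if len(path) - 1 == hops:
            if path[-1] == target:
                found.append(path)
            return
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:
                walk(path + [nxt])

    walk([origin])
    return found


def accept_pair(graph, origin, target, min_paths=16, max_paths=43, max_share=0.60):
    """Apply the shortcut, path-count, and choke-point filters to one pair."""
    # Shortcut filter: any 2-hop or 3-hop route means the pair is too close.
    if any(paths_of_length(graph, origin, target, h) for h in (2, 3)):
        return False
    paths = paths_of_length(graph, origin, target, 4)
    # Path count filter.
    if not paths or not (min_paths <= len(paths) <= max_paths):
        return False
    # Choke point rejection: no first-hop word may carry too many routes.
    first_hops = Counter(p[1] for p in paths)
    return max(first_hops.values()) / len(paths) <= max_share


# Toy graph for illustration only.
GRAPH = {
    "sugar": ["sweet", "cane", "dissolve"],
    "sweet": ["pleasant"], "pleasant": ["harmony"], "harmony": ["peace"],
    "cane": ["plant"], "plant": ["olive"], "olive": ["peace"],
    "dissolve": ["solution"], "solution": ["resolution"], "resolution": ["peace"],
}

print(len(paths_of_length(GRAPH, "sugar", "peace", 4)))
print(accept_pair(GRAPH, "sugar", "peace", min_paths=3))  # relaxed for the toy graph
```

With only three routes, the toy pair fails the default 16-path minimum, which is why the example call lowers `min_paths`.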

Hint Generation

VERIFIED PATHS
only paths that actually reach target
FORWARD EXTRACTION
for each hop: collect words from all solutions
FREQUENCY RANKING
hints appearing in more paths ranked higher (more downstream options)
BACKWARD TRAVERSAL
from TARGET back: find all words 1–3 hops away
CONVERGENCE WORDS
“am I close?” clues shared across puzzles with same target
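As a sketch of the hint steps above: forward hints are counted per hop across every verified path, and convergence words come from a breadth-first walk backward from the target. The path list (including the invented second route through "kind") and the function names are assumptions for illustration.

```python
from collections import Counter, deque

# Toy verified paths; the route through "kind" is hypothetical.
PATHS = [
    ["sugar", "sweet", "pleasant", "harmony", "peace"],
    ["sugar", "sweet", "kind", "harmony", "peace"],
    ["sugar", "cane", "plant", "olive", "peace"],
    ["sugar", "dissolve", "solution", "resolution", "peace"],
]


def rank_hints(paths):
    """For each hop, list words ordered by how many verified paths use them."""
    inner = len(paths[0]) - 2  # positions between origin and target
    return [
        [word for word, _ in Counter(p[i] for p in paths).most_common()]
        for i in range(1, inner + 1)
    ]


def convergence_words(graph, target, max_hops=3):
    """Backward traversal: words within max_hops of the target, with distances."""
    reverse = {}
    for src, outs in graph.items():
        for dst in outs:
            reverse.setdefault(dst, []).append(src)
    distance = {target: 0}
    queue = deque([target])
    while queue:
        word = queue.popleft()
        if distance[word] == max_hops:
            continue  # do not expand beyond the hop limit
        for prev in reverse.get(word, ()):
            if prev not in distance:
                distance[prev] = distance[word] + 1
                queue.append(prev)
    del distance[target]
    return distance


# Build a toy graph from the paths themselves.
GRAPH = {}
for path in PATHS:
    for a, b in zip(path, path[1:]):
        GRAPH.setdefault(a, []).append(b)

print(rank_hints(PATHS)[0])
print(convergence_words(GRAPH, "peace"))
```

Because the backward walk depends only on the target, its result can be computed once and shared by every puzzle that ends on the same word.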

Why hints must be precalculated: Finding valid hints requires enumerating all paths—not just one—to guarantee every hint actually reaches the target. With 15 words visible at each hop, that’s 50,000+ candidate paths per puzzle. Ranking hints by how many paths use them (more paths = more downstream options) requires global knowledge. And compressing hint data for mobile delivery (74% size reduction via ID encoding) requires knowing all hints upfront. Runtime computation would add seconds of latency; precalculation makes hints instant and guaranteed correct.
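The 50,000+ figure is consistent with 15 choices at each of 4 hops, and the snippet below checks that arithmetic. It also sketches one plausible reading of "ID encoding": replacing repeated hint strings with integer indexes into a shared word table. The encoding scheme, the function names, and the second puzzle are assumptions; the actual format and its 74% saving are not reproduced here.

```python
WORDS_PER_HOP = 15
HOPS = 4
candidate_paths = WORDS_PER_HOP ** HOPS  # 15 * 15 * 15 * 15 = 50,625


def encode_hints(hints_by_puzzle):
    """Replace hint strings with integer IDs into one shared, sorted word table."""
    vocab = sorted({w for hints in hints_by_puzzle.values() for w in hints})
    ids = {word: i for i, word in enumerate(vocab)}
    encoded = {p: [ids[w] for w in hints] for p, hints in hints_by_puzzle.items()}
    return vocab, encoded


def decode_hints(vocab, encoded):
    """Inverse of encode_hints: look each ID up in the word table."""
    return {p: [vocab[i] for i in ids] for p, ids in encoded.items()}


# Toy data; the second puzzle is hypothetical.
HINTS = {
    "sugar>peace": ["sweet", "cane", "dissolve"],
    "honey>peace": ["sweet", "calm"],
}

vocab, encoded = encode_hints(HINTS)
print(candidate_paths)
print(vocab, encoded)
```

Building the word table requires the full hint set in advance, which is the point the paragraph above makes about knowing all hints upfront.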

Sample Data (Medium)
Level 89 of 779
sugar → peace
Path A
sugar sweet pleasant harmony peace
Path B
sugar cane plant olive peace
Path C
sugar dissolve solution resolution peace
Hints
sweet cane dissolve (every hint verified to reach target—no dead ends)
Converge
harmony calm treaty (1 hop from target)

Themed Categories

For NYT Connections-style games with intentional decoys: words that tempt players into wrong groupings, where ambiguity is the challenge. We also support other category games that need mutually exclusive pools, where no word could plausibly belong to two groups. Build your categorization game with puzzle data from Linguabase.

ANCHOR WORD
word with rich semantic profile
ASSOCIATION PULL
select 8 strongest associations (≤7 letters each)
THEMATIC ISOLATION CHECK
does this anchor’s FULL neighborhood overlap any existing output words? → reject
REPEAT ×8
all 8 groups must have zero semantic bleed between them
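The isolation check above might look like the following greedy loop. It is a sketch only: the neighborhood lists are toy data (partly borrowed from the sample below), the function and parameter names are hypothetical, and the group size is lowered to six to match the toy lists.

```python
def build_groups(anchors, neighborhood, group_size=8, max_letters=7, groups_wanted=8):
    """Greedy grouping: accept an anchor only if its full neighborhood is unused."""
    used = set()
    groups = {}
    for anchor in anchors:
        full = neighborhood[anchor]  # assumed sorted strongest-first
        # Thematic isolation check runs on the FULL neighborhood,
        # not just the words that would be picked.
        if used & set(full):
            continue
        picks = [w for w in full if len(w) <= max_letters][:group_size]
        if len(picks) < group_size:
            continue  # not enough short words to fill a group
        groups[anchor] = picks
        used.update(picks)
        if len(groups) == groups_wanted:
            break
    return groups


# Toy neighborhoods for illustration only.
NEIGHBORHOOD = {
    "climate": ["monsoon", "atmosphere", "carbon", "ozone", "warming", "ecology", "polar"],
    "weather": ["storm", "monsoon", "rain", "cloud", "wind", "frost"],
    "keyboard": ["shift", "return", "escape", "control", "space", "tab"],
}

groups = build_groups(["climate", "weather", "keyboard"], NEIGHBORHOOD,
                      group_size=6, groups_wanted=2)
print(groups)
```

Here "weather" is rejected because "monsoon" is already an output word, and "atmosphere" is filtered out of the picks by the letter limit.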
Sample Data (Hard)
Level 234 of 1,500
Theme 1 Climate
monsoon carbon ozone warming ecology polar
Theme 2 Pressed
coerced ironed cider crushed mashed creased
Theme 3 Keyboard
shift return escape control space tab

Clue-Based Puzzles

For crossword-style games and semantic guessing games—definitions that work as hints, plus related words for crafting clues that don’t give away the answer morphologically. Players identify a target word from semantic clues, but there might be decoys that match some (not all!) of the clues. Build your word-guessing game with puzzle data from Linguabase.

Clue Selection

ANSWER WORD
polysemous (multiple dictionary senses)
SENSE CATALOG
identify distinct meanings (SEAL: animal, wax, military)
ASSOCIATION PULL
for each sense: top associations from graph
MORPHOLOGICAL FILTER
reject same-stem, same-root relatives (SEAL clues can’t include SEALS, SEALING)
4 CLUES
one strong association per sense

Decoy Selection

4 CLUES
given these 4 clues
CANDIDATE SCAN
score each word against all 4 clues
ZERO DETECTION
which clues have ZERO connection? (no graph link)
MATCH PATTERN
record which clues each decoy matches: “0,1” or “2,3” etc.
SEPARATION SCORING
answer_total − best_decoy must exceed threshold

Why decoys work: Each decoy must match 2–3 of the 4 clues—enough to seem plausible—but have at least one “zero” (a clue it cannot possibly connect to). Players eliminate decoys by finding the impossible connection. The answer wins by having no zeros: it connects to all 4 clues through different senses of the word.
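The decoy rules above can be sketched as follows. The association strengths are invented numbers (0.0 meaning no graph link), and the function names and the threshold value are assumptions; only the rules themselves (2 to 3 matches, at least one zero, a separation margin) come from the description.

```python
# Hypothetical association strengths; missing pairs count as 0.0 (no graph link).
STRENGTH = {
    ("army", "platoon"): 0.9, ("army", "reconnaissance"): 0.7,
    ("army", "ambush"): 0.6, ("army", "barracks"): 0.9,
    ("battalion", "platoon"): 0.9, ("battalion", "barracks"): 0.8,
    ("tactics", "reconnaissance"): 0.7, ("tactics", "ambush"): 0.8,
    ("squadron", "platoon"): 0.6, ("squadron", "reconnaissance"): 0.6,
    ("squadron", "barracks"): 0.4,
    ("navy", "barracks"): 0.3,
}


def scores(word, clues, strength):
    """Candidate scan: one score per clue."""
    return [strength.get((word, clue), 0.0) for clue in clues]


def match_pattern(clue_scores):
    """Indexes of the clues this word connects to, e.g. "0,3"."""
    return ",".join(str(i) for i, s in enumerate(clue_scores) if s > 0)


def pick_decoys(answer, candidates, clues, strength, threshold=1.0):
    """Return {decoy: match pattern}, or None if the puzzle fails a rule."""
    answer_scores = scores(answer, clues, strength)
    if 0.0 in answer_scores:
        return None  # the answer must connect to every clue
    decoys = {}
    for word in candidates:
        s = scores(word, clues, strength)
        matched = sum(1 for x in s if x > 0)
        if 2 <= matched <= 3:  # plausible, but with at least one zero
            decoys[word] = (match_pattern(s), sum(s))
    best_decoy = max((total for _, total in decoys.values()), default=0.0)
    # Separation scoring: the answer must beat the best decoy by a margin.
    if sum(answer_scores) - best_decoy <= threshold:
        return None
    return {word: pattern for word, (pattern, _) in decoys.items()}


CLUES = ["platoon", "reconnaissance", "ambush", "barracks"]
result = pick_decoys("army", ["battalion", "tactics", "squadron", "navy"], CLUES, STRENGTH)
print(result)
```

In this toy run "navy" matches only one clue, so it is too weak to serve as a decoy, while the other three candidates each keep at least one zero.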

Sample Data (Hard)
Level 512 of 3,000
Answer
army
Clues
platoon reconnaissance ambush barracks
Decoys
battalion tactics squadron

What You Get

We deliver production-ready puzzle data for all your levels—hundreds or thousands, depending on your game’s needs. The data is generated exclusively for your game, unique to your studio. Every puzzle draws from Linguabase’s complete vocabulary stack: vocabulary rankings, semantic associations, sense clouds, and more.

The four formats above are examples. Your game has its own mechanics—we’ll design a data structure that fits, exclusive to you.

Get in Touch

We can generate test datasets for creative development and playtesting—small batches to validate that your mechanics work. When you’re ready, we produce data for thousands of production levels.

Benefit from a decade of experience with the twists, turns, and edge cases of making word puzzles about meaning.

Email: linguabase@idea.org

Describe your game mechanics—how players interact, what constitutes a “level,” what data you need at runtime—and we’ll design a puzzle data format that plugs directly into your game.