Puzzle Formats - Linguabase

You define your game mechanics. We generate data that works for those mechanics.

You: Define Your Game (mechanics, rules, content needs)

↓

Build Your Bundle (compact, documented, ready to ship)

↓

Ship: All Your Levels (hundreds or thousands)

Every word game is a data problem wearing a different costume.

Scrabble needs to answer one question: “Is this a valid word?” A flat list of terms suffices. Wordle clones need a shorter list of familiar five-letter words—and someone has to decide whether SLAVE or ABORT might trend on social media for the wrong reasons. NYT Connections needs four categories with carefully calibrated decoys—words that look like they belong together but don’t. Semantic pathfinding games need pre-computed routes through meaning-space, because you can’t brute-force hundreds of millions of permutations in real time.

Below are four examples of puzzle formats we’ve built from our licensable data—but we craft puzzle data for YOUR game mechanics, not just these.

Download Sample Puzzles

25 complete puzzles—10 categories, 10 words each, with difficulty scores per word. Raw algorithmic output, no human editing.

↓ TSV (27 KB) ↓ JSON (136 KB)

Anagram Puzzles

For letter-tile games, anagram challenges, and Wordle-style spelling games. We provide a set of letter tiles, target words with optional semantic clues, and a complete list of valid anagrams and other letter combinations—everything a player might legitimately form from the available letters.

LETTER SET: 7 letters (configurable)

→ VOCABULARY SCAN: for each word, does it use only these letters?

→ SPELLING RULES:
• letter repetition? (DEEP, LEVEL)
• smushed compounds? (SUNFLOWER)
• minimum length?

→ TARGET SELECTION:
• length variety
• edit distance ≥3
• no shared stems
• no containment
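
The two purely mechanical rules in TARGET SELECTION (edit distance ≥3 and no containment) can be sketched as a greedy filter. This is a minimal illustration, not the production pipeline: `edit_distance`, `compatible`, and `select_targets` are hypothetical names, the greedy first-come order is an assumption, and the stem and length-variety rules are left out because they need morphological data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def compatible(a: str, b: str, min_distance: int = 3) -> bool:
    """True if two targets are far enough apart to coexist in one level."""
    if a in b or b in a:  # containment: LISTEN inside LISTENS
        return False
    return edit_distance(a, b) >= min_distance


def select_targets(candidates, count, min_distance=3):
    """Greedily keep candidates that are compatible with every earlier pick."""
    chosen = []
    for word in candidates:
        if all(compatible(word, other, min_distance) for other in chosen):
            chosen.append(word)
        if len(chosen) == count:
            break
    return chosen


targets = select_targets(["LISTENS", "LISTEN", "TITLES", "NESTS"], count=3)
```

Here LISTEN is skipped because it is contained in LISTENS, which was picked first. A real selector would probably rank candidates before filtering rather than trusting input order.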

TARGET WORD: e.g. CASTLE

→ SEMANTIC LOOKUP: pull related words from 100M-connection network

→ MORPHOLOGICAL FILTER: reject same-stem relatives (CASTLES, CASTLED)

→ ETYMOLOGICAL FILTER: reject same-root relatives (CHÂTEAU shares Latin root)

→ SENSE DISTRIBUTION: spread across meanings if polysemous
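
The morphological and etymological filters can be sketched as follows. Everything here is an assumption made for illustration: `same_stem` is a crude prefix heuristic standing in for real morphological data, `ROOTS` is a tiny hand-written root table, and `filter_related` is a hypothetical name.

```python
# Hand-written root table for illustration only; a real filter would
# draw on etymological data.
ROOTS = {"castle": "castellum", "château": "castellum"}


def same_stem(a: str, b: str, max_extra: int = 3) -> bool:
    """Crude stand-in for stemming: the shorter word (minus a final 'e')
    is a prefix of the longer one, and they differ by only a few letters."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    base = short[:-1] if short.endswith("e") else short
    return long_.startswith(base) and len(long_) - len(short) <= max_extra


def filter_related(target, candidates, roots):
    """Drop candidates that share a stem or a recorded root with the target."""
    target_root = roots.get(target.lower())
    kept = []
    for word in candidates:
        if same_stem(target, word):
            continue  # morphological relative, e.g. CASTLES
        if target_root is not None and roots.get(word.lower()) == target_root:
            continue  # etymological relative, e.g. CHÂTEAU
        kept.append(word)
    return kept


clues = filter_related(
    "castle", ["castles", "castled", "château", "fortress", "moat"], ROOTS
)
```

The prefix heuristic over-rejects (it would also flag CAST against CASTLE), which is the safer failure mode when a clue must not give away the answer.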

Sample Data (difficulty: Hard)
Level 847 of 2,000

Letter Tiles

L I S T E N S

Target Words

LISTENS hear, audio, ears

TITLES book, movie, name

STINTS period, job, limit

LINEN fabric, sheets, cloth

NESTS home, build, tree

ISLES islands, tropical, water

Also Valid

ELITES ENLIST ENSILE INLETS ISLETS LISTEN SILENT TINSEL STIES TILES LINES LIENS INLET SITES …
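
The vocabulary scan behind a list like this can be sketched in a few lines. In the sample above, TITLES and LINEN each use a letter more often than the tiles supply it (two T's, two N's), so this level appears to allow letter repetition; the sketch exposes that as a switch. The names `can_form` and `scan` and the default minimum length are assumptions.

```python
from collections import Counter


def can_form(word, tiles, allow_repeats=True, min_length=3):
    """Check whether `word` can be spelled from `tiles` under the given rules."""
    word, tiles = word.upper(), tiles.upper()
    if len(word) < min_length:
        return False
    if allow_repeats:
        # Every letter must appear among the tiles; counts are ignored.
        return set(word) <= set(tiles)
    # Counter subtraction keeps only positive counts, so an empty result
    # means no letter is used more often than the tiles provide.
    return not (Counter(word) - Counter(tiles))


def scan(vocabulary, tiles, **rules):
    """Return every vocabulary word that the tiles can form."""
    return [w for w in vocabulary if can_form(w, tiles, **rules)]


valid = scan(["SILENT", "STONE", "LINEN", "IT"], "LISTENS")
```

With `allow_repeats=False` the same call would reject LINEN and TITLES while still accepting SILENT.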


Pathfinding Puzzles

For semantic navigation games where players traverse meaning-space, connecting concepts through chains of associations. We provide validated origin-target pairs solvable in exactly 3 moves (three intermediate words, so 4 hops through meaning-space), with multiple solution paths, precalculated hints, and backward “convergence” clues that tell players when they’re getting close.

ORIGIN POOL: 819 curated origin words

→ BRUTE-FORCE ENUMERATION: enumerate ALL 4-hop paths to every reachable candidate target

→ PATH COUNT FILTER: keep 16–43 paths (3–4 advancing choices per hop)

→ CHOKE POINT REJECTION: reject if one hop1 word dominates >60% of routes

→ SHORTCUT FILTER: reject if 2-hop or 3-hop paths exist (too close semantically)
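
The enumeration and the three filters can be sketched on a toy graph. This is an illustration under assumptions, not the production code: the graph is a plain adjacency dict, only simple paths (no repeated words) are counted, 1-hop neighbors are treated as shortcuts too, and `paths_of_length`, `candidate_targets`, and the toy node names are all hypothetical.

```python
from collections import Counter, defaultdict


def paths_of_length(graph, origin, hops):
    """Yield every simple path with exactly `hops` edges starting at origin."""
    stack = [(origin,)]
    while stack:
        path = stack.pop()
        if len(path) == hops + 1:
            yield path
            continue
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:  # no revisiting a word within one path
                stack.append(path + (nxt,))


def candidate_targets(graph, origin, hops=4, min_paths=16, max_paths=43,
                      max_share=0.6):
    """Group full-length paths by endpoint, then apply the three filters."""
    by_target = defaultdict(list)
    for path in paths_of_length(graph, origin, hops):
        by_target[path[-1]].append(path)

    # Anything reachable in fewer hops is too close to the origin.
    shortcuts = {p[-1] for h in range(1, hops)
                 for p in paths_of_length(graph, origin, h)}

    accepted = {}
    for target, paths in by_target.items():
        if target in shortcuts:
            continue  # shortcut filter
        if not min_paths <= len(paths) <= max_paths:
            continue  # path count filter
        first_hops = Counter(p[1] for p in paths)
        if max(first_hops.values()) / len(paths) > max_share:
            continue  # choke point rejection
        accepted[target] = paths
    return accepted


# Toy graph: 8 four-hop routes reach "t"; "near" is also reachable in 2 hops.
GRAPH = {
    "o": ["a1", "a2"],
    "a1": ["b1", "b2", "near"],
    "a2": ["b1", "b2"],
    "b1": ["c1", "c2"],
    "b2": ["c1", "c2"],
    "c1": ["t", "near"],
    "c2": ["t"],
}
result = candidate_targets(GRAPH, "o", min_paths=4, max_paths=10)
```

The toy call loosens the path-count window because the graph is tiny; the defaults mirror the 16–43 range and 60% share quoted above.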

VERIFIED PATHS: only paths that actually reach target

→ FORWARD EXTRACTION: for each hop, collect words from all solutions

→ FREQUENCY RANKING: hints appearing in more paths ranked higher (more downstream options)

→ BACKWARD TRAVERSAL: from TARGET back, find all words 1–3 hops away

→ CONVERGENCE WORDS: “am I close?” clues shared across puzzles with same target
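
Hint ranking and convergence words can both be derived from a set of verified paths. The sketch below is a simplified illustration: `ranked_hints`, `graph_from_paths`, and `convergence_words` are hypothetical names, ties are broken alphabetically as an arbitrary choice, and the example paths are invented toy data rather than real puzzle output.

```python
from collections import Counter, defaultdict, deque

# Invented toy paths for illustration only.
PATHS = [
    ("sugar", "sweet", "pleasant", "harmony", "peace"),
    ("sugar", "cane", "plant", "olive", "peace"),
    ("sugar", "dissolve", "solution", "resolution", "peace"),
    ("sugar", "sweet", "gentle", "calm", "peace"),
]


def ranked_hints(paths, hop):
    """Words seen at position `hop`, most frequently used first."""
    counts = Counter(path[hop] for path in paths)
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def graph_from_paths(paths):
    """Build a directed adjacency map from consecutive path words."""
    graph = defaultdict(set)
    for path in paths:
        for src, dst in zip(path, path[1:]):
            graph[src].add(dst)
    return graph


def convergence_words(graph, target, max_hops=3):
    """Breadth-first search backward from the target; maps word -> hops away."""
    reverse = defaultdict(set)
    for src, neighbors in graph.items():
        for dst in neighbors:
            reverse[dst].add(src)
    distance = {target: 0}
    queue = deque([target])
    while queue:
        word = queue.popleft()
        if distance[word] == max_hops:
            continue  # do not expand beyond the hop limit
        for prev in reverse[word]:
            if prev not in distance:
                distance[prev] = distance[word] + 1
                queue.append(prev)
    return {w: d for w, d in distance.items() if w != target}


hints = ranked_hints(PATHS, hop=1)
close = convergence_words(graph_from_paths(PATHS), "peace", max_hops=1)
```

Because the backward search depends only on the target, its result can be cached and reused by every puzzle that shares that target.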

Why hints must be precalculated: Finding valid hints requires enumerating all paths, not just one, to guarantee that every hint actually reaches the target. With 15 words visible at each hop, that’s 50,000+ candidate paths per puzzle (15^4 = 50,625 for a 4-hop path). Ranking hints by how many paths use them (more paths = more downstream options) requires global knowledge. And compressing hint data for mobile delivery (74% size reduction via ID encoding) requires knowing all hints upfront. Runtime computation would add seconds of latency; precalculation makes hints instant and guaranteed correct.
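
One plausible reading of "ID encoding" is a shared vocabulary table with hints stored as integer indexes into it. The sketch below shows that idea only; the actual encoding is not described here, so `encode_hints`, `decode_hints`, the puzzle IDs, and the hint lists are all assumptions, and the sketch makes no claim about the size reduction achieved.

```python
def encode_hints(hints_by_puzzle):
    """Replace hint words with indexes into one shared, sorted vocabulary."""
    vocab = sorted({w for hints in hints_by_puzzle.values() for w in hints})
    word_id = {w: i for i, w in enumerate(vocab)}
    encoded = {pid: [word_id[w] for w in hints]
               for pid, hints in hints_by_puzzle.items()}
    return vocab, encoded


def decode_hints(vocab, encoded):
    """Inverse of encode_hints: turn indexes back into words."""
    return {pid: [vocab[i] for i in ids] for pid, ids in encoded.items()}


# Invented puzzle IDs and hint lists, for illustration only.
HINTS = {1: ["sweet", "cane", "dissolve"], 2: ["sweet", "calm"]}
vocab, encoded = encode_hints(HINTS)
```

Building the vocabulary requires seeing every hint first, which is why this kind of compression only works on precalculated data.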

Sample Data (difficulty: Medium)
Level 89 of 779

sugar → peace

Path A

sugar → sweet → pleasant → harmony → peace

Path B

sugar → cane → plant → olive → peace

Path C

sugar → dissolve → solution → resolution → peace

Hints

sweet cane dissolve (every hint verified to reach target—no dead ends)

Converge

harmony calm treaty (1-hop from target)

Themed Categories

For NYT Connections-style games with intentional decoys: words that tempt players into wrong groupings, where ambiguity is the challenge. Also for other category games that need mutually exclusive pools, where no word could plausibly belong to two groups.

ANCHOR WORD: word with rich semantic profile

→ ASSOCIATION PULL: select 8 strongest associations (≤7 letters each)

→ THEMATIC ISOLATION CHECK: does this anchor’s FULL neighborhood overlap any existing output words? If so, reject.

→ REPEAT ×8: all 8 groups must have zero semantic bleed between them
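
The isolation loop can be sketched as a greedy pass over candidate anchors. This is a simplified illustration under assumptions: `neighborhoods` maps each anchor to its full association list ordered strongest first, `build_groups` is a hypothetical name, and the anchors and word lists in the example are invented toy data.

```python
def build_groups(anchors, neighborhoods, group_size=8, group_count=8, max_len=7):
    """Accept anchors whose full neighborhood avoids every word already output."""
    groups = {}
    used = set()
    for anchor in anchors:
        neighborhood = neighborhoods[anchor]
        if used & set(neighborhood):
            continue  # semantic bleed into an earlier group: reject anchor
        # Strongest associations first, short enough to display.
        picks = [w for w in neighborhood if len(w) <= max_len][:group_size]
        if len(picks) < group_size:
            continue  # not enough usable words for a full group
        groups[anchor] = picks
        used.update(picks)
        if len(groups) == group_count:
            break
    return groups


# Invented neighborhoods for illustration only.
NEIGHBORHOODS = {
    "climate": ["monsoon", "carbon", "ozone", "atmosphere"],
    "weather": ["storm", "rain", "monsoon"],
    "keyboard": ["shift", "return", "escape"],
}
groups = build_groups(["climate", "weather", "keyboard"], NEIGHBORHOODS,
                      group_size=3, group_count=2)
```

In this toy run "weather" is rejected because its neighborhood contains "monsoon", which the climate group already uses. The check runs in one direction only; a stricter variant would also compare each accepted anchor's neighborhood against later picks.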

Sample Data (difficulty: Hard)
Level 234 of 1,500

Theme 1 Climate

monsoon carbon ozone warming ecology polar

Theme 2 Pressed

coerced ironed cider crushed mashed creased

Theme 3 Keyboard

shift return escape control space tab


Clue-Based Puzzles

For crossword-style games and semantic guessing games—short clues designed for gameplay, plus related words that don’t give away the answer morphologically. Players identify a target word from semantic clues, but there might be decoys that match some (not all!) of the clues.

ANSWER WORD: polysemous (multiple dictionary senses)

→ SENSE CATALOG: identify distinct meanings (SEAL: animal, wax, military)

→ ASSOCIATION PULL: for each sense, top associations from network

→ MORPHOLOGICAL FILTER: reject same-stem, same-root relatives (SEAL clues can’t include SEALS, SEALING)

→ 4 CLUES: one strong association per sense
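
Clue selection across senses can be sketched as a round-robin over per-sense association lists. The sketch is illustrative only: `shares_stem` is a crude prefix test standing in for a real morphological filter, `pick_clues` is a hypothetical name, and the association lists for SEAL are invented.

```python
def shares_stem(answer: str, word: str) -> bool:
    """Crude stand-in for a stem check: one word is a prefix of the other."""
    a, w = answer.lower(), word.lower()
    return a.startswith(w) or w.startswith(a)


def pick_clues(answer, senses, clue_count=4):
    """Take the strongest surviving association from each sense in turn."""
    queues = [[w for w in words if not shares_stem(answer, w)]
              for words in senses.values()]
    clues = []
    rank = 0
    while len(clues) < clue_count and any(rank < len(q) for q in queues):
        for queue in queues:
            if len(clues) == clue_count:
                break
            if rank < len(queue) and queue[rank] not in clues:
                clues.append(queue[rank])
        rank += 1
    return clues


# Invented associations, ordered strongest first within each sense.
SEAL_SENSES = {
    "animal": ["seals", "walrus", "flipper"],
    "wax": ["sealing", "stamp", "envelope"],
    "military": ["navy", "commando"],
}
clues = pick_clues("seal", SEAL_SENSES)
```

With three senses and four clues, the fourth clue comes from a second pass, so one sense contributes twice. How the real pipeline handles that case is not stated here.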

4 CLUES: given these 4 clues

→ CANDIDATE SCAN: score each word against all 4 clues

→ ZERO DETECTION: which clues have ZERO connection? (no network link)

→ MATCH PATTERN: record which clues each decoy matches: “0,1” or “2,3” etc.

→ SEPARATION SCORING: answer_total − best_decoy must exceed threshold

Why decoys work: Each decoy must match 2–3 of the 4 clues—enough to seem plausible—but have at least one “zero” (a clue it cannot possibly connect to). Players eliminate decoys by finding the impossible connection. The answer wins by having no zeros: it connects to all 4 clues through different senses of the word.
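
The decoy rules translate directly into a small scoring routine. This sketch assumes each word carries one link strength per clue, with 0 meaning no network link; `match_pattern`, `evaluate_puzzle`, the integer scores, and the candidate words are all hypothetical.

```python
def match_pattern(clue_scores):
    """Indexes of the clues a word connects to, e.g. "0,3"."""
    return ",".join(str(i) for i, s in enumerate(clue_scores) if s > 0)


def evaluate_puzzle(answer_scores, candidates, threshold):
    """Return {decoy: match pattern} if the puzzle is valid, else None."""
    if any(s == 0 for s in answer_scores):
        return None  # the answer must connect to every clue

    decoys = {}
    for word, scores in candidates.items():
        matched = sum(1 for s in scores if s > 0)
        # 2 or 3 matches out of 4: plausible, but with at least one zero.
        if 2 <= matched <= 3:
            decoys[word] = match_pattern(scores)

    best_decoy = max((sum(candidates[w]) for w in decoys), default=0)
    if sum(answer_scores) - best_decoy <= threshold:
        return None  # answer does not stand out enough from its best decoy
    return decoys


# Invented scores against four clues, for illustration only.
ANSWER = [9, 8, 7, 6]
CANDIDATES = {
    "otter": [8, 0, 0, 7],
    "dolphin": [6, 0, 5, 7],
    "envelope": [0, 9, 0, 0],
}
decoys = evaluate_puzzle(ANSWER, CANDIDATES, threshold=10)
```

In this toy data "envelope" matches only one clue, so it is too weak to count as a decoy, while the other two are recorded with the clue indexes they match.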

What You Get

We deliver production-ready puzzle data for all your levels—hundreds or thousands, depending on your game’s needs. The data is generated exclusively for your game, unique to your studio. Every puzzle draws from Linguabase’s complete vocabulary stack: vocabulary rankings, semantic associations, sense clouds, and more.

The four formats above are examples. Your game has its own mechanics—we’ll design a data structure that fits, exclusive to you.

Get in Touch

We can generate test datasets for creative development and playtesting—small batches to validate that your mechanics work. When you’re ready, we produce data for thousands of production levels.

We’ve been generating word puzzles about meaning for over a decade. The edge cases are where all the work is.

Describe your game mechanics—how players interact, what constitutes a “level,” what data you need at runtime—and we’ll design a puzzle data format that plugs directly into your game.
