Free word lists (SCOWL, ENABLE, TWL) provide validated spellings—176K to 267K words depending on the list. Open resources like WordNet and Wiktionary add definitions and taxonomies. Commercial publishers like Oxford provide ~600K synonyms. None include difficulty rankings, content filters, or weighted semantic connections.
A 4×4 categorization puzzle takes an expert about two hours. A 6×8 takes half a day. The workload scales roughly quadratically: every additional category must be cross-checked against every existing one for overlapping answers. Wordle’s curated answer list of ~2,300 words started recycling in February 2026.
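A minimal sketch of why the cross-checking blows up: with k categories, every pair of categories must be audited for words that could plausibly belong to both, which is k·(k−1)/2 pairwise checks. (The function name here is illustrative, not from any puzzle tool.)

```python
from itertools import combinations

def cross_checks(num_categories: int) -> int:
    """Count the category pairs an editor must audit for overlapping words."""
    return len(list(combinations(range(num_categories), 2)))

# Growth as the grid widens:
for k in (4, 6, 8):
    print(k, cross_checks(k))
# 4 categories ->  6 pairwise checks
# 6 categories -> 15
# 8 categories -> 28
```

Doubling the category count roughly quadruples the audit work, which is why a 6×8 grid costs so much more than a 4×4.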
Ask an LLM for 50 word-association puzzles and the first five look great. By puzzle 20, it’s the same animals, the same colors, the same “things that are round.” Each prompt is solid on its own—but there’s no memory across the set.
Generating thousands of non-repeating puzzles requires two things simultaneously: deep language data and global state across the full vocabulary—knowing which associations have been used, which senses are underrepresented, which difficulty bands need more content.
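One way to picture that global state is a bookkeeping object threaded through the whole generation run. This is a hypothetical sketch, not any real generator’s API: the field names and the `accept` gate are assumptions chosen to mirror the three requirements above (used associations, sense coverage, difficulty bands).

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class GenerationState:
    """Batch-wide bookkeeping for a puzzle generator (illustrative only)."""
    used_associations: set = field(default_factory=set)   # groupings already published
    sense_counts: Counter = field(default_factory=Counter)  # how often each sense appears
    band_counts: Counter = field(default_factory=Counter)   # puzzles per difficulty band

    def accept(self, words, sense, band):
        """Admit a candidate group only if it hasn't been used; record coverage."""
        key = frozenset(words)
        if key in self.used_associations:
            return False  # reject repeats across the entire batch
        self.used_associations.add(key)
        self.sense_counts[sense] += 1
        self.band_counts[band] += 1
        return True

state = GenerationState()
state.accept(["crimson", "scarlet", "ruby", "cherry"], "shades of red", "easy")   # True
state.accept(["cherry", "ruby", "scarlet", "crimson"], "shades of red", "easy")   # False: same set
```

The point of the sketch: per-prompt generation has none of this memory, so deduplication and coverage balancing have to live in a layer outside the model.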
It’s not an accident that the publicly available alternatives are thin. The legal and economic dynamics of language data create a market where the best work stays invisible. Here’s why the best language data is not on GitHub.