License the Data

Words, links, meaning. License what you need.

400K Words
400K Definitions
~40M Connections

Ship word data that works offline, loads fast, and fits in your app bundle. We built Linguabase for our own games—In Other Words (launched 2025) and OtherWordly (coming soon)—so every layer is optimized for mobile deployment: clean formats, no server round-trips required, and sizes you can actually ship. (Need pre-generated puzzles instead of raw data? See Puzzle Licensing.)

| Layer | What It Contains | Scale | Size |
|---|---|---|---|
| 1 Vocabulary | Validated words with difficulty scores, from everyday vocabulary to crossword-worthy rarities. Filter by obscurity, include or exclude proper nouns. | 400K–1.5M words | 4 MB |
| 2 Definitions | Readable paragraphs of 2–3 flowing sentences that cover all meanings naturally. | Coverage for nearly all vocabulary | 97 MB |
| 3 Content Filters | Hard block (offensive) + soft block (suggestive) word lists for content control. | ~6K terms | 56 KB |
| 4 Word Associations | ~40 related words per entry, weighted by connection strength. Includes sense-level pools organized by meaning. | ~40M connections, scales with vocabulary | 64–240 MB |
| 5 Word Families | Morphological + etymological groupings (elephant → elephants, elephantine). | Coverage for nearly all vocabulary | 22 MB |
| 6 Usage Examples | Illustrative quotations: common words from literature, uncommon words from Wikipedia and open-access sources. | 1.46M quotes | 260 MB |

Sizes shown are what we ship in-bundle for our own apps—compressed and with DRM applied. Many tables use 3-byte word IDs instead of strings, which keeps file sizes small while preserving fast lookups.
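
To illustrate why 3-byte IDs keep tables small while lookups stay fast, here is a minimal sketch. It assumes unsigned little-endian integers in fixed-width records; the actual byte order and file layout are not specified here, and the function names are hypothetical. Three bytes can address about 16.7 million entries, comfortably above a 1.5M-word vocabulary.

```python
ID_BYTES = 3  # 3 bytes hold IDs 0..16,777,215 (2**24 - 1)

def pack_ids(ids):
    """Concatenate word IDs into one compact byte string, 3 bytes each."""
    # Assumption: little-endian byte order; the shipped format may differ.
    return b"".join(i.to_bytes(ID_BYTES, "little") for i in ids)

def unpack_id(buf, index):
    """Read the index-th ID directly, without decoding the rest of the buffer."""
    start = index * ID_BYTES
    return int.from_bytes(buf[start:start + ID_BYTES], "little")

example_ids = [4523, 89234, 312456]  # arbitrary example IDs
packed = pack_ids(example_ids)
```

Because every record has the same width, the n-th ID sits at byte offset 3n, so a reader can seek straight to it instead of scanning strings.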

Ready to Integrate

Embed as files, query via API, or get custom exports. For game studios that want turnkey content, we also offer ready-made puzzle data built on these layers.

Vocabulary Size: Your Choice

There’s no precise count for “words in English”—variations, compounds, and proper nouns blur every boundary. And ranking words by “importance” is surprisingly hard: pure frequency metrics rank “the” highest while missing that “red panda” is widely understood.

The Linguabase word ranking is an effective proxy for what words are known to a broad user base, including English language learners. The highest ranks align with vocabulary standards like the Oxford 3000 and CEFR levels—but enhanced with common derivatives (both “actor” and “actors”). From there it extends through specialized jargon down to noise.

Our recommendation: 400K words. This covers every plausible word a user would want to interact with or see in a game. But we deliver à la carte:

| Vocabulary | Use Case | Sample Words at Threshold |
|---|---|---|
| ~100K | Abridged dictionary scope | conspecific, impeccability, shalwar |
| ~400K (recommended) | Full production coverage | carboxylesterase, soft butch, naval strategies |
| ~600K | Inclusive with more noise | divinas, benefactor’s, rock n’ roll |
| 1.5M | Research/completeness | selenocysteines, straticulate, clobenpropit |

The ranking includes commonly known proper nouns (Paris, Einstein, Toyota). If your application requires a no-proper-noun rule—like classic Scrabble—we can filter accordingly.

Sample Data Format

Vocabulary (words with difficulty scores):

| Word | Rank | Tier |
|---|---|---|
| elephant | 4,523 | common |
| pachyderm | 89,234 | educated |
| Elephas | 312,456 | technical |
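
Choosing a vocabulary size amounts to cutting the ranked list at a threshold. The sketch below assumes a tab-separated file with `word`, `rank`, and `tier` columns; those column names and the `load_vocabulary` helper are hypothetical, since the actual schema varies by delivery format. The rows are the three sample rows shown above.

```python
import csv
import io

# The three sample rows above, written as tab-separated text.
SAMPLE_TSV = (
    "word\trank\ttier\n"
    "elephant\t4,523\tcommon\n"
    "pachyderm\t89,234\teducated\n"
    "Elephas\t312,456\ttechnical\n"
)

def load_vocabulary(text, max_rank):
    """Return the words whose rank is at or below max_rank."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    words = []
    for row in reader:
        rank = int(row["rank"].replace(",", ""))  # tolerate "4,523" style ranks
        if rank <= max_rank:
            words.append(row["word"])
    return words

words = load_vocabulary(SAMPLE_TSV, max_rank=100_000)
print(words)  # ['elephant', 'pachyderm']
```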

Word Associations:

| Headword | Target | Weight |
|---|---|---|
| elephant | trunk | 5.0 |
| elephant | pachyderm | 5.0 |
| elephant | mammoth | 4.2 |
| elephant | savanna | 3.8 |
| elephant | Ganesh | 2.1 |
| elephant | memory | 1.8 |
| elephant | Republican | 0.9 |
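
One way to use weighted associations in a game is to index them by headword and keep only the stronger connections. This is a rough sketch using the sample rows above; `build_index` and `related` are hypothetical helper names, not part of any Linguabase API, and the weight cutoff is an arbitrary choice.

```python
from collections import defaultdict

ROWS = [
    ("elephant", "trunk", 5.0),
    ("elephant", "pachyderm", 5.0),
    ("elephant", "mammoth", 4.2),
    ("elephant", "savanna", 3.8),
    ("elephant", "Ganesh", 2.1),
    ("elephant", "memory", 1.8),
    ("elephant", "Republican", 0.9),
]

def build_index(rows):
    """Group (target, weight) pairs by headword, strongest first."""
    index = defaultdict(list)
    for headword, target, weight in rows:
        index[headword].append((target, weight))
    for neighbors in index.values():
        # Python's sort is stable, so equal weights keep their file order.
        neighbors.sort(key=lambda pair: pair[1], reverse=True)
    return index

def related(index, word, min_weight=0.0, limit=None):
    """Return related words at or above min_weight, up to limit results."""
    hits = [target for target, weight in index.get(word, []) if weight >= min_weight]
    return hits[:limit]

index = build_index(ROWS)
print(related(index, "elephant", min_weight=3.0))
# ['trunk', 'pachyderm', 'mammoth', 'savanna']
```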

Sense Clouds—associations grouped by meaning:

| Word | Sense | Related Terms |
|---|---|---|
| cloud | weather | sky, rain, cumulus, nimbus, overcast, storm… |
| cloud | computing | server, upload, storage, streaming, AWS… |
| cloud | figurative | vague, obscure, nebulous, hazy, uncertain… |
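
Sense-level pools make simple disambiguation possible: given a few context words, pick the sense whose related terms overlap most. The sketch below is one naive approach, not a description of how Linguabase itself assigns senses. The nested-dictionary layout and the `guess_sense` name are assumptions, and the term sets contain only the terms listed above (the real pools are longer).

```python
SENSE_CLOUDS = {
    "cloud": {
        "weather": {"sky", "rain", "cumulus", "nimbus", "overcast", "storm"},
        "computing": {"server", "upload", "storage", "streaming", "AWS"},
        "figurative": {"vague", "obscure", "nebulous", "hazy", "uncertain"},
    }
}

def guess_sense(word, context_words):
    """Return the sense with the largest term overlap, or None if nothing matches."""
    context = set(context_words)
    best_sense, best_overlap = None, 0
    for sense, terms in SENSE_CLOUDS.get(word, {}).items():
        overlap = len(terms & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(guess_sense("cloud", ["upload", "storage", "photo"]))  # computing
```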

Actual schema varies by delivery format. We’ll work with you on integration.

What You Can Build

You can also skip building entirely: we can generate production-ready puzzle data for your game mechanics directly.

Delivery Options

| Format | Best For |
|---|---|
| TSV files | Embedding in apps, offline use, full control |
| SQLite database | Local querying, mobile apps, simple integration |
| JSON export | Web applications, JavaScript environments |
| API access | Server-side integration, real-time queries (optional add-on) |
| Custom format | Whatever your stack needs: we’re flexible |
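
As a rough idea of what local querying against a SQLite delivery could look like, here is a sketch using Python's built-in `sqlite3` module. The table and column names (`words`, `associations`, `source_id`, `target_id`, `weight`) are invented for illustration and are not the shipped schema; the rows reuse a few of the sample associations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a delivered database file
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE associations (
        source_id INTEGER REFERENCES words(id),
        target_id INTEGER REFERENCES words(id),
        weight REAL
    );
    CREATE INDEX idx_assoc_source ON associations (source_id, weight DESC);
""")

# Hypothetical integer IDs; associations store IDs rather than strings.
ids = {"elephant": 1, "trunk": 2, "mammoth": 3, "memory": 4}
edges = [("elephant", "trunk", 5.0), ("elephant", "mammoth", 4.2), ("elephant", "memory", 1.8)]

conn.executemany("INSERT INTO words (id, word) VALUES (?, ?)",
                 [(i, w) for w, i in ids.items()])
conn.executemany("INSERT INTO associations VALUES (?, ?, ?)",
                 [(ids[s], ids[t], w) for s, t, w in edges])

def top_related(conn, word, limit=10):
    """Return (target word, weight) pairs for a headword, strongest first."""
    return conn.execute(
        """
        SELECT t.word, a.weight
        FROM associations AS a
        JOIN words AS s ON s.id = a.source_id
        JOIN words AS t ON t.id = a.target_id
        WHERE s.word = ?
        ORDER BY a.weight DESC
        LIMIT ?
        """,
        (word, limit),
    ).fetchall()

print(top_related(conn, "elephant", limit=2))  # [('trunk', 5.0), ('mammoth', 4.2)]
```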

Typical Questions

Can I see sample data before licensing?

Yes. Contact us and we’ll send evaluation samples for your specific use case.

What about updates?

Linguabase is actively maintained: new vocabulary is added as words enter mainstream usage, connection weights are refined, and ongoing auditing catches edge cases. The license includes the current data, and update arrangements are available.

Do you provide integration support?

Yes. We offer consulting on graph operations (pathfinding, distance calculation) and can help with integration into your stack.
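
For a sense of what pathfinding and distance calculation over the association graph can involve, here is a small sketch using Dijkstra's algorithm. Several things are assumptions made for illustration: edges are treated as undirected, the cost of a step is taken as 1 / weight so that stronger connections count as shorter, and the edges marked "invented" are made-up values that do not come from Linguabase.

```python
import heapq
from collections import defaultdict

EDGES = [
    ("elephant", "trunk", 5.0),    # from the sample associations
    ("elephant", "savanna", 3.8),  # from the sample associations
    ("trunk", "tree", 4.0),        # invented for this example
    ("tree", "forest", 4.5),       # invented for this example
    ("savanna", "forest", 1.0),    # invented for this example
]

def build_graph(edges):
    """Build an undirected graph where step cost = 1 / weight (an assumption)."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        cost = 1.0 / weight
        graph[a][b] = cost
        graph[b][a] = cost
    return graph

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns (total cost, path) or None if unreachable."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, step in graph[node].items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + step, neighbor, path + [neighbor]))
    return None

graph = build_graph(EDGES)
result = shortest_path(graph, "elephant", "forest")
print(result[1])  # ['elephant', 'trunk', 'tree', 'forest']
```

With these toy weights the three-step route through strong connections costs less than the two-step route through the weak savanna–forest link, which is the kind of trade-off a weighted distance is meant to capture.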

What’s NOT included?

Get in Touch

Linguabase is available for licensing to word game studios, AI/LLM companies, educational technology developers, and makers of productivity tools. Game studios looking for turnkey content can also explore custom puzzle generation.

Email: linguabase@idea.org

Tell us about your use case and we’ll send relevant samples and discuss licensing options.

Not Our Target Market

If you need:

We built Linguabase for exploration and games, not reference lookups.

About IDEA.org

Linguabase is developed by IDEA.org, a research organization focused on language, games, and education. We’ve been building language data since 2011 and currently publish word games including “In Other Words” (iOS).