Free sources won’t get you far enough.
Several free language resources exist. They’re fine for academic research or simple lookups. But if you need to traverse meaning-space — for games, AI reasoning, or semantic exploration — free sources fundamentally lack the architecture.
Wiktionary is actually superb — by the late 2010s it surpassed commercial dictionaries in coverage, and we use it ourselves. But it’s fundamentally designed as a multilingual dictionary, not a semantic exploration tool.
| Pros | Cons |
|---|---|
| Superb coverage (~7M raw entries); surpassed commercial dictionaries by the late 2010s | Designed as a multilingual dictionary, not a semantic exploration tool |
| Actively community-maintained; permissive CC license | No relationship structure or weights; no production-ready API |
WordNet began in 1985 at Princeton under George Miller as a psycholinguistic experiment — a test of how the human mental lexicon is organized. Hand-crafted by graduate students and post-docs, it groups words into “synsets” (synonym sets) representing single concepts, connected by hierarchical relations like hypernymy (“dog IS-A animal”). The design was deliberately scoped: open-class words only (nouns, verbs, adjectives, adverbs), excluding function words, proper nouns, and multi-word expressions. Current scale: ~155K words organized into 117K synsets with 206K word-sense pairs.
The core limitation isn’t quality — it’s architecture. WordNet is fundamentally about clustering words around meanings, not about free-association outward from a word. It answers “what other words share this meaning?” not “what does this word make you think of?” It lacks any experiential color — no gestalt associations, no weighted connections, no sense of which relationships are stronger than others.
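To make the synset-and-hypernym structure concrete, here is a minimal Python sketch. The `Synset` class, the helper, and the sample chain are hypothetical stand-ins for illustration, not the real Princeton database or its API.

```python
# Toy illustration of WordNet-style synsets and hypernymy (IS-A) links.
from dataclasses import dataclass, field

@dataclass
class Synset:
    """A set of synonyms expressing one concept, linked to broader concepts."""
    lemmas: list
    hypernyms: list = field(default_factory=list)  # IS-A parents

def hypernym_chain(synset):
    """Walk the IS-A hierarchy upward, collecting each ancestor synset."""
    seen = []
    frontier = list(synset.hypernyms)
    while frontier:
        parent = frontier.pop(0)
        if parent not in seen:
            seen.append(parent)
            frontier.extend(parent.hypernyms)
    return seen

entity = Synset(["entity"])
animal = Synset(["animal", "animate_being"], [entity])
canine = Synset(["canine", "canid"], [animal])
dog    = Synset(["dog", "domestic_dog"], [canine])

# dog IS-A canine IS-A animal IS-A entity
print([s.lemmas[0] for s in hypernym_chain(dog)])
```

Note what's missing: nothing in this structure says *how strongly* "dog" relates to "canine" versus "leash" — which is exactly the architectural gap described above.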
| Pros | Cons |
|---|---|
| Hand-crafted, high-quality synsets with typed relations (hypernymy, meronymy) | Clusters words around shared meanings; no free association outward from a word |
| Stable, well-documented structure (~155K words, 117K synsets) | No relationship weights or experiential color; minimal maintenance; unclear commercial licensing |
ConceptNet emerged from MIT Media Lab’s Open Mind Common Sense project (1999), founded by Push Singh and Marvin Minsky. The insight was that AI systems lacked everyday knowledge humans take for granted — things like “ice is cold” or “people eat when hungry.” The approach was bottom-up crowdsourcing: ordinary people contributed statements in natural language, parsed into structured relations. By ConceptNet 5.x, it had grown to ~21 million edges connecting 8 million nodes across 300+ languages, integrating WordNet, Wiktionary, and other sources.
The core limitation is granularity and focus. ConceptNet’s relations are coarse-grained and commonsense-focused — great for “dogs are animals” but not for the fine-grained semantic distinctions that make word puzzles interesting. Its crowdsourced origins also mean significant noise. Large language models have now largely absorbed this kind of commonsense knowledge, making ConceptNet less central to current research.
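As an illustration of what coarse, weighted assertions look like, here is a toy Python sketch in the spirit of ConceptNet's edge format. The relation names (`IsA`, `HasProperty`, `Desires`, `RelatedTo`) are real ConceptNet relations, but the data and the helper function are invented for this example.

```python
# Toy ConceptNet-style assertions: (start, relation, end, weight).
edges = [
    ("dog", "IsA", "animal", 2.0),
    ("ice", "HasProperty", "cold", 1.5),
    ("person", "Desires", "food", 1.0),
    ("dog", "RelatedTo", "leash", 0.5),
]

def related(concept):
    """All assertions touching a concept, strongest first."""
    hits = [e for e in edges if concept in (e[0], e[2])]
    return sorted(hits, key=lambda e: -e[3])

for start, rel, end, w in related("dog"):
    print(f"{start} --{rel}/{w}--> {end}")
```

The coarseness is visible even in this tiny sample: `RelatedTo` lumps together every kind of association, which is fine for commonsense reasoning but too blunt for fine-grained semantic work.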
| Pros | Cons |
|---|---|
| Huge scale: ~21M edges, 8M nodes, 300+ languages | Coarse-grained, commonsense-only relations; weights exist but are unreliable |
| Supports graph operations; public API; CC license | Significant crowdsourced noise; limited ongoing maintenance |
DBpedia (2007) extracts structured data from Wikipedia infoboxes — it’s derived, not primary. No one contributes to DBpedia directly; it parses whatever Wikipedia editors put in infoboxes. Wikidata (2012) is the opposite: a primary, community-curated knowledge base where humans and bots enter statements directly. Originally created to centralize Wikipedia’s interlanguage links, Wikidata has grown into the world’s largest open knowledge graph with 100+ million items.
Both are entity databases, not lexical resources. They capture factual relationships — born-in, instance-of, part-of — not associative or conceptual proximity. Wikidata will tell you “cat is-a mammal”; it won’t tell you that “cat” evokes “curiosity” and “nine lives.” Different problem entirely.
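The difference is easy to see in code. Below is a hypothetical sketch contrasting Wikidata-style entity statements with the associative lookups a word game or semantic explorer needs; the property names loosely mimic Wikidata's statement model, but the data and structure are invented for illustration.

```python
# Entity-database view: (entity, property) -> value facts.
facts = {
    ("cat", "instance_of"): "mammal",
    ("cat", "taxon_name"): "Felis catus",
}

# Association view: what a word evokes, which entity databases do not hold.
associations = {
    "cat": ["curiosity", "nine lives", "whiskers", "purr"],
}

# An entity database answers factual queries...
print(facts[("cat", "instance_of")])

# ...but has no notion of evocation or conceptual proximity.
print(associations["cat"][:2])
```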
| Pros | Cons |
|---|---|
| World's largest open knowledge graph (100M+ items on Wikidata) | Entity databases, not lexical resources |
| Rich factual relations (instance-of, part-of, born-in) | No associative or conceptual proximity between words |
| Feature | Wiktionary | WordNet | ConceptNet | Linguabase |
|---|---|---|---|---|
| Relationship structure | None | Typed (hypernym, meronym) | Typed (36 relations) | Weighted by strength |
| Relationship weights | No | No | Yes (unreliable) | Yes (curated) |
| Data quality | Variable | High (dated) | Low-medium | High |
| Graph operations | No | Limited | Yes | Yes |
| Commercial use | CC license | Unclear | CC license | Licensed |
| Active maintenance | Yes (community) | Minimal | Limited | Yes |
| Production-ready API | No | No | Yes (limited) | Yes |
| Sense-balanced coverage | No | No | No | Yes |
| Directional weights | No | No | No | Yes |
| False cognate removal | No | No | No | 291K audited |
| Gestalt/experiential | No | No | Partial | Yes |
| Vocabulary scale | 7M (raw) | 155K | ~300K | 1.5M (400K prod) |
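Two of the rows above, relationship weights and directional weights, are worth making concrete. The sketch below is a hypothetical illustration of what directionality means, not Linguabase's actual API or data: the strength from A to B need not equal the strength from B to A.

```python
# Hypothetical directional, weighted associations.
weights = {
    ("kitten", "cat"): 0.9,   # "kitten" almost always evokes "cat"...
    ("cat", "kitten"): 0.4,   # ...but "cat" evokes many things besides "kitten"
    ("cat", "dog"): 0.7,
    ("dog", "cat"): 0.7,
}

def strength(a, b):
    """Directed association strength from a to b (0.0 if unknown)."""
    return weights.get((a, b), 0.0)

print(strength("kitten", "cat"), strength("cat", "kitten"))
```

None of the free sources above model this asymmetry: WordNet and Wiktionary have no weights at all, and ConceptNet's weights are symmetric confidence scores rather than directed association strengths.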
Even if you combine all free sources, you still won’t have:

- Relationship weights that are curated rather than noisy
- Directional weights — how strongly A evokes B is not how strongly B evokes A
- Sense-balanced coverage across a word’s meanings
- Gestalt, experiential associations beyond typed relations
- Systematic false-cognate removal
These aren’t cleanup problems. They’re architectural gaps. Free sources were built for lookup, not exploration.
Beyond capability gaps, free sources also demand substantial work before production use: merging incompatible formats, cleaning noisy data, and building and maintaining your own API layer.
Linguabase: Over a decade of engineering already done. One clean API. Production-tested.