Linguabase vs. Handcrafting

Word association games support a wide range of mechanics, but the most complex puzzles are not feasibly built by hand or by LLM. The workload scales nonlinearly with the number of categories and words per category.

2 × 2

seconds

3 × 3

minutes

NYT Connections

4 × 4

~2 hours

6 × 8

half a day

8 × 15

days

Every cell is a word. Every row is a category. Every category must not overlap with any other.

A trivial 4×4 puzzle with fruit, animals, cities, and sports takes a few minutes. But the actually-interesting 4×4 puzzles—the ones millions of people play daily—take a minimum of two hours each. And per-puzzle cost rises over time: each new puzzle must be checked against every previous puzzle for reused words and reused groupings that dedicated players will resent.

One person, one notebook, 3,600 categories

Wyna Liu holding her Connections notebook

Wyna Liu, the sole editor and constructor of NYT Connections. Photo: The Atlantic.

Every day, a single person constructs the NYT Connections puzzle from scratch—and by her own account, the work never gets easier. Wyna Liu, the sole editor and architect of the game that millions play daily, has described a creative process defined by solitary brainstorming, growing constraints, and the ever-present pressure to find fresh categories in a landscape she has already mined hundreds of times.

Each board demands sixteen words that sort cleanly into exactly four groups of four, with enough overlap between groups to create genuine misdirection—all while avoiding the categories, groupings, and tricks she’s already published. Her boss, NYT Games editorial director Everdeen Mason, has called it the “heaviest editorial lift” at the Times. Each board takes roughly two hours. She builds seven every week.

The process is not mechanical. As she told Newsweek, “each board has to be made from scratch.” She starts from a spiral notebook she carries everywhere. She pulls a seed idea, writes out the words, and starts spinning. A single seed might branch into a dozen possible categories. She’s described the joy of this uncertainty: “I start with like a seed. And then I’m just like, ‘Oh, what does this word do?’ And then maybe like a dozen categories of possibilities and then you kind of see if any of those words are playing off each other, and the brain builds from that.” But the joy coexists with difficulty: “I usually don’t know where a board is going to go. It really does take a long time to fudge things and try different angles.”

Close-up of Wyna Liu’s notebook showing handwritten word clusters for potential Connections categories

A page from Liu’s notebook. Each cluster is a candidate category built from one-word movie titles: NYC neighborhoods (DUMBO, MANHATTAN, CHINATOWN, WALL STREET), sharp objects (BLADE, ROPE, SAW), green characters (SHREK, HULK, RANGO). Photo: NYU Tisch.

She’s articulated what the work demands cognitively: “When you’re solving a puzzle, you’re trying to get into the puzzle-maker’s head and understand what they’re thinking. Creating a puzzle, though, you’re almost trying to live inside the solver’s head and imagine it from their perspective. They’re two sides of the same process; it’s a very process-driven thing.”

Most of her candidate categories never make it into a puzzle: too obscure, too close to a past board, too easy to solve. The notebook keeps growing, but the space of usable categories does not grow at the same rate. Over 900 published boards means roughly 3,600 individual categories, and the constraint space tightens with every one. She is already reusing words and must recontextualize them each time. As she told Slate: “The constraints help move it forward in a way. If I reuse a word or something, I try to contextualize it differently. So it really does push editorially what the parameters are.”

“It’s just me. So, I mainly work from home. So a lot of it is just like the echo chamber of my brain.”

— Wyna Liu

The cognitive toll compounds alongside the constraint space. When Liu started making Connections, she found herself scanning everything around her for potential categories—every menu, every sign, every conversation. Eventually she had to teach herself to stop. She now rations her recreational puzzle-solving, saving it for what she’s called “special occasions.” She told The Atlantic: “I don’t want to burn out on puzzles.”

Puzzle construction is grueling, solitary, creative labor that compounds in difficulty over time. Liu is one of the best in the world at it. She has constructed more category puzzles than almost anyone alive, and the portrait that emerges from her own interviews is of a process that was never easy and gets harder every day—not because she’s losing skill, but because the math works against any single human mind producing non-repeating content indefinitely.

Every word game hits a ceiling

Liu’s problem is acute because she builds categorization puzzles—but no word game escapes the arithmetic. The solution space is always finite, and it’s smaller than it looks.

Josh Wardle built Wordle for his partner during the pandemic. His partner Palak Shah manually reviewed the entire 12,000-word dictionary—narrowing it to roughly 2,300 common answers.

Wordle is built on five-letter words. Once constrained to answers familiar enough that players won’t feel cheated, the pool shrank to roughly 2,300. That gave the game about six years of daily play. Tracy Bennett, the Times’ first dedicated Wordle editor, inherited the countdown: “There are enough five-letter words to last until 2027.” By late 2024, with 600–700 unused answers remaining, she went public about the options: recycling old words, allowing plurals, or allowing past tenses. On February 2, 2026, Wordle began recycling. The first repeated answer was CIGAR—the very first Wordle answer from June 2021.

The print Spelling Bee in the New York Times Magazine

The print Spelling Bee as it appeared in the New York Times Magazine, constructed by Frank Longo from 2015 until Sam Ezersky took over the digital version in 2018.

The Spelling Bee follows the same arc. Frank Longo created the print version for the Times Magazine in 2015. His method: run a program to find valid letter sets, then review the word lists manually. In the Spelling Bee, a pangram—a word using all seven of the puzzle’s letters—is the crown jewel players hunt for. Early puzzles featured long, satisfying pangrams like PRIZEWINNER. After hundreds of puzzles, the best seeds are gone and construction relies on shorter, less satisfying starting points.

Sam Ezersky, who took over the digital version in 2018, has described “dumpster diving” through less obvious pangrams as the good ones get used up. His most dramatic response to the shrinking pool was a self-imposed constraint: excluding the letter S entirely, because plurals make the game trivially easy. That rule held for years—until March 2025, when S finally appeared. The decision wasn’t a change of philosophy; it was a business need. Releasing the constraint opened new territory, buying the game more runway.

The pattern is the same in every case. The best material gets used first. What remains is progressively harder to build around. And a human constructor’s unconscious tendencies—repeated themes, favored vocabulary, structurally similar groupings—compound the problem even when the raw material hasn’t run out.

Every meaning multiplies the work

Words with multiple meanings—including homonyms, where meanings are completely unrelated—multiply the work of puzzle construction. Every additional sense multiplies the cross-checks needed to ensure a word doesn’t bleed into the wrong category.

Polysemy in practice: “bow”

“Bow could be something you tie, a bow could be part of a violin, a bow could be part of a ship, or it could be a bow—a gesture of respect.”

— Wyna Liu, The Atlantic, 2024

Four senses of one word. Each belongs to a different category. Each creates a different trap.

In categorization games, the same problem applies between groups. The BBC’s Only Connect deliberately exploits this: a classic Connecting Wall included SCREWDRIVER and GIMLET—both well-known cocktails and common tools. The deliberate overlap creates a fixation effect: once contestants lock onto a plausible-but-wrong grouping, escaping the mental frame is extremely difficult.

This is what makes the best puzzles satisfying. But accidental overlap breaks them. A category of “soft things” adjacent to animals can’t put “bunny” in play—one player thinks bunnies are soft, another thinks bunnies are animals. Unless the game allows both to be correct, the puzzle is broken, not the player.

For a constructor, polysemy means holding every sense of every word in mind simultaneously, then verifying that no unintended connections exist across categories. At 4×4, this is hard. At 6×8, it’s prohibitive. At 8×15, no single person can hold all the cross-connections in mind.

Fair puzzles are expensive to guarantee

“We must expect the composer to play tricks, but we shall insist that he play fair.”

— Afrit, Armchair Crosswords, 1949

Josh Wardle: “If the first time you play Wordle, the answer is a word you’d never heard of, I think you would feel cheated.”

Tracy Bennett: “We get more complaints about broken streaks than anything else.” She goes further on sensitivity: “I care if a word is derogatory in a tertiary meaning, and if somebody has heard that word hurled at them, then I know I don’t want to use that, even if it affects 2 percent of solvers.” Bennett also struck words with unsettling associations—“slave,” “lynch”—and current-events-sensitive terms like “fetus” after the Dobbs leak. Fairness isn’t just about solvability; it’s about what a player encounters on the way.

Cryptic crosswords have the longest formal tradition of codified fairness. In 1949, the setter Afrit articulated the foundational principle: “you need not mean what you say, but you must say what you mean.” Twenty years later, Ximenes codified a stricter rule: every word in a cryptic clue must serve either the definition or the wordplay. No filler, no misdirection through vagueness—only misdirection through precision.

The common thread across all these formats: guaranteeing fairness means tracing every way a word might connect—not just the intended interpretation, but every plausible alternative a player might see. You need to know a word is derogatory in a tertiary meaning. You need to know “bunny” reads as an animal to one player and as soft to another. You need blocklists with tiers: hard blocks for words you’d never show a player, and soft blocks for words you wouldn’t use as answers but might allow as guesses. This editorial layer has to be applied across the full vocabulary—not just the words a constructor happens to use this week, but every word a player might encounter at any level of the game.

The data has always been the hard part

Alfred Butts at the Scrabble tile factory, 1985

Alfred Mosher Butts invented Criss-Crosswords in 1938 (later renamed Scrabble). Here he is 47 years later, visiting the factory where Scrabble tiles were manufactured.

Scrabble’s original word list was the newspaper.

In the 1930s, unemployed architect Alfred Mosher Butts hand-counted letter frequencies across the front pages of the New York Times, the Herald Tribune, and the Saturday Evening Post. He mapped frequency to tile count (E gets 12 tiles, Q and Z get 1) and inversely to point value. His frequency analysis yielded just four S tiles—a constraint that incidentally limits easy pluralization. That original 100-tile distribution has never been changed across more than 150 million Scrabble sets sold.

The pattern holds across the history of word games. The labor is always in the data: which words to include, how they relate, what players will find fair or unfair, easy or hard. The game mechanics are the visible part. The invisible part—the thousands of hours of curation, filtering, and calibration—is what determines whether players come back.

What Linguabase provides

Linguabase can’t replace Wyna Liu. It doesn’t contain her tight thematic lists about superheroes or New York neighborhoods, and it doesn’t do her letter play. What it provides is the underlying ability to scale—to generate word-relation puzzles that fit a publisher’s mechanics, whether those mechanics demand a single solution or many, whether the game is anchored on free-thinking word association or cleanly bounded categories.

Puzzles are built to your mechanics and game design. That typically means specifying:

How many solutions per puzzle?
How much conceptual overlap—mutually exclusive topics, decoys and deliberate misdirection, or precisely overlapping categories?
How many topics per puzzle, and how many words per topic?

All puzzles are encoded with vocabulary difficulty per word, enabling real-time adjustment to the puzzles you display.

Nearly half of the vocabulary consists of multi-word expressions—phrases like “carbon footprint,” “jazz hands,” “red herring”—which expand the pool of playable terms without requiring obscure vocabulary. The system also works around the repetition that plagues both human brainstorming and LLM generation, delivering puzzles that are consistently varied, level after level.

See pricing →