Tens of millions of possible phrases, a new request on every keystroke, and a hard ceiling: ranked suggestions back in under 100 milliseconds. That makes typeahead a latency problem wearing a search costume. The core artifact is a trie with the top-K completions precomputed at every node — so a lookup is the length of the prefix, not a scan of the dictionary. Ranking and freshness are where the depth lives.
Every FAANG company runs a rubric. The dimensions are roughly the same; the weights differ by company and level. At senior+ the boxes-and-arrows are table stakes — what gets graded hardest is the quality of your decisions: the questions you asked first, the trade-offs you surfaced and defended, and the production reality you volunteered without being asked.
| Dimension | Weight | What earns the signal |
|---|---|---|
| Requirements & scoping | 10–15% | You scoped before drawing, asked enough to bound the problem, pinned the scale number, and stated assumptions out loud. |
| High-level architecture | 20–25% | The right components, a clear data flow, and a reason every box exists. The design satisfies each functional requirement. |
| Technical depth / deep dives | ~30% | You go three questions deep on the hard part without being rescued. This is where staff is won or lost. |
| Trade-offs & judgment | highest effective | Two viable options, what each costs, and a committed pick for this system. Simplicity over flash when flash isn't warranted. |
| Communication / driving | cross-cutting | You drive the 45 minutes; the interviewer never has to rescue you. You narrate, checkpoint, and narrow when the design sprawls. |
| Operational maturity | ↑ in 2026 | The newest weight: observability, rollout, failure modes, on-call reality — volunteered, not pried out. |
A solid design with reasonable trade-offs is a strong score for a mid-level candidate and a downlevel flag for staff. The questions can be identical; the depth expectation is not. As you climb, the balance tips from breadth toward depth, proactivity, and production reality.
You don't recite AWS — you anchor each decision to one of these. It signals you evaluate systems across competing concerns rather than optimizing one axis. Each pillar below is mapped to a move you can make in this exact design.
Measure suggestion quality, not just latency.
Don’t leak private or unsafe terms.
Serve stale before serving nothing.
Precompute so the query path does no ranking.
Cache the head of the distribution.
Rebuild, don’t thrash.
Typeahead is a latency problem wearing a search costume: a new request on every keystroke, ranked suggestions back in under 100 ms. The core artifact is a trie with the top-K completions precomputed at every node, so a lookup is the length of the prefix — not a scan of the dictionary. Build it offline; serve it online.
The simulation. Framing: search autocomplete — a dictionary of tens of millions of phrases, billions of queries/day, a suggestion request on every keystroke, and a hard p99 < 100ms end-to-end including network.
p99 < 100ms end to end — this is the defining constraint.“This is really a latency problem. Under 100ms including the round trip means I can’t do real work at query time — the ranking has to be precomputed and the lookup served from memory.”
“Two scoping questions: do we rank by popularity, and how fresh must trending be? If a viral query must show up in minutes, I need a streaming path on top of the batch-built index.”
“We’ll query the database for terms starting with the prefix.” A LIKE 'rea%' scan can’t deliver ranked top-K under 100ms at this scale — it signals you haven’t indexed for prefix search.
The “entity” here is the index itself. Interface is trivially small:
The data structure is a trie: each node is a character; following a prefix walks down the tree. The key move — store the top-K completions at each node, precomputed from historical frequency. Then a lookup is “walk the prefix, return the node’s cached list” — O(prefix length), no ranking at request time.
Reads dwarf writes by orders of magnitude, and each response is a handful of short strings. So the design optimizes purely for read latency and memory layout; write/update of the index is an offline concern that must never touch the serving path.
“I’ll precompute the top-K completions at every trie node. That trades extra memory and build time for a query that does zero ranking work — exactly the trade the 100ms budget demands.”
Query a database for terms matching the prefix, sort by popularity, return the top 10. It works functionally — and it fails the only requirement that matters: a prefix scan plus a sort, per keystroke, at billions of queries a day, will never hold 100ms. Name that and move on.
Build a trie over the phrase corpus. At each node, store the top-K completions of that prefix, ranked by frequency. A request walks the prefix (a few pointer hops) and returns the node’s cached list — served entirely from memory on the serving fleet.
An offline batch pipeline aggregates query logs into term frequencies and rebuilds the trie periodically, then hot-swaps it onto the serving nodes. The serving path never blocks on a build.
“The heavy trie is built offline from query logs and hot-swapped in. The serving fleet only ever reads an in-memory structure — building and serving are completely separate paths so a rebuild can’t hurt latency.”
Recomputing top-K at query time “for freshness.” Defensible only for a tiny corpus — say so. At this scale it blows the latency budget; precompute and refresh the index instead.
“Elasticsearch will handle it.” Naming a tool without the trie/top-K reasoning dodges the actual question — the interviewer wants the data structure and why it hits 100ms.
Two paths feed suggestions. The batch pipeline rebuilds the authoritative trie from full query logs on a schedule. A streaming pipeline tracks rising query frequencies in near real time — a count-min sketch is a cheap, approximate frequency counter ideal for this — and produces a small trending delta merged with the batch results. Viral terms surface in minutes without rebuilding tens of millions of nodes.
The full trie is large. Shard by leading characters (e.g., first one or two letters route to a shard), replicate each shard read-only for QPS and availability, and rebuild offline then hot-swap. Because it’s read-only at serve time, replication is trivial.
Typos (“recieve” → “receive”) need edit-distance matching, which is expensive. Bound it: only attempt fuzzy expansion when an exact prefix yields too few results, and cap the edit distance. Treat it as a fallback, not the default path, to protect the budget.
“I’d split freshness into a batch-built authoritative trie and a streaming trending delta using a count-min sketch. Most of the corpus is stable; only the rising tail needs to move fast, so I spend the streaming budget only where it pays off.”
“We’ll rank everything live with a fuzzy search on each keystroke.” Live fuzzy ranking per keystroke is the fastest way to blow past 100ms.
A breaking-news query starts trending. When does it show up?
Not on the daily rebuild — too slow. A streaming aggregator watches query frequencies in near real time using a count-min sketch, and the rising terms form a small trending delta. At query time I merge the node’s batch top-K with that delta, so a surging term appears within minutes without touching the full trie.
Doesn’t merging at query time cost you latency?
The delta is tiny — a small per-prefix overlay — so the merge is a quick combine of two short lists, not a re-rank of the corpus. The 100ms budget survives because the expensive ranking still happened offline; I’m only blending in a handful of fresh candidates.
Ship new tries by hot-swap with the old one kept warm for instant rollback. Canary ranking changes and watch click-through before fleet-wide rollout.
“With more time I’d add a personalization layer — blending a user’s own history into the suggestions — and deeper fuzzy matching. I scoped them out deliberately, not by accident.”
Interviewers push on latency and freshness. Commit to the trie, defend the precomputation, and protect the 100ms budget.
It can't deliver ranked top-K under 100ms at this scale — a prefix scan plus a sort per keystroke, billions of times a day, never holds the budget, and a leading-wildcard query can't even use a standard index well. A trie with precomputed top-K makes the lookup O(prefix) with no query-time ranking.
By historical frequency aggregated from query logs, precomputed into the top-K at each trie node. I blend in recency/trending from a streaming path so rising terms aren't buried by all-time popularity. Crucially, the ranking is precomputed — the query path just reads.
A near-real-time streaming aggregator tracks rising frequencies — a count-min sketch is a cheap approximate counter for this — and produces a small trending delta merged with the batch-built trie at query time. Viral terms appear in minutes without a full rebuild.
Shard by leading characters so each shard holds a slice of prefixes, replicate each shard read-only for QPS and availability, and rebuild offline then hot-swap onto the serving fleet. Read-only at serve time makes replication and swapping cheap.
Fuzzy matching via bounded edit distance, but only as a fallback when an exact prefix returns too few results, with a capped edit distance. Live fuzzy ranking on every keystroke would blow the latency budget, so I gate it.
Serve from in-memory tries (no DB on the path), precompute top-K so there's no query-time ranking, edge-cache the short popular prefixes that carry most traffic, debounce on the client, and keep the serving fleet geographically close. Latency is won by doing the work ahead of time.
A clean design with one of these undercurrents still scores below the bar at senior+. None are about getting an answer wrong — they're about how you operate.
Jumping to architecture without bounding the problem or confirming scale. Reads as template-matching.
"It depends" with no decision behind it. Name the trade-off, then pick.
Proposing a prefix scan + sort per keystroke. The fastest way to show you haven't designed for the latency constraint.
Sorting candidates on every keystroke instead of precomputing top-K. It cannot hold 100ms at scale.
A static daily-rebuilt trie with no path for trending terms — the interviewer will ask, and you should have volunteered it.
No observability, no rollout, no failure-mode plan. In 2026 this reads as "has never carried a pager."
Confident wrong answers when pushed. Far worse than an honest "here's what I'd verify."
Waiting to be asked the next question. At staff you own the 45 minutes.
Run a mock and score yourself honestly against the dimensions the interviewer uses. If you can't hit "strong" on depth and operability, that's your signal on where to drill.
| Dimension | Weak (downlevel) | Strong (at level) |
|---|---|---|
| Scoping | Treated it as generic search. | Named latency as the crux and prefix-matching as the shape; asked ranking and freshness. |
| Data structure | Vague or DB-scan based. | Trie with precomputed top-K per node; O(prefix) lookup served from memory. |
| Build vs serve | Mixed building into the query path. | Offline batch build + hot-swap; serving fleet only reads in-memory tries. |
| Freshness | Static daily rebuild only. | Streaming trending delta via count-min sketch merged with the batch trie. |
| Scaling | One giant trie on one node. | Sharded by leading chars, read-only replicas, edge-cached hot prefixes, client debounce. |
| Operability | Never mentioned it. | Suggestion p99, click-through, freshness lag, hot-swap rollback. |