Search-Indexing Playbook — Designing Typeahead / Autocomplete

01 How you're actually graded

Six buckets — and judgment outweighs the diagram.

Every FAANG company runs a rubric. The dimensions are roughly the same; the weights differ by company and level. At senior+ the boxes-and-arrows are table stakes — what gets graded hardest is the quality of your decisions: the questions you asked first, the trade-offs you surfaced and defended, and the production reality you volunteered without being asked.

Dimension	Weight	What earns the signal
Requirements & scoping	10–15%	You scoped before drawing, asked enough to bound the problem, pinned the scale number, and stated assumptions out loud.
High-level architecture	20–25%	The right components, a clear data flow, and a reason every box exists. The design satisfies each functional requirement.
Technical depth / deep dives	~30%	You go three questions deep on the hard part without being rescued. This is where staff is won or lost.
Trade-offs & judgment	highest effective	Two viable options, what each costs, and a committed pick for this system. Simplicity over flash when flash isn't warranted.
Communication / driving	cross-cutting	You drive the 45 minutes; the interviewer never has to rescue you. You narrate, checkpoint, and narrow when the design sprawls.
Operational maturity	↑ in 2026	The newest weight: observability, rollout, failure modes, on-call reality — volunteered, not pried out.

The 2026 shift, in one line. Operational concerns are now a first-class graded dimension, and "it depends" without a committed answer reads as evasion rather than nuance. Name the trade-off, then pick.

02 The same answer is scored differently at each level

It's a sliding scale, not a pass/fail bar.

A solid design with reasonable trade-offs is a strong score for a mid-level candidate and a downlevel flag for staff. The questions can be identical; the depth expectation is not. As you climb, the balance tips from breadth toward depth, proactivity, and production reality.

Mid-level

Meta E4 · Google L4 · Amazon SDE-II

breadthdepth

Knows autocomplete needs prefix matching and reaches for a trie when prompted.
Returns suggestions but may rank arbitrarily or compute ranking at query time.
Recognizes latency matters but doesn’t fully exploit precomputation.
Needs guidance on how the trie gets built and refreshed.

Senior

Meta E5 · Google L5 · Amazon SDE-III

breadthdepth

Names the trie with precomputed top-K per node unprompted and explains the O(prefix) lookup.
Separates the offline build (from query logs) from the online serving path; serves from memory.
Handles freshness with a near-real-time pipeline folded into a batch-built trie.
Discusses sharding the trie and edge-caching hot prefixes.

Staff+

Meta E6 · Google L6 · Amazon Principal

breadthdepth

Establishes the trie fast, then spends time on the freshness pipeline, sharding, and personalization.
Experience-backed take on top-K storage cost, hot-prefix caching, and rebuild-and-hot-swap.
Treats client debounce, fuzzy-matching cost bounds, and suggestion-CTR monitoring as routine.
Frames the build-vs-serve split and the cross-team seam with the ranking/logging pipeline.

03 The lens senior engineers narrate through

Borrow AWS's Well-Architected pillars as your trade-off vocabulary.

You don't recite AWS — you anchor each decision to one of these. It signals you evaluate systems across competing concerns rather than optimizing one axis. Each pillar below is mapped to a move you can make in this exact design.

PILLAR 01

Operational Excellence

Measure suggestion quality, not just latency.

Hook: “I’d track suggestion p99 against the 100ms budget and suggestion click-through — a fast suggestion nobody picks is still a failure.”

PILLAR 02

Security

Don’t leak private or unsafe terms.

Hook: “The suggestion corpus is filtered — no private queries, no banned terms — because autocomplete is a very public surface.”

PILLAR 03

Reliability

Serve stale before serving nothing.

Hook: “The trie is replicated read-only; if the freshness pipeline lags, I serve slightly stale suggestions rather than fail — stale completions are fine, missing ones aren’t.”

PILLAR 04

Performance Efficiency

Precompute so the query path does no ranking.

Hook: “Top-K is precomputed at each node, so a keystroke is an O(prefix) memory walk plus a copy — no sorting at request time.”

PILLAR 05

Cost Optimization

Cache the head of the distribution.

Hook: “A tiny set of short popular prefixes serves most traffic, so I edge-cache those and keep the full trie in memory only on the serving fleet.”

PILLAR 06

Sustainability

Rebuild, don’t thrash.

Hook: “The heavy trie rebuild runs offline on a schedule and hot-swaps in; only the small trending delta updates continuously.”

How to use it without sounding like a checklist. Don't list the pillars. Weave one in when you commit: name a trade-off, name the pillar it serves, and make the call. One sentence that does all three reads as senior.

03·5 The architecture you draw on the whiteboard

Two clocks: build the trie offline, serve it in under 100 ms.

Typeahead is a latency problem wearing a search costume: a new request on every keystroke, ranked suggestions back in under 100 ms. The core artifact is a trie with the top-K completions precomputed at every node, so a lookup is the length of the prefix — not a scan of the dictionary. Build it offline; serve it online.

Offline build Online serve Trie lookup

Two clocks. Query logs are aggregated and ranked into a trie with top-K precomputed at every node; the Suggestion Service just walks the prefix and returns the node’s list. Say it: “lookup cost is the length of the prefix, not the size of the dictionary.”

How to narrate it in the room

Reframe it as latency. “Typeahead is a latency problem in a search costume — the hard ceiling is ranked suggestions back in under ~100 ms on every keystroke.”
Name the structure. “A trie where each node stores its top-K completions precomputed. A lookup walks the prefix and returns that list — no scan, no per-request ranking.”
Split build from serve. “Offline, I aggregate query logs and rebuild the trie with fresh popularity; online, servers hold it in memory and answer reads. New snapshots ship periodically.”
Cache the head. “Hot prefixes sit in a CDN / edge cache, so the most common keystrokes never even reach the service.”

04 The interview, minute by minute

Five phases. Drive every one of them.

The simulation. Framing: search autocomplete — a dictionary of tens of millions of phrases, billions of queries/day, a suggestion request on every keystroke, and a hard p99 < 100ms end-to-end including network.

01Requirements & Scoping~6 min · don't draw yet

Grading this window: Do you name latency as THE constraint and prefix-matching (not full-text) as the shape? Asking about ranking and trending-freshness up front is the senior tell.

Functional requirements to land

As the user types a prefix, return the top-K ranked completions.
Suggestions update on every keystroke.
Reflect popularity / trending queries, not just static dictionary entries.

Non-functional requirements to land

Ultra-low latency: p99 < 100ms end to end — this is the defining constraint.
Read-heavy, tiny payloads, enormous QPS.
Freshness: trending terms should appear within minutes, not on the next daily rebuild.
High availability; serving stale suggestions beats failing.

▲ Allow — say this

“This is really a latency problem. Under 100ms including the round trip means I can’t do real work at query time — the ranking has to be precomputed and the lookup served from memory.”

▲ Allow — say this

“Two scoping questions: do we rank by popularity, and how fresh must trending be? If a viral query must show up in minutes, I need a streaming path on top of the batch-built index.”

▼ Reject — never say this

“We’ll query the database for terms starting with the prefix.” A LIKE 'rea%' scan can’t deliver ranked top-K under 100ms at this scale — it signals you haven’t indexed for prefix search.

02Entities, API & Estimation~5 min

Grading this window: The right data structure named early, and a sense of the read-heavy, tiny-payload, huge-QPS shape.

The “entity” here is the index itself. Interface is trivially small:

getSuggestions(prefix, k=10) → [ {term, score} … ]

The data structure is a trie: each node is a character; following a prefix walks down the tree. The key move — store the top-K completions at each node, precomputed from historical frequency. Then a lookup is “walk the prefix, return the node’s cached list” — O(prefix length), no ranking at request time.

The estimate that matters

Reads dwarf writes by orders of magnitude, and each response is a handful of short strings. So the design optimizes purely for read latency and memory layout; write/update of the index is an offline concern that must never touch the serving path.

▲ Allow — say this

“I’ll precompute the top-K completions at every trie node. That trades extra memory and build time for a query that does zero ranking work — exactly the trade the 100ms budget demands.”

03High-Level Design (the MVP)~13 min

Grading this window: Trie + precomputed top-K + an offline build pipeline. Right components, clear separation of build and serve.

The naive approach, and why it fails

Query a database for terms matching the prefix, sort by popularity, return the top 10. It works functionally — and it fails the only requirement that matters: a prefix scan plus a sort, per keystroke, at billions of queries a day, will never hold 100ms. Name that and move on.

The trie with precomputed top-K

Build a trie over the phrase corpus. At each node, store the top-K completions of that prefix, ranked by frequency. A request walks the prefix (a few pointer hops) and returns the node’s cached list — served entirely from memory on the serving fleet.

OFFLINE: query logs → aggregate frequencies → build trie + top-K per node → ship to serving fleet ONLINE: keystroke → walk prefix in in-memory trie → return cached top-K (no sort)

Building & updating the index

An offline batch pipeline aggregates query logs into term frequencies and rebuilds the trie periodically, then hot-swaps it onto the serving nodes. The serving path never blocks on a build.

The trap door the interviewer opens here. “How does a query that just went viral show up before tomorrow’s rebuild?” The batch trie alone is stale by up to a full rebuild cycle. The answer is a near-real-time path — a streaming aggregator (e.g., a count-min sketch for frequencies) that feeds a small trending delta merged with the batch trie at query time. Volunteering this is the senior signal.

▲ Allow — say this

“The heavy trie is built offline from query logs and hot-swapped in. The serving fleet only ever reads an in-memory structure — building and serving are completely separate paths so a rebuild can’t hurt latency.”

◆ Throttle — only with a reason

Recomputing top-K at query time “for freshness.” Defensible only for a tiny corpus — say so. At this scale it blows the latency budget; precompute and refresh the index instead.

▼ Reject — never say this

“Elasticsearch will handle it.” Naming a tool without the trie/top-K reasoning dodges the actual question — the interviewer wants the data structure and why it hits 100ms.

04Deep Dives — the stress test~15 min · where staff is decided

Grading this window: Lead toward freshness, sharding the trie, edge caching, and fuzzy matching. Staff volunteers these; 30%+ of the score.

Freshness: batch + streaming

Two paths feed suggestions. The batch pipeline rebuilds the authoritative trie from full query logs on a schedule. A streaming pipeline tracks rising query frequencies in near real time — a count-min sketch is a cheap, approximate frequency counter ideal for this — and produces a small trending delta merged with the batch results. Viral terms surface in minutes without rebuilding tens of millions of nodes.

Sharding & scaling the trie

The full trie is large. Shard by leading characters (e.g., first one or two letters route to a shard), replicate each shard read-only for QPS and availability, and rebuild offline then hot-swap. Because it’s read-only at serve time, replication is trivial.

Cutting the last milliseconds

Edge-cache the head of the distribution — a small set of short popular prefixes serves the majority of traffic and can live in a CDN/edge cache.
Client debounce: don’t fire on every keystroke; wait for a brief pause to cut request volume and noise.
Connection reuse and keeping the serving fleet geographically close to users.

Fuzzy matching (if asked)

Typos (“recieve” → “receive”) need edit-distance matching, which is expensive. Bound it: only attempt fuzzy expansion when an exact prefix yields too few results, and cap the edit distance. Treat it as a fallback, not the default path, to protect the budget.

▲ Allow — say this (staff move)

“I’d split freshness into a batch-built authoritative trie and a streaming trending delta using a count-min sketch. Most of the corpus is stable; only the rising tail needs to move fast, so I spend the streaming budget only where it pays off.”

▼ Reject — never say this

“We’ll rank everything live with a fuzzy search on each keystroke.” Live fuzzy ranking per keystroke is the fastest way to blow past 100ms.

Scripted stress-test exchange

Interviewer

A breaking-news query starts trending. When does it show up?

You

Not on the daily rebuild — too slow. A streaming aggregator watches query frequencies in near real time using a count-min sketch, and the rising terms form a small trending delta. At query time I merge the node’s batch top-K with that delta, so a surging term appears within minutes without touching the full trie.

Interviewer

Doesn’t merging at query time cost you latency?

You

The delta is tiny — a small per-prefix overlay — so the merge is a quick combine of two short lists, not a re-rank of the corpus. The 100ms budget survives because the expensive ranking still happened offline; I’m only blending in a handful of fresh candidates.

05Wrap-up — operability & recap~6 min

Grading this window: Prove you could run it. Volunteer observability and rollout; recap; name what you deferred.

Observability

Suggestion p99 against the 100ms budget, sliced by prefix length and region.
Suggestion click-through rate — the quality signal; fast but unhelpful suggestions are still failures.
Index freshness lag: time from a query trending to it appearing in suggestions.
Serving-fleet memory and cache hit rate on hot prefixes.

Rollout

Ship new tries by hot-swap with the old one kept warm for instant rollback. Canary ranking changes and watch click-through before fleet-wide rollout.

▲ Allow — say this

“With more time I’d add a personalization layer — blending a user’s own history into the suggestions — and deeper fuzzy matching. I scoped them out deliberately, not by accident.”

05 The follow-up gauntlet

The probes you'll get — and the answer that holds.

Interviewers push on latency and freshness. Commit to the trie, defend the precomputation, and protect the 100ms budget.

"Why not a SQL LIKE 'rea%' query?"

It can't deliver ranked top-K under 100ms at this scale — a prefix scan plus a sort per keystroke, billions of times a day, never holds the budget, and a leading-wildcard query can't even use a standard index well. A trie with precomputed top-K makes the lookup O(prefix) with no query-time ranking.

"How do you rank suggestions?"

By historical frequency aggregated from query logs, precomputed into the top-K at each trie node. I blend in recency/trending from a streaming path so rising terms aren't buried by all-time popularity. Crucially, the ranking is precomputed — the query path just reads.

"How does a trending query show up fast?"

A near-real-time streaming aggregator tracks rising frequencies — a count-min sketch is a cheap approximate counter for this — and produces a small trending delta merged with the batch-built trie at query time. Viral terms appear in minutes without a full rebuild.

"The trie is huge — how do you store and scale it?"

Shard by leading characters so each shard holds a slice of prefixes, replicate each shard read-only for QPS and availability, and rebuild offline then hot-swap onto the serving fleet. Read-only at serve time makes replication and swapping cheap.

"Typos — 'recieve' should suggest 'receive'."

Fuzzy matching via bounded edit distance, but only as a fallback when an exact prefix returns too few results, with a capped edit distance. Live fuzzy ranking on every keystroke would blow the latency budget, so I gate it.

"How do you actually hit sub-100ms?"

Serve from in-memory tries (no DB on the path), precompute top-K so there's no query-time ranking, edge-cache the short popular prefixes that carry most traffic, debounce on the client, and keep the serving fleet geographically close. Latency is won by doing the work ahead of time.

Handling a probe you can’t fully answer: reason from the budget. “I haven’t profiled the exact top-K value, but it’s a memory-vs-quality trade — larger K means richer suggestions and a bigger index. Here’s how I’d pick it from the latency and memory envelope.”

06 What gets you downleveled

The flags that quietly tank an otherwise solid loop.

A clean design with one of these undercurrents still scores below the bar at senior+. None are about getting an answer wrong — they're about how you operate.

Drawing before scoping

Jumping to architecture without bounding the problem or confirming scale. Reads as template-matching.

Hedging without committing

"It depends" with no decision behind it. Name the trade-off, then pick.

DB LIKE scan for suggestions

Proposing a prefix scan + sort per keystroke. The fastest way to show you haven't designed for the latency constraint.

Ranking at query time

Sorting candidates on every keystroke instead of precomputing top-K. It cannot hold 100ms at scale.

No freshness story

A static daily-rebuilt trie with no path for trending terms — the interviewer will ask, and you should have volunteered it.

Skipping operations entirely

No observability, no rollout, no failure-mode plan. In 2026 this reads as "has never carried a pager."

Bluffing under a probe

Confident wrong answers when pushed. Far worse than an honest "here's what I'd verify."

Not driving

Waiting to be asked the next question. At staff you own the 45 minutes.

07 Your pre-loop scorecard

Self-grade before you walk in.

Run a mock and score yourself honestly against the dimensions the interviewer uses. If you can't hit "strong" on depth and operability, that's your signal on where to drill.

Dimension	Weak (downlevel)	Strong (at level)
Scoping	Treated it as generic search.	Named latency as the crux and prefix-matching as the shape; asked ranking and freshness.
Data structure	Vague or DB-scan based.	Trie with precomputed top-K per node; O(prefix) lookup served from memory.
Build vs serve	Mixed building into the query path.	Offline batch build + hot-swap; serving fleet only reads in-memory tries.
Freshness	Static daily rebuild only.	Streaming trending delta via count-min sketch merged with the batch trie.
Scaling	One giant trie on one node.	Sharded by leading chars, read-only replicas, edge-cached hot prefixes, client debounce.
Operability	Never mentioned it.	Suggestion p99, click-through, freshness lag, hot-swap rollback.

The 60-second recap that lands the level

Quick recap: it's a latency problem, so I precompute — a trie with top-K completions cached at every node, served from memory for an O(prefix) lookup with no query-time ranking; an offline batch pipeline builds the authoritative trie from query logs and hot-swaps it in; a streaming count-min-sketch path surfaces trending terms in minutes; the trie is sharded by leading characters with read-only replicas and edge-cached hot prefixes; and the client debounces. Headline metrics: suggestion p99 and click-through. With more time: personalization and bounded fuzzy matching.

★

The one mental model: typeahead is won by moving all the expensive work — ranking — off the request path and into an offline build, so a keystroke is just a memory walk. Say “this is a precomputed-index latency problem” in the first two minutes, reach for the trie with top-K, and treat freshness as a small streaming overlay rather than a reason to compute live.

Design Autocomplete like the suggestions are already ahead of the keystroke.

Time Budget · how the 45 min should split

The shape of the problem

Six buckets — and judgment outweighs the diagram.

It's a sliding scale, not a pass/fail bar.

Borrow AWS's Well-Architected pillars as your trade-off vocabulary.

Operational Excellence

Security

Reliability

Performance Efficiency

Cost Optimization

Sustainability

Two clocks: build the trie offline, serve it in under 100 ms.

How to narrate it in the room

Five phases. Drive every one of them.

Functional requirements to land

Non-functional requirements to land

The estimate that matters

The naive approach, and why it fails

The trie with precomputed top-K

Building & updating the index

Freshness: batch + streaming

Sharding & scaling the trie

Cutting the last milliseconds

Fuzzy matching (if asked)

Observability

Rollout

The probes you'll get — and the answer that holds.

"Why not a SQL LIKE 'rea%' query?"

"How do you rank suggestions?"

"How does a trending query show up fast?"

"The trie is huge — how do you store and scale it?"

"Typos — 'recieve' should suggest 'receive'."

"How do you actually hit sub-100ms?"

The flags that quietly tank an otherwise solid loop.

Self-grade before you walk in.