Pattern-first · 9 clusters · built from the 64-question canon

You can't memorize 64 systems. You can memorize 9 patterns.

Every "Design X" question is a costume. Underneath, the same handful of problems repeat: fan-out, geospatial lookup, collaborative state, strong-consistency contention, search, streaming aggregation. Learn to see the pattern under the prompt and a question you've never seen becomes one you've already solved. This atlas gives you the patterns, the questions each one unlocks, the metrics to quote, and the exact sentences that make you sound like you've run these in production.

9 reusable patterns
64 questions mapped
1 method · "SCALE-DO"
Variations they collapse to
0 The universal method — runs on every question

Same six phases, every time. Memorize the cadence, not the systems.

Before the patterns, the rhythm. Whatever the prompt, you walk these phases in this order. The pattern (next sections) just tells you what to say inside each phase. The phases never change — which is exactly why they're worth burning into muscle memory.

01 Scope (~5 min). Functional + non-functional. Pin the scale number. State what's out of scope.
02 Calculate (~3 min). Back-of-envelope: QPS, storage, bandwidth. Reads:writes ratio drives everything.
03 API + data (~4 min). Define the interface and core entities. Pick the storage shape.
04 Lay out HLD (~12 min). Boxes, arrows, data flow. Satisfy each functional requirement once.
05 Deep dive (~15 min). Attack the bottleneck the pattern predicts. Trade-offs, two options, commit.
06 Operate (~6 min). Bottlenecks, failure modes, observability, rollout. Volunteer it.
S · Scope it
C · Calculate scale
A · API & data
L · Lay out design
E · Examine bottleneck
DO · Defend & Operate
The one habit that signals seniority: say the pattern name out loud in phase 1. "This is fundamentally a fan-out problem" or "the crux here is strong consistency under contention" tells the interviewer you've recognized the shape in the first two minutes. That single sentence reframes you from someone solving a puzzle to someone who's seen the puzzle's family before.
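Phase 02 is plain arithmetic, so it helps to see it once as code. A minimal sketch, assuming an illustrative 100M daily users who each read 100 times and write once a day at about 1 KB per write. The function name `estimate`, the peak factor, and every input number are assumptions for illustration, not figures from any real system.

```python
# Back-of-envelope sizing. Every input below is an assumed, illustrative number.
SECONDS_PER_DAY = 86_400


def estimate(dau, reads_per_user, writes_per_user, bytes_per_write, peak_factor=2.0):
    """Turn daily usage assumptions into QPS, read:write ratio, and storage growth."""
    read_qps = dau * reads_per_user / SECONDS_PER_DAY
    write_qps = dau * writes_per_user / SECONDS_PER_DAY
    return {
        "read_qps": read_qps,
        "write_qps": write_qps,
        "peak_read_qps": read_qps * peak_factor,  # assumed peak-to-average multiplier
        "read_write_ratio": reads_per_user / writes_per_user,
        "storage_gb_per_day": dau * writes_per_user * bytes_per_write / 1e9,
    }


est = estimate(dau=100_000_000, reads_per_user=100, writes_per_user=1, bytes_per_write=1_000)
for name, value in est.items():
    print(f"{name}: {value:,.0f}")
```

With those assumed inputs this works out to roughly 116K read QPS, about 1.2K write QPS, and 100 GB of new data a day. The 100:1 ratio is the number that steers the rest of the design.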
The mind map — the whole canon on one page

One root, nine branches, every question that hangs off them.

This is the picture to burn into memory. When a prompt lands, you're not searching 64 systems — you're walking one tree: from the root, pick the branch whose tell matches, and the leaf you were asked about is sitting on a branch you've already studied. The highlighted leaf on each branch is the worked chapter in this book — click it to open the full playbook.

Read it as: root question → 9 pattern branches (color-coded) → the questions each branch unlocks. Colored leaf = a full chapter you can open.
Every “Design X” question → find the branch
01 Fan-out & Feeds (push / pull / hybrid)
Twitter / timeline · Instagram feed · Facebook Newsfeed · Reddit · LinkedIn feed · TikTok For-You · Notification system · Activity feed

02 Geospatial & Proximity (geohash / quadtree / S2)
Uber / Lyft · Yelp / Nearby · Nearby Friends · Food-delivery dispatch · Tinder geo-match · Maps “near me” · Find My Device

03 Collaborative State (OT / CRDT · convergence)
Google Docs · Figma · Miro whiteboard · Notion · Collaborative code editor · Trello board

04 Strong Consistency (locking · contention)
Ticketmaster · Flash sale · Payments / Stripe · Hotel / Airbnb booking · Stock exchange · E-commerce inventory · Distributed lock

05 Search & Indexing (trie · inverted index)
Typeahead / autocomplete · Twitter search · Google Search · Gmail search · Web crawler · Amazon product search · Log search (ELK)

06 Streaming Aggregation (dedup · windows · counting)
Ad click aggregator · View / like counter · Top-K / trending · Metrics (Datadog) · Google Ads analytics · Real-time analytics

07 Large Blobs & Media (object store · CDN · transcode)
YouTube / Netflix · Dropbox / Drive · S3 / object store · Instagram media · Google Photos · Live streaming

08 Realtime Messaging (sockets · presence · fan-in)
WhatsApp / Messenger · Discord · Slack · Twitch chat · Push notifications · Online presence · Typing indicators

09 Scheduling & Async Jobs (at-least-once + idempotency)
Distributed job scheduler · Reminder / alerts · Distributed cron · Code deployment · Distributed lock (Chubby) · AWS Lambda · Webhook / retry queue
How to use it under pressure: say the branch out loud before you draw a single box — “This is fundamentally a fan-out problem,” or “the crux is strong consistency under contention.” Naming the branch in the first two minutes is the single clearest seniority signal you can send, and it commits you to the right deep-dive before the clock starts working against you.
1–9 The nine clusters

Each pattern is a lens. Find it, and the design writes itself.

For each cluster: the tell (how to recognize it), the questions it unlocks, the clarifying questions to ask, the core moves, the metrics to quote, a line that lands, and the trap to avoid. Color-coded so you can find your way back fast.

01 Fan-out & Feeds: push / pull / hybrid · write vs read amplification
The tell: one user's action must reach many followers' views — or many sources must aggregate into one view. The whole interview pivots on when you materialize the timeline: at write time (push), at read time (pull), or hybrid. If "feed," "timeline," "followers," or "newsfeed" appears, you're here.
Unlocks these questions
Twitter / timeline · Instagram · Facebook Newsfeed · Reddit · Notification system · LinkedIn feed
Ask first
read:write ratio? · celebrity accounts? · ranked or chronological? · feed staleness tolerance?
Core moves
  • Push (fan-out on write): precompute each follower's feed on post. Fast reads, heavy writes — dies on celebrities (one post → millions of writes).
  • Pull (fan-out on read): assemble feed at read time. Cheap writes, expensive reads. Fine for inactive users.
  • Hybrid: push for normal users, pull for celebrity follows merged at read. The senior answer almost every time.
  • Ranked feeds add a feature store + ML scoring layer over the candidate set.
Metrics to quote
feed gen p99 < 200ms · fan-out writes/post · read:write ~100:1 · cache hit > 95%
Stack you'll name
  • Redis for materialized feeds; Cassandra/sharded SQL for posts.
  • Kafka to drive the fan-out workers asynchronously.
  • CDN + cache for media and hot feeds.
The celebrity problem breaks pure push — one post becomes millions of writes. So I'd go hybrid: fan out on write for ordinary accounts, and for the handful of celebrity follows, pull their recent posts at read time and merge. It bounds write amplification without making the common-case read expensive.
The trap: committing to pure push without naming the celebrity blow-up, or pure pull without acknowledging read cost at scale. The interviewer is waiting for "hybrid, and here's why" — get there yourself.
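The hybrid move is easier to defend once you have seen it in a few lines. What follows is a minimal in-memory sketch, not a production design: the class name `HybridFeed`, its methods, and the tiny `CELEBRITY_THRESHOLD` are all hypothetical, and plain dictionaries stand in for Redis and the post store.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # assumed cutoff for the demo; a real one would be far higher


class HybridFeed:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users who follow them
        self.following = defaultdict(set)  # user -> authors they follow
        self.posts = defaultdict(list)     # author -> [(timestamp, post_id)]
        self.inbox = defaultdict(list)     # user -> materialized [(timestamp, post_id)]

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, timestamp, post_id):
        self.posts[author].append((timestamp, post_id))
        if not self.is_celebrity(author):
            # Push: fan out on write into every follower's inbox.
            for follower in self.followers[author]:
                self.inbox[follower].append((timestamp, post_id))

    def feed(self, user, limit=10):
        # Start from the precomputed inbox, then pull celebrity posts at read time.
        # A set removes duplicates if an author crossed the threshold after posting.
        candidates = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                candidates.update(self.posts[author])
        newest_first = heapq.nlargest(limit, candidates)
        return [post_id for _, post_id in newest_first]
```

A post from an ordinary account costs one write per follower; a post from a celebrity costs one write in total, and the merge cost moves to the readers who follow that account.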
02 Geospatial & Proximity: geohash · quadtree · S2 / H3 · nearest-neighbor
The tell: "find things near me," "match riders to drivers," "what's around this location." The instant you see location-based lookup, the answer is a spatial index — never a naive distance scan over every row.
Unlocks these questions
Uber / Lyft · Yelp / Nearby · Nearby Friends · Food delivery dispatch · Tinder geo-match
Ask first
search radius? · static or moving entities? · update frequency? · match or just list?
Core moves
  • Geohash — encode lat/long into a string prefix; nearby points share prefixes. Easy, good default.
  • Quadtree — recursively subdivide dense regions; adapts to population density.
  • S2 (Google) and H3 (Uber): hierarchical cell systems defined on the sphere, with more uniform cell sizes than geohash rectangles.
  • Moving entities (Uber): drivers publish location to an in-memory geo-index (Redis GEO) every few seconds; matching reads the relevant cells.
Metrics to quote
match < 1–2s · location ping ~4s · query p99 < 100ms
Stack you'll name
  • Redis with geo commands for live driver positions.
  • Sharded by geo-cell so hot cities don't overload one node.
  • Separate write path (pings) from read path (matching).
I'll index drivers by S2 cell in an in-memory store. When a rider requests, I compute their cell plus neighbors and query just those — turning a global scan into a handful of cell lookups. Drivers re-publish position every few seconds, so the index is eventually fresh, which is fine for dispatch.
The trap: proposing a SQL WHERE distance < r over all rows. It signals you've never indexed spatial data. Name geohash or S2 in the first sentence.
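The "cell plus neighbors" lookup can be sketched without any geo library. This example uses a fixed latitude/longitude grid as a stand-in for geohash or S2 cells, which is a simplification: real cell systems handle the sphere and variable density far better. `DriverIndex`, `cell_of`, and `CELL_DEG` are hypothetical names, and the 3x3 neighborhood is only complete when the search radius is no larger than roughly one cell.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # assumed cell size: about 1.1 km of latitude


def cell_of(lat, lng):
    """Map a coordinate to its integer grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lng / CELL_DEG))


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class DriverIndex:
    def __init__(self):
        self.cells = defaultdict(set)  # cell -> driver ids currently in it
        self.pos = {}                  # driver id -> (lat, lng)

    def update(self, driver_id, lat, lng):
        """Handle a location ping: move the driver to its new cell."""
        old = self.pos.get(driver_id)
        if old is not None:
            self.cells[cell_of(*old)].discard(driver_id)
        self.pos[driver_id] = (lat, lng)
        self.cells[cell_of(lat, lng)].add(driver_id)

    def nearby(self, lat, lng, radius_km):
        """Scan only the rider's cell and its 8 neighbors, nearest first."""
        cx, cy = cell_of(lat, lng)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for driver_id in self.cells.get((cx + dx, cy + dy), ()):
                    dist = haversine_km(lat, lng, *self.pos[driver_id])
                    if dist <= radius_km:
                        found.append((dist, driver_id))
        return [driver_id for _, driver_id in sorted(found)]
```

The query touches nine cells no matter how many drivers exist worldwide, which is the whole argument against a distance scan over every row.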
03 Collaborative State: OT · CRDT · conflict resolution · low-latency sync
The tell: multiple users edit the same document/canvas at the same time and must converge to one consistent state. The make-or-break is the conflict-resolution algorithm — bluffing OT or CRDT is instantly obvious.
Unlocks these questions
Google Docs · Miro whiteboard · Figma · Notion · Collaborative code editor
Ask first
text or spatial objects? · how many concurrent editors? · offline editing? · history / undo?
Core moves
  • Operational Transform (OT): transform concurrent ops against each other; central server orders them. Powers Google Docs; tricky to implement.
  • CRDTs: data types that mathematically converge without a central authority; better for offline and P2P, heavier metadata.
  • WebSockets for the bidirectional low-latency channel; one document = one session/room.
  • Persist op log; snapshot periodically so reload doesn't replay all history.
Metrics to quote
edit echo < 100ms · convergence guaranteed · ops/doc/s bounded
Decision frame
  • OT = simpler data, central server, mature.
  • CRDT = offline-friendly, decentralized, more memory.
  • Spatial canvas (Miro) leans CRDT; linear text leans OT.
For linear text with a central server I'd use OT — transform concurrent operations so every client converges on the same sequence. If offline editing or a spatial object model were a hard requirement, I'd switch to CRDTs and accept the extra per-object metadata for guaranteed convergence without a coordinator.
The trap: saying "last write wins." For collaborative editing that silently destroys other users' work. Show you know OT/CRDT exist, even at a mental-model level.
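At the mental-model level, OT comes down to one function: shift an operation so it still means the same thing after a concurrent operation has landed. The sketch below covers only the insert-versus-insert case with a site-id tie-break. The names `Insert`, `apply`, and `transform` are hypothetical, and a real OT system also needs deletes, server ordering, and acknowledgement handling, none of which are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where text is inserted
    text: str
    site: int   # tie-breaker when two sites insert at the same position


def apply(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, against):
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    lands_before = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if lands_before:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


# Two editors start from the same text and type concurrently.
base = "helo"
a = Insert(3, "l", site=1)  # editor A fixes the typo
b = Insert(4, "!", site=2)  # editor B appends punctuation

at_site_a = apply(apply(base, a), transform(b, a))
at_site_b = apply(apply(base, b), transform(a, b))
print(at_site_a, at_site_b)
```

Both sites apply the operations in a different order and still end on the same string, which is the convergence property that "last write wins" cannot give you.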
04 Strong Consistency & Contention: locking · idempotency · double-entry · exactly-once
The tell: two parties must not get the same scarce thing, or money/inventory must never be wrong. This is the one cluster where you sacrifice availability for consistency on purpose. Keywords: book, buy, pay, reserve, seat, inventory.
Unlocks these questions
Ticketmaster · Flash sale · Payment system / Stripe · Hotel/Airbnb booking · Stock exchange · Shopping cart
Ask first
contention level? · hold/reserve window? · money involved? · retries expected?
Core moves
  • Optimistic locking (version/CAS) for low contention; pessimistic / row locks or a distributed lock for hot rows.
  • Reservation with TTL: tentatively hold the seat/item, confirm on payment, auto-release on timeout.
  • Idempotency keys so a retried payment charges once — non-negotiable for money.
  • Double-entry bookkeeping + reconciliation; ledgers are append-only and balance.
  • Saga for multi-service transactions instead of a distributed 2PC.
Metrics to quote
double-book = 0 · money error = 0 · hold TTL ~10 min
The CAP stance
  • Pick CP: reject under partition rather than risk a double sale.
  • Flash sale: gate with a distributed counter / queue before the DB.
  • Name the cost out loud: lower availability, higher latency.
Money has zero tolerance for "eventually consistent," so I'll make writes idempotent with a client-supplied key, record every movement as a double-entry ledger row, and reconcile asynchronously. For seat selection I'll hold the seat with a short TTL and confirm on payment — that's CP by choice, and I accept the lower availability that comes with it.
The trap: hand-waving "we'll use a transaction" without addressing the hot-row / distributed case, or forgetting idempotency on payments. That's the exact gap interviewers drill here.
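The hold-with-TTL and idempotency-key moves fit in one small state machine. This is a single-process sketch under stated assumptions: `SeatHolds` and its methods are hypothetical, time is passed in explicitly so the behaviour is easy to follow, and in a real system each check-then-write below would have to be one atomic database operation or sit behind a lock.

```python
class SeatHolds:
    def __init__(self, ttl_s=600):
        self.ttl_s = ttl_s    # hold window; 600 s matches the ~10 min figure above
        self.holds = {}       # seat -> (user, expires_at)
        self.sold = {}        # seat -> user
        self.results = {}     # idempotency key -> stored outcome
        self.charges = 0      # how many times a card was actually charged

    def hold(self, seat, user, now):
        """Tentatively reserve a seat; an expired hold is treated as released."""
        if seat in self.sold:
            return False
        current = self.holds.get(seat)
        if current and current[1] > now and current[0] != user:
            return False
        self.holds[seat] = (user, now + self.ttl_s)
        return True

    def confirm(self, seat, user, idem_key, now):
        """Charge and sell the seat once; a retry replays the stored outcome."""
        if idem_key in self.results:
            return self.results[idem_key]
        current = self.holds.get(seat)
        ok = (
            seat not in self.sold
            and current is not None
            and current[0] == user
            and current[1] > now
        )
        if ok:
            self.charges += 1
            self.sold[seat] = user
            del self.holds[seat]
        self.results[idem_key] = ok
        return ok
```

A client that times out and resends the same key gets the original answer back without a second charge, and a buyer whose hold expired cannot confirm a seat that someone else now holds.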
05 Search & Indexing: inverted index · trie · ranking · TF-IDF / BM25
The tell: "search," "autocomplete," "find documents matching." The core artifact is an inverted index (term → postings list) or a trie for prefixes. Ranking and freshness are the depth.
Unlocks these questions
Typeahead / autocomplete · Twitter search · Google Search · Gmail search · Web crawler · Product search
Ask first
prefix or full-text? · freshness requirement? · personalized ranking? · fuzzy / typos?
Core moves
  • Inverted index: term → list of doc IDs; shard by term or by document, each with trade-offs.
  • Trie for autocomplete; precompute top-K per prefix so a read costs O(prefix length), independent of how many completions sit below that node.
  • Ranking: TF-IDF / BM25 baseline, then a learned ranker; freshness via a small real-time index merged with the big batch index.
  • Crawler front-end: URL frontier queue + dedup (Bloom filter) + politeness budget.
Metrics to quote
typeahead < 100ms · index lag seconds · recall/precision tracked
Stack you'll name
  • Elasticsearch / Lucene for general full-text.
  • Two-tier index: real-time + batch, merged at query time.
  • Bloom filter for URL/term dedup at scale.
Autocomplete is a latency problem more than a search problem — under 100ms end to end. So I precompute the top-K completions per prefix in a trie and serve from memory; trending queries get folded in via a near-real-time pipeline rather than rebuilding the whole structure.
The trap: proposing a SQL LIKE '%term%' scan. The leading wildcard defeats an ordinary B-tree index, so every query becomes a full table scan. Reach for the inverted index or trie immediately.
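Precomputing top-K per prefix is the piece worth sketching, because it is what keeps the read path inside the latency budget. The example below is a minimal offline build, assuming a small dictionary of query counts; `TrieNode`, `build_trie`, and `suggest` are hypothetical names, and the sample counts are made up.

```python
class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}
        self.top = []  # [(count, query)] for every query under this prefix, trimmed later


def build_trie(query_counts, k=3):
    """Build the trie, then keep only the k most frequent queries at each node."""
    root = TrieNode()
    for query, count in query_counts.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((count, query))
    stack = [root]
    while stack:
        node = stack.pop()
        # Highest count first; ties broken alphabetically for a stable order.
        node.top = sorted(node.top, key=lambda item: (-item[0], item[1]))[:k]
        stack.extend(node.children.values())
    return root


def suggest(root, prefix):
    """Walk one node per character, then return the precomputed list."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    return [query for _, query in node.top]


counts = {"car": 50, "cat": 80, "cart": 20, "care": 30, "dog": 10}  # made-up sample data
trie = build_trie(counts, k=3)
print(suggest(trie, "ca"))
```

The lookup never walks the subtree below the prefix, so its cost depends on the number of characters typed and not on how many queries share that prefix.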
06 Streaming Aggregation & Counting: Kafka · windowing · exactly-once · sharded counters
The tell: a firehose of events must be counted, aggregated, or deduplicated in near-real-time — likes, clicks, views, metrics. Think in pipelines, not request-response. Late and duplicate data are the hard parts.
Unlocks these questions
Ad click aggregator · YouTube likes counter · Top-K / trending · Metrics (Datadog) · Google Ads · Analytics
Ask first
exact or approximate? · real-time or batch? · dedup window? · cardinality?
Core moves
  • Ingest into Kafka; process with a stream engine (Flink) over time windows.
  • Sharded / approximate counters to avoid a single hot row; reconcile to a precise count off the event log.
  • Probabilistic structures: HyperLogLog (unique counts), Count-Min Sketch (frequencies), Bloom filter (membership) when approximate is acceptable.
  • Exactly-once via idempotent writes + dedup keys; handle late-arriving events with watermarks.
Metrics to quote
ingest 1M+ ev/s · freshness seconds · count drift < 0.1%
The lambda choice
  • Fast approximate path for live numbers.
  • Slow exact path (batch over the log) for correctness.
  • Reconcile; serve the corrected value when ready.
Counting a million likes a second on one video creates a hot row that takes down the DB. I'd shard the counter across N keys and sum on read, ingest through Kafka, and keep the event log as the source of truth so a batch job can correct any drift. If exact unique-viewer counts were needed I'd reach for HyperLogLog.
The trap: a single UPDATE counter SET n = n+1 row. That hot row is the whole problem — sharded counters or a stream aggregate is the expected answer.
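A sharded counter with dedup keys is small enough to show whole. This is an in-memory sketch under assumptions: `LikeCounter` is a hypothetical name, a dictionary stands in for the key-value store, and the unbounded `seen` set would be limited to a dedup window in a real pipeline.

```python
import random
from collections import defaultdict


class LikeCounter:
    def __init__(self, num_shards=8, seed=None):
        self.num_shards = num_shards
        self.store = defaultdict(int)  # stands in for a KV store: "video:shard" -> count
        self.seen = set()              # dedup keys; unbounded here, windowed in practice
        self.rng = random.Random(seed)

    def like(self, video_id, event_id):
        """Record one like; a redelivered event id is ignored."""
        if event_id in self.seen:
            return False
        self.seen.add(event_id)
        shard = self.rng.randrange(self.num_shards)  # spread writes across N keys
        self.store[f"{video_id}:{shard}"] += 1
        return True

    def count(self, video_id):
        """Sum the shards on read."""
        return sum(
            self.store.get(f"{video_id}:{shard}", 0)
            for shard in range(self.num_shards)
        )
```

Writes for one hot video now land on N different keys, so no single row absorbs the whole firehose, and the read pays for it with N small lookups.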
07 Large Blobs & Media: chunking · dedup · CDN · transcoding · ABR
The tell: the payload is big and binary — files, photos, video. Storage tiers, the CDN, and (for video) transcoding dominate. Often "the client matters as much as the server."
Unlocks these questions
Dropbox / Drive · YouTube / Netflix · S3 · Instagram media · Photo storage
Ask first
file sizes? · sync or just store? · streaming or download? · durability target?
Core moves
  • Chunk large files; dedup by content hash; delta sync only changed chunks (Dropbox).
  • Blobs in object storage (S3), metadata in a DB — never blobs in the relational DB.
  • Video: transcode into multiple bitrates; serve via adaptive bitrate streaming; push to CDN edges.
  • Durability via replication or erasure coding (S3-style) across regions.
Metrics to quote
durability 11 nines · start delay < 2s · cache hit > 90%
Stack you'll name
  • Object store + metadata DB + CDN, always.
  • Transcoding pipeline as async workers off a queue.
  • Presigned URLs for direct client upload/download.
Video is a bandwidth problem dressed up as a software problem. On upload I transcode to several bitrates asynchronously, store segments in object storage, and serve adaptive bitrate through a CDN so the client steps down quality on a weak network instead of buffering. Metadata lives in a DB; the bytes never touch it.
The trap: storing file bytes in your primary database, or forgetting the CDN. Both signal you haven't built media at scale.
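Chunking with content-hash dedup is what makes delta sync cheap, and a short sketch shows why. The example assumes fixed-size chunks and a 4-byte chunk size chosen only so the demo is readable. `ChunkStore` is a hypothetical name, and two dictionaries stand in for object storage and the metadata DB. Fixed-size chunking also has a known weakness that is not addressed here: inserting bytes near the start shifts every later chunk boundary.

```python
import hashlib

CHUNK_SIZE = 4  # bytes, tiny for illustration; real systems use chunks in the megabyte range


class ChunkStore:
    def __init__(self):
        self.blobs = {}      # content hash -> bytes (stands in for object storage)
        self.manifests = {}  # file name -> ordered chunk hashes (stands in for the metadata DB)

    def upload(self, name, data, chunk_size=CHUNK_SIZE):
        """Store a file and return how many chunks actually had to be uploaded."""
        uploaded = 0
        manifest = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blobs:  # identical content is stored once
                self.blobs[digest] = chunk
                uploaded += 1
            manifest.append(digest)
        self.manifests[name] = manifest
        return uploaded

    def download(self, name):
        """Reassemble a file from its manifest."""
        return b"".join(self.blobs[digest] for digest in self.manifests[name])
```

Editing the middle of a three-chunk file re-uploads one chunk, and a second file made of bytes the store has already seen uploads nothing at all.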
08 Realtime Messaging & Presence: WebSockets · pub/sub · delivery semantics · ordering
The tell: messages must arrive instantly, in order, without duplicates, even when the recipient is offline — plus presence ("online now"). Connection management at millions of concurrent sockets is the scaling crux.
Unlocks these questions
WhatsApp / Messenger · Discord · Twitch chat / live comments · Slack · Push notifications
Ask first
1:1, group, or broadcast? · delivery/read receipts? · E2E encryption? · channel size?
Core moves
  • WebSockets for live connections; a connection-manager layer tracks which server holds each user's socket.
  • Pub/sub backbone routes a message to the recipient's connection server.
  • Offline: persist to a per-user inbox; deliver on reconnect with sequence numbers for ordering + dedup.
  • Massive channels (Twitch, Discord): hierarchical fan-out + accept lossy delivery — perfect delivery to a million passive viewers is neither possible nor needed.
Metrics to quote
delivery < 100ms · conns/node ~1M · ordering per-chat
Decision frame
  • 1:1 → reliable, ordered, persisted.
  • Huge broadcast → fan-out + lossy is the right call.
  • Presence via heartbeats with TTL.
For a million-viewer chat, the insight is that perfect delivery isn't the goal — readability is. I'd fan messages out hierarchically and drop some under load rather than guarantee every viewer sees every line. For 1:1 messaging I flip entirely: persist to an inbox, sequence-number for ordering, and dedup on reconnect.
The trap: treating a million-viewer broadcast with the same reliable-ordered guarantees as 1:1 chat. Knowing when lossy is correct is the senior signal.
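For the reliable 1:1 side, the inbox with sequence numbers and dedup is compact enough to sketch. This is an in-memory illustration only: `ChatInbox`, `send`, and `fetch_since` are hypothetical names, and the client-generated message id is an assumed convention for detecting resends.

```python
from collections import defaultdict


class ChatInbox:
    def __init__(self):
        self.next_seq = defaultdict(int)  # chat -> next sequence number to assign
        self.log = defaultdict(list)      # chat -> [(seq, sender, body)] in order
        self.seen = set()                 # (chat, sender, client_msg_id) already accepted

    def send(self, chat, sender, client_msg_id, body):
        """Persist a message and return its sequence number, or None for a resend."""
        key = (chat, sender, client_msg_id)
        if key in self.seen:
            return None
        self.seen.add(key)
        seq = self.next_seq[chat]
        self.next_seq[chat] += 1
        self.log[chat].append((seq, sender, body))
        return seq

    def fetch_since(self, chat, last_seq):
        """On reconnect, return everything after the last sequence the client saw."""
        return [msg for msg in self.log[chat] if msg[0] > last_seq]
```

The per-chat sequence number gives ordering, the client message id gives dedup, and a reconnecting client only has to remember one integer to catch up.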
09 Scheduling & Async Jobs: timing wheels · queues · exactly-once · coordination
The tell: something must happen later, reliably, possibly billions of times, exactly once, surviving crashes. Reminders, deploys, cron, distributed locks all live here. The hard part is exactly-once under failure.
Unlocks these questions
Reminder / alert system · Distributed cron · Code deployment · Distributed lock (Chubby) · Job scheduler · Lambda
Ask first
scale of jobs? · timing precision? · exactly vs at-least once? · recurring?
Core moves
  • Timing wheel / priority queue for near-term fires; durable store for far-future jobs, pulled in as their time approaches.
  • Queue + workers: dispatch due jobs to a queue; workers ack on completion; visibility timeout requeues on crash.
  • Exactly-once via idempotent execution + a dedup/lease so two workers can't both run a job.
  • Coordination (leader election, locks) via ZooKeeper/etcd/Raft; use fencing tokens.
Metrics to quote
fire skew < 1s · jobs 1B+ scheduled · exec exactly-once
Failure stance
  • Assume workers die mid-job; design for safe retry.
  • Idempotency makes at-least-once behave like exactly-once.
  • Dead-letter queue for poison jobs.
A naive cron is trivial; a distributed one that fires each job exactly once through machine failures is the real problem. I'd store schedules durably, pull due jobs into a timing wheel, dispatch via a queue with visibility timeouts so a crashed worker's job is retried, and make execution idempotent so the retry is safe.
The trap: claiming true exactly-once delivery. The honest, senior answer is at-least-once + idempotency = effectively-once. Say it that way.
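At-least-once plus idempotency can be shown with a queue that hides a leased job for a visibility timeout and a handler that refuses to repeat itself. A minimal sketch, assuming a single process and explicit timestamps: `JobQueue`, `run_once`, and the 30-second timeout are hypothetical, and a heap stands in for the timing wheel and durable store.

```python
import heapq


class JobQueue:
    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.heap = []     # (visible_at, job_id), earliest first
        self.done = set()  # job ids that a worker has acknowledged

    def schedule(self, job_id, run_at):
        heapq.heappush(self.heap, (run_at, job_id))

    def lease(self, now):
        """Hand out one due job and hide it until the lease expires."""
        while self.heap and self.heap[0][0] <= now:
            _, job_id = heapq.heappop(self.heap)
            if job_id in self.done:
                continue  # acknowledged earlier; drop the stale entry
            # Re-queue for redelivery in case the worker dies before acking.
            heapq.heappush(self.heap, (now + self.visibility_timeout, job_id))
            return job_id
        return None

    def ack(self, job_id):
        self.done.add(job_id)


def run_once(job_id, effects):
    """Idempotent handler: a second run of the same job id changes nothing."""
    if job_id in effects:
        return False
    effects.add(job_id)
    return True
```

If a worker leases a job and crashes, the job reappears after the timeout and another worker picks it up. The handler's own check is what turns that redelivery into effectively-once execution.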
The reverse lookup

Question → pattern. When the prompt lands, find the lens here.

The 64-question canon, mapped to its dominant pattern and difficulty tier. In the interview you do the same map in your head: hear the prompt, name the pattern, run the method. This table is just that reflex, written down.

Tiers: T1 Warm-up · T2 Classic · T3 Modern · T4 Heavy · T5 Case study

Question | Tier | Dominant pattern | Crux in one line
TinyURL / Pastebin | T1 | Search / KV | ID encoding, read amplification, when to cache.
API / distributed rate limiter | T1 | Consistency | Atomic counter across nodes; token bucket; fail-open vs closed.
Unique ID generator | T1 | Consistency | Snowflake; clock skew; time-ordering vs uniqueness.
Typeahead / autocomplete | T1 | Search | Trie, precomputed top-K, <100ms budget.
API gateway | T1 | Consistency | Auth, routing, throttling at the edge.
Twitter / timeline | T2 | Fan-out | Push vs pull vs hybrid; the celebrity problem.
Instagram | T2 | Fan-out + Media | Feed fan-out plus image/video storage tiers + CDN.
Facebook Newsfeed | T2 | Fan-out (ranked) | Feature store + ML scoring over candidates.
Reddit | T2 | Fan-out | Threaded comments + time-decay "hot" ranking.
Messenger / WhatsApp | T2 | Realtime msg | Delivery semantics, ordering, offline inbox, E2E.
Dropbox | T2 | Blobs / Media | Chunking, dedup, delta sync, conflict resolution.
Yelp / Nearby Friends | T2 | Geospatial | Geohash / quadtree / S2 indexing.
Uber / Lyft | T2 | Geospatial + match | Live geo-index + real-time dispatch + surge.
Web crawler | T2 | Search | URL frontier, dedup (Bloom), politeness budget.
Twitter / Google Search | T2 | Search | Inverted index, sharding, ranking, freshness.
Ticketmaster | T2 | Consistency | No double-booking; distributed lock / hold TTL.
Google Calendar | T2 | Consistency | Recurring events, time zones, invite propagation.
Discord / Twitch chat | T3 | Realtime msg | Hierarchical fan-out; lossy delivery at scale.
Google Docs / Miro | T3 | Collaboration | OT vs CRDT; sub-100ms sync; convergence.
ChatGPT | T3 | Streaming + serving | GPU scheduling, KV-cache reuse, token streaming.
Notification system | T3 | Fan-out | Multi-channel, preferences, retries, provider failover.
Netflix recommendations | T3 | Streaming / ML | Candidate gen → features → serving → A/B.
Gmail | T3 | Search | Search-over-personal-data, threading, spam.
Google News aggregator | T3 | Streaming | Crawl → dedup → cluster → rank, continuously.
LeetCode judge | T3 | Async jobs | Sandboxing, isolation, queue dispatch, caching.
Code deployment | T3 | Async jobs | Blue-green, canary, rollback orchestration.
Metrics / Datadog | T3 | Streaming | TSDB, columnar storage, hierarchical aggregation.
LinkedIn / People You May Know | T3 | Fan-out / graph | Graph hops at billion-scale; offline precompute.
Airbnb | T3 | Consistency | Two-sided marketplace; availability + booking.
Reminder alert system | T3 | Scheduling | Timing wheels; billions of future tasks; crash safety.
YouTube / Netflix (video) | T4 | Media | Transcoding, adaptive bitrate, CDN economics.
Distributed cache (Redis) | T4 | Consistency / KV | Sharding, eviction, stampede, hot keys.
Key-value store (DynamoDB) | T4 | Consistency | Quorum R/W, vector clocks, hinted handoff.
Amazon S3 | T4 | Blobs | Erasure coding, multi-region, read-after-write.
Payment system / Stripe | T4 | Consistency | Idempotency, double-entry, reconciliation.
Flash sale | T4 | Consistency | Fairness + inventory under extreme contention.
Google Ads / click aggregator | T4 | Streaming | Real-time auction + exactly-once click counting.
Stock exchange | T4 | Consistency | Microsecond latency, deterministic matching.
YouTube likes counter | T4 | Streaming / counting | Sharded counters; no hot row.
Distributed lock / job scheduler / cron | T4 | Scheduling | Exactly-once, leader election, fencing tokens.
Dynamo / Cassandra / Kafka / Chubby / GFS / HDFS / BigTable | T5 | Foundational papers | Read as papers: consensus, logs, consistent hashing, chunked storage.
How to actually memorize this: don't drill all 64. Pick one question per pattern, whiteboard it cold, then read the canonical solution and note only what you missed. After nine problems — one per cluster — new prompts start feeling familiar before you finish reading them. That's the patterns clicking into place, and it's the entire point of this atlas.