Messages must arrive instantly, in order, without duplicates — and still get there when the recipient is offline. Layer on presence (“online now”) and group chat, and the scaling crux emerges fast: millions of concurrent persistent connections, and the puzzle of routing a message to whichever server happens to hold the recipient’s socket. Reliability here is built from one repeated idea: explicit confirmation at every hop.
Every FAANG company runs a rubric. The dimensions are roughly the same; the weights differ by company and level. At senior+ the boxes-and-arrows are table stakes — what gets graded hardest is the quality of your decisions: the questions you asked first, the trade-offs you surfaced and defended, and the production reality you volunteered without being asked.
| Dimension | Weight | What earns the signal |
|---|---|---|
| Requirements & scoping | 10–15% | You scoped before drawing, asked enough to bound the problem, pinned the scale number, and stated assumptions out loud. |
| High-level architecture | 20–25% | The right components, a clear data flow, and a reason every box exists. The design satisfies each functional requirement. |
| Technical depth / deep dives | ~30% | You go three questions deep on the hard part without being rescued. This is where staff is won or lost. |
| Trade-offs & judgment | highest effective | Two viable options, what each costs, and a committed pick for this system. Simplicity over flash when flash isn't warranted. |
| Communication / driving | cross-cutting | You drive the 45 minutes; the interviewer never has to rescue you. You narrate, checkpoint, and narrow when the design sprawls. |
| Operational maturity | ↑ in 2026 | The newest weight: observability, rollout, failure modes, on-call reality — volunteered, not pried out. |
A solid design with reasonable trade-offs is a strong score for a mid-level candidate and a downlevel flag for staff. The questions can be identical; the depth expectation is not. As you climb, the balance tips from breadth toward depth, proactivity, and production reality.
You don't recite the AWS Well-Architected pillars; you anchor each decision to one of them. That signals you evaluate systems across competing concerns rather than optimizing one axis. Each pillar below is mapped to a move you can make in this exact design.
Confirm at every hop.
End-to-end by default.
Never lose a message.
Push, don’t poll.
Right path per payload.
Bound presence chatter.
The simulation. Framing: a global messenger — ~2B users, millions of concurrent connections, delivery < 100ms when both are online, strict per-conversation ordering with no duplicates, reliable offline delivery, and presence that’s allowed to be slightly stale.
“The defining constraint is connection scale: tens of millions of persistent sockets, which means the hard problem is routing a message to the specific server holding the recipient’s connection. I’ll use WebSockets — polling wastes bandwidth and can’t push.”
“I’ll keep presence off the message path. Heartbeats from every client are extremely high-frequency with weak durability needs — mixing them into message delivery would pollute the critical path.”
“Clients poll the server every couple of seconds for new messages.” Polling wastes bandwidth at billions of clients and adds latency — it’s the protocol you reject, not propose.
Interviewer: Design WhatsApp.
Candidate: Core flow is send, receive, receipts, offline, presence, groups. The thing that makes it hard isn’t any one feature; it’s holding millions of persistent connections and routing a message to whichever server has the recipient’s socket. I’ll use WebSockets for the live path, a registry to track user-to-server, and a push fallback for offline. Presence I’ll handle separately since it’s high-frequency and low-durability.
Interviewer: Good. Assume tens of millions of concurrent users.
Candidate: Then cross-server routing is the centerpiece. Let me build the 1:1 path with explicit ACKs first, then offline, then groups.
Entities: Message (id = time-ordered UUID, conversationId, senderId, ts), Connection registry (userId → chat-server), Conversation. The registry is the keystone: every cross-server routing decision starts with a lookup in it.
With tens of millions of concurrent connections and each chat server holding roughly 100K–1M sockets, you need dozens to hundreds of stateful chat servers. That is exactly why a sender and recipient are usually on different servers, which makes cross-server routing the central problem.
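The server count can be checked with a quick back-of-envelope calculation. This is only a sketch: the 50 million figure is an assumed stand-in for "tens of millions", and the helper names are made up for illustration.

```python
import math

# Assumed stand-in for "tens of millions" of concurrent connections.
CONCURRENT_CONNECTIONS = 50_000_000


def chat_servers_needed(connections: int, sockets_per_server: int) -> int:
    """Minimum servers needed to hold every socket, with no headroom."""
    return math.ceil(connections / sockets_per_server)


def same_server_chance(server_count: int) -> float:
    """Chance two users share a server if connections are spread uniformly."""
    return 1 / server_count


sparse = chat_servers_needed(CONCURRENT_CONNECTIONS, 100_000)    # 500 servers
dense = chat_servers_needed(CONCURRENT_CONNECTIONS, 1_000_000)   # 50 servers
```

Under these assumptions, even the denser fleet leaves a uniformly placed sender and recipient on the same server only about 2% of the time, so almost every message has to cross servers.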
“The message id is a time-ordered UUID, which gives me two things for free: per-conversation ordering and a stable key for client-side dedup, so an at-least-once retry never shows a duplicate bubble.”
A holds a WebSocket to Chat Server 1. On send: S1 persists the message (Cassandra) before ACKing A, then routes it — it looks up B in the registry. If B is online on Chat Server 2, S1 forwards to S2 (directly or via a pub/sub backbone), S2 delivers over B’s socket, and a delivered receipt flows back to A. If B is offline, the message sits in B’s durable inbox and a push notification (APNs/FCM) wakes the device, which fetches missed messages on reconnect.
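The send path above can be sketched as a single handler. Everything here is a hypothetical in-memory stand-in: the dictionaries and lists take the place of the durable store, the registry, the server-to-server backbone, and the APNs/FCM call, and the names are not from any real library.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical in-memory stand-ins for the store, registry, backbone and push."""
    store: dict = field(default_factory=dict)      # message id -> message (durable store stand-in)
    registry: dict = field(default_factory=dict)   # user id -> chat server id
    forwarded: list = field(default_factory=list)  # (server id, message id) sent to a peer server
    inbox: dict = field(default_factory=dict)      # user id -> undelivered message ids
    pushes: list = field(default_factory=list)     # user ids woken by a push notification

    def handle_send(self, msg_id: str, sender: str, recipient: str, body: str) -> str:
        # 1. Persist first, so the sender's ACK never refers to a message that could vanish.
        self.store[msg_id] = {"from": sender, "to": recipient, "body": body}
        # 2. Look up which server, if any, holds the recipient's socket.
        server = self.registry.get(recipient)
        if server is not None:
            # 3a. Online: forward to that one server only.
            self.forwarded.append((server, msg_id))
            return "forwarded"
        # 3b. Offline: park the message in the inbox and wake the device.
        self.inbox.setdefault(recipient, []).append(msg_id)
        self.pushes.append(recipient)
        return "queued"
```

The point of the sketch is the ordering of the steps: the write to the store happens before any routing decision, and the registry lookup decides between exactly two branches.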
“I persist the message before I ACK the sender. That ordering matters — if I ACK first and crash, the sender thinks it’s delivered and it’s gone. Persist-then-ack is how I guarantee no silent loss.”
A single message broker that every server publishes to is fine conceptually, but call out the routing cost: you still need the registry (or a topic per user) so a message reaches the one server holding the recipient, not all of them.
“All users connect to one server.” One server can’t hold tens of millions of sockets — the whole problem is that connections are spread across hundreds of stateful servers.
Chat servers are stateful — each holds hundreds of thousands of live sockets. The registry (user→server) must update on every connect/disconnect; a pub/sub backbone lets any server route to any other without knowing topology. A server dying drops its connections; clients reconnect (to a new server, updating the registry) and fetch anything missed.
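A minimal sketch of the registry's bookkeeping, assuming a single in-process map (in practice this would live in a shared store). The class and method names are illustrative, not a real API.

```python
from typing import Dict, Optional, Set


class ConnectionRegistry:
    """In-memory sketch of the user -> chat-server map."""

    def __init__(self) -> None:
        self._server_of: Dict[str, str] = {}
        self._users_on: Dict[str, Set[str]] = {}

    def connect(self, user: str, server: str) -> None:
        # A reconnect may land on a different server, so clear any old mapping first.
        self.disconnect(user)
        self._server_of[user] = server
        self._users_on.setdefault(server, set()).add(user)

    def disconnect(self, user: str) -> None:
        server = self._server_of.pop(user, None)
        if server is not None:
            self._users_on[server].discard(user)

    def lookup(self, user: str) -> Optional[str]:
        """Returns the server holding the user's socket, or None if offline."""
        return self._server_of.get(user)

    def server_died(self, server: str) -> Set[str]:
        """Drop every mapping for a dead server; returns the users who must reconnect."""
        dropped = self._users_on.pop(server, set())
        for user in dropped:
            self._server_of.pop(user, None)
        return dropped
```

Keeping a reverse index (server to users) is a design choice made here so that a server failure can be cleaned up in one step rather than by scanning every user.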
Aim for at-least-once delivery with client-side dedup on the message id — true exactly-once over a flaky network is impractical, so dedup makes retries safe. Ordering is per-conversation via the time-ordered id / sequence number, not global.
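On the client, dedup and ordering might look like the sketch below. It assumes each message carries a per-conversation sequence number `seq` alongside its id; both the field and the class name are hypothetical.

```python
import bisect
from typing import List, Set, Tuple


class ConversationView:
    """Client-side buffer for one conversation: dedups by id, renders in sequence order."""

    def __init__(self) -> None:
        self._seen_ids: Set[str] = set()
        self._messages: List[Tuple[int, str, str]] = []  # (seq, msg_id, body)

    def receive(self, seq: int, msg_id: str, body: str) -> bool:
        """Returns True if the message is new, False if it was a redelivery."""
        if msg_id in self._seen_ids:
            return False  # an at-least-once retry: drop it silently
        self._seen_ids.add(msg_id)
        # Insert in sorted position so late arrivals still render in order.
        bisect.insort(self._messages, (seq, msg_id, body))
        return True

    def bubbles(self) -> List[str]:
        return [body for _, _, body in self._messages]
```

Because the dedup key is the message id rather than the content, a user who sends the same text twice still gets two bubbles, while a network retry of one send gets one.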
Presence is a heartbeat with a short TTL in Redis (e.g. ~30s); no heartbeat → offline. Critically, push presence updates only to users actively viewing that contact’s chat — broadcasting every user’s status to everyone would be a notification storm at 2B users.
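The heartbeat-with-TTL idea and the "only notify active viewers" rule can be sketched together. This is an in-memory stand-in, not a Redis client: a real deployment would let Redis expire the keys, and the injected `clock` parameter exists only to make the sketch testable.

```python
import time
from typing import Callable, Dict, Set


class PresenceStore:
    """In-memory stand-in for heartbeat keys with a TTL."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}
        self._viewers: Dict[str, Set[str]] = {}  # contact -> users with that chat open

    def heartbeat(self, user: str) -> None:
        # Each heartbeat pushes the expiry forward by one TTL.
        self._expires_at[user] = self._clock() + self._ttl

    def is_online(self, user: str) -> bool:
        expires_at = self._expires_at.get(user)
        return expires_at is not None and expires_at > self._clock()

    def open_chat(self, viewer: str, contact: str) -> None:
        self._viewers.setdefault(contact, set()).add(viewer)

    def close_chat(self, viewer: str, contact: str) -> None:
        self._viewers.get(contact, set()).discard(viewer)

    def update_recipients(self, contact: str) -> Set[str]:
        """Only users currently viewing this contact's chat receive presence changes."""
        return set(self._viewers.get(contact, set()))
```

The fan-out of a presence change is bounded by how many people have that one chat open, not by the size of the contact's address book.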
Media uploads via presigned storage URLs, bypassing chat servers entirely. E2E encryption (Signal protocol) means the server routes ciphertext — and the per-recipient key encryption is exactly why very large E2E groups are hard.
“Group fan-out is tiered: a loop for small groups, async Kafka fan-out for large ones, and a pull model with aggregated receipts for massive channels — because pushing every message and receipt to a hundred-thousand-member group synchronously would melt the sender’s path.”
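The tiering decision itself is small enough to sketch. The cut-off values below are assumptions chosen for illustration; the text gives no numbers, and real thresholds would come from load testing.

```python
from enum import Enum


class FanoutStrategy(Enum):
    SYNC_LOOP = "sync_loop"        # small groups: push to each member inline
    ASYNC_QUEUE = "async_queue"    # large groups: hand off to an async fan-out queue
    PULL_ON_OPEN = "pull_on_open"  # massive channels: store once, members fetch on open


# Assumed cut-offs for illustration only.
SMALL_GROUP_MAX = 100
LARGE_GROUP_MAX = 10_000


def choose_fanout(member_count: int) -> FanoutStrategy:
    """Pick the delivery strategy from the group's size."""
    if member_count <= SMALL_GROUP_MAX:
        return FanoutStrategy.SYNC_LOOP
    if member_count <= LARGE_GROUP_MAX:
        return FanoutStrategy.ASYNC_QUEUE
    return FanoutStrategy.PULL_ON_OPEN
```

For example, `choose_fanout(8)` picks the inline loop, while a hundred-thousand-member channel falls through to the pull model.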
“For a huge group we just loop over all members and push.” A synchronous loop over a hundred thousand members blocks the send and creates a fan-out storm — the exact thing the tiered approach avoids.
Interviewer: A and B are on different chat servers. Trace the message.
Candidate: A’s server persists the message and ACKs A, then looks up B in the connection registry; say B is on Server 2. It forwards the message over the pub/sub backbone to S2, which pushes it down B’s socket and sends a delivered receipt back to A. If the registry says B is offline, the message stays in B’s durable inbox and I fire a push via APNs/FCM; B fetches it on reconnect. The registry is what makes cross-server routing work.
Interviewer: B’s server crashes right as the message arrives.
Candidate: The message was already persisted before A was ACKed, so it’s not lost. B’s client detects the dropped socket, reconnects (to a different server, which updates the registry) and fetches anything undelivered from its inbox. Delivery is at-least-once, so if a duplicate slips through, the client dedups on the message id and B never sees two bubbles.
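The crash-recovery answer relies on an inbox whose entries leave only when a delivered receipt arrives. A minimal sketch, assuming an in-memory queue per user in place of the durable store; the names are hypothetical.

```python
from typing import Dict, List


class DurableInbox:
    """Per-user queue of undelivered message ids; entries leave only on a delivered receipt."""

    def __init__(self) -> None:
        self._pending: Dict[str, List[str]] = {}

    def enqueue(self, user: str, msg_id: str) -> None:
        queue = self._pending.setdefault(user, [])
        if msg_id not in queue:  # a retried persist must not queue the id twice
            queue.append(msg_id)

    def ack_delivered(self, user: str, msg_id: str) -> None:
        queue = self._pending.get(user, [])
        if msg_id in queue:
            queue.remove(msg_id)

    def fetch_undelivered(self, user: str) -> List[str]:
        """Called on reconnect; fetching alone does not clear anything."""
        return list(self._pending.get(user, []))
```

If the server crashes after pushing a message but before the receipt is recorded, the id is still pending, so the reconnecting client receives it again. That repeat is the at-least-once behavior the client-side dedup absorbs.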
Drain connections gracefully on deploy so clients reconnect to healthy servers; canary protocol changes on a small connection slice. Keep the offline-fetch path robust — it’s the safety net during any disruption.
“With more time I’d detail multi-device sync, voice/video signaling, and the full Signal-protocol key management. I scoped them out deliberately — each is its own subsystem.”
Interviewers push on cross-server routing, delivery guarantees, and group scale. Name the registry, defend persist-before-ack, tier the fan-out.
A connection registry maps each user to the server holding their socket. A's server looks up B, finds B's server, and forwards the message — directly or over a pub/sub backbone that lets any server route to any other. Without the registry, you'd have to broadcast to every server, which doesn't scale.
WebSocket. It's a persistent bidirectional channel, so the server can push instantly with sub-100ms latency. Polling wastes bandwidth at billions of clients and adds up to the poll interval of latency on every message — it's the option I reject.
Per-conversation ordering via a time-ordered message id or sequence number — not a global order, which isn't needed. Delivery is at-least-once, so the client dedups on the message id; a retry can't produce a duplicate bubble. That combination is practical where true exactly-once over a flaky network isn't.
The message is persisted to a durable inbox before I ACK the sender, and a push notification via APNs/FCM wakes the device. On reconnect the client fetches all missed messages. Persist-before-ack is what guarantees the offline message isn't lost.
Not a synchronous loop. For large groups I fan out asynchronously via Kafka partitioned by group so the sender isn't blocked; for truly massive channels I switch to a pull model — store once, members fetch on open — and aggregate read receipts instead of emitting one per member to avoid a receipt storm.
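Aggregating read receipts can be sketched as counting distinct readers per message and publishing the count, rather than emitting one event per reader. The class below is an illustrative in-memory version with made-up names.

```python
from typing import Dict, Set


class ReadReceiptAggregator:
    """Collects per-member reads and exposes one count per message."""

    def __init__(self) -> None:
        self._readers: Dict[str, Set[str]] = {}

    def mark_read(self, msg_id: str, user: str) -> None:
        # A set makes repeated reads by the same user count once.
        self._readers.setdefault(msg_id, set()).add(user)

    def summary(self) -> Dict[str, int]:
        """One aggregated figure per message, suitable for periodic publishing."""
        return {msg_id: len(users) for msg_id, users in self._readers.items()}
```

The sender then sees something like "read by 2" from a single periodic update instead of receiving a separate receipt for every member.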
Heartbeats with a short TTL in Redis — no heartbeat means offline — kept entirely off the message path. And I only push presence updates to users actively viewing that contact's chat; broadcasting everyone's status to everyone would be a constant storm.
A clean design with one of these undercurrents still scores below the bar at senior+. None are about getting an answer wrong — they're about how you operate.
Jumping to architecture without bounding the problem or confirming scale. Reads as template-matching.
"It depends" with no decision behind it. Name the trade-off, then pick.
Proposing periodic polling for a real-time messenger — wasteful and laggy at scale. WebSocket with push is the expected answer.
Ignoring that millions of sockets span hundreds of stateful servers, which makes cross-server routing the core problem.
Acknowledging the sender before durably storing the message — a crash then silently loses a message the user saw confirmed.
No observability, no rollout, no failure-mode plan. In 2026 this reads as "has never carried a pager."
Confident wrong answers when pushed. Far worse than an honest "here's what I'd verify."
Waiting to be asked the next question. At staff you own the 45 minutes.
Run a mock and score yourself honestly against the dimensions the interviewer uses. If you can't hit "strong" on depth and operability, that's your signal on where to drill.
| Dimension | Weak (downlevel) | Strong (at level) |
|---|---|---|
| Scoping | Started drawing sockets; skipped the scaling crux. | Named connection scale + cross-server routing as the crux; chose WebSocket with a reason; separated presence. |
| Routing | Hand-waved 'servers talk to each other'. | Connection registry (user→server) plus a pub/sub backbone for any-to-any routing. |
| Delivery semantics | No ordering or dedup story. | Persist-before-ack, per-conversation ordering, at-least-once + client dedup on message id. |
| Offline | Forgot offline users. | Durable inbox + APNs/FCM push, fetch-missed on reconnect. |
| Group fan-out | One loop for all group sizes. | Tiered: loop for small, async Kafka for large, pull + aggregated receipts for massive. |
| Operability | Never mentioned it. | Connection-count and delivery-latency SLOs, graceful connection draining, presence lag. |