Pattern 08 · Realtime Messaging & Presence · Staff-Level Simulation

Design WhatsApp like ten million sockets are already open.

Messages must arrive instantly, in order, without duplicates — and still get there when the recipient is offline. Layer on presence (“online now”) and group chat, and the scaling crux emerges fast: millions of concurrent persistent connections, and the puzzle of routing a message to whichever server happens to hold the recipient’s socket. Reliability here is built from one repeated idea: explicit confirmation at every hop.

Time Budget · how the 45 min should split

45:00 total / you must drive every minute
  • Scope: ~6m
  • API: ~5m
  • HLD: ~13m
  • Deep dives: ~15m
  • Wrap: ~6m

The shape of the problem

[Diagram: user A → chat S1 → chat S2 → user B (✓✓ delivered); registry maps user→server; offline? → APNs/FCM]
Allow: Say this. It earns the signal interviewers grade.
Throttle: Defensible, but only with a stated reason.
Reject: Never say it. Each one is a downlevel flag.
01 How you're actually graded

Six buckets — and judgment outweighs the diagram.

Every FAANG company runs a rubric. The dimensions are roughly the same; the weights differ by company and level. At senior+ the boxes-and-arrows are table stakes — what gets graded hardest is the quality of your decisions: the questions you asked first, the trade-offs you surfaced and defended, and the production reality you volunteered without being asked.

Dimension | Weight | What earns the signal
Requirements & scoping | 10–15% | You scoped before drawing, asked enough to bound the problem, pinned the scale number, and stated assumptions out loud.
High-level architecture | 20–25% | The right components, a clear data flow, and a reason every box exists. The design satisfies each functional requirement.
Technical depth / deep dives | ~30% | You go three questions deep on the hard part without being rescued. This is where staff is won or lost.
Trade-offs & judgment | highest effective | Two viable options, what each costs, and a committed pick for this system. Simplicity over flash when flash isn't warranted.
Communication / driving | cross-cutting | You drive the 45 minutes; the interviewer never has to rescue you. You narrate, checkpoint, and narrow when the design sprawls.
Operational maturity | ↑ in 2026 | The newest weight: observability, rollout, failure modes, and on-call reality, volunteered rather than pried out.
The 2026 shift, in one line. Operational concerns are now a first-class graded dimension, and "it depends" without a committed answer reads as evasion rather than nuance. Name the trade-off, then pick.
02 The same answer is scored differently at each level

It's a sliding scale, not a pass/fail bar.

A solid design with reasonable trade-offs is a strong score for a mid-level candidate and a downlevel flag for staff. The questions can be identical; the depth expectation is not. As you climb, the balance tips from breadth toward depth, proactivity, and production reality.

Mid-level
Meta E4 · Google L4 · Amazon SDE-II
Breadth 80 / depth 20
  • Draws clients on WebSockets and messages broadcast through a server.
  • Knows offline users need push but may not route across servers cleanly.
  • Stores messages but is shaky on ordering and dedup guarantees.
  • Needs guidance toward the connection-registry routing problem.
Senior
Meta E5 · Google L5 · Amazon SDE-III
Breadth 60 / depth 40
  • Chooses WebSocket with a one-line reason and names the connection registry (user→server) unprompted.
  • Persists before ACK, dedups on a message id, and orders per-conversation.
  • Designs the offline inbox + push path and a heartbeat-based presence service kept off the message path.
  • Has a group-fan-out story that scales with group size.
Staff+
Meta E6 · Google L6 · Amazon Principal
Breadth 40 / depth 60
  • Establishes routing fast, then spends time on connection scaling, delivery semantics, and group fan-out tiers.
  • Experience-backed take on connections-per-node, cross-server routing via pub/sub, and presence storms.
  • Treats read-receipt aggregation, media-bypass via presigned URLs, and multi-region edge gateways as routine.
  • Frames the at-least-once + dedup guarantee, and explains how the E2E encryption fan-out cost grows with group size.
03 The lens senior engineers narrate through

Borrow AWS's Well-Architected pillars as your trade-off vocabulary.

You don't recite AWS — you anchor each decision to one of these. It signals you evaluate systems across competing concerns rather than optimizing one axis. Each pillar below is mapped to a move you can make in this exact design.

PILLAR 01

Operational Excellence

Confirm at every hop.

Hook: “Reliability is explicit ACKs — persist-before-ack, delivery and read receipts, heartbeats — and I monitor delivery latency and undelivered-queue depth.”
PILLAR 02

Security

End-to-end by default.

Hook: “Messages are E2E encrypted with the Signal protocol; the server routes ciphertext it can’t read, and rate-limiting plus block-lists run before delivery.”
PILLAR 03

Reliability

Never lose a message.

Hook: “A message is persisted to durable storage before I ACK the sender, so a server crash can’t drop a message the user already saw confirmed.”
PILLAR 04

Performance Efficiency

Push, don’t poll.

Hook: “Persistent WebSockets give sub-100ms delivery and let the server push — polling would waste bandwidth and add latency on every message.”
PILLAR 05

Cost Optimization

Right path per payload.

Hook: “Media bypasses chat servers via presigned storage URLs — routing gigabyte videos through connection servers would waste their capacity on bytes, not messages.”
PILLAR 06

Sustainability

Bound presence chatter.

Hook: “Presence is heartbeats with a short TTL, pushed only to users actively viewing that contact — otherwise 2B users’ status updates would be a constant storm.”
How to use it without sounding like a checklist. Don't list the pillars. Weave one in when you commit: name a trade-off, name the pillar it serves, and make the call. One sentence that does all three reads as senior.
03·5 The architecture you draw on the whiteboard

Route to whichever server holds the socket — confirm every hop.


[Figure: Route via the registry; confirm at every hop. Nodes: Sender (WebSocket), Chat Server A (holds sender), Chat Server B (holds recipient), Recipient (WebSocket), Connection Registry (user → server, Redis), Inbox Queue (offline, Cassandra). Edge labels: send + seq, lookup, route, push, if offline → store, delivered / read ack. Legend: routing, delivery + ack, offline store.]
Find the socket, confirm the hop. Each user holds a persistent socket on some Chat Server; a connection registry maps user→server so a message routes to the right box. Offline users get an inbox queue; acks flow back at every hop. Say it: “reliability here is built from explicit confirmation, not hope.”

How to narrate it in the room

04 The interview, minute by minute

Five phases. Drive every one of them.

The simulation. Framing: a global messenger — ~2B users, millions of concurrent connections, delivery < 100ms when both are online, strict per-conversation ordering with no duplicates, reliable offline delivery, and presence that’s allowed to be slightly stale.

01 Requirements & Scoping · ~6 min · don't draw yet
Grading this window: Do you name the connection-scaling problem (millions of sockets, cross-server routing) as the crux and pick WebSocket with a reason? Separating presence from the message path is the senior tell.

Functional requirements to land

  • 1:1 messaging — send and receive in real time.
  • Delivery & read receipts (sent / delivered / read).
  • Group chat, offline delivery, and presence (online / last-seen).

Non-functional requirements to land

  • Low latency: <100ms delivery when both parties are online.
  • Ordered, no duplicates per conversation; no lost messages.
  • Millions of concurrent connections — the defining scaling constraint.
  • Presence can be eventually consistent; a stale “last seen” is annoying, not data loss.
▲ Allow — say this

“The defining constraint is connection scale: tens of millions of persistent sockets, which means the hard problem is routing a message to the specific server holding the recipient’s connection. I’ll use WebSockets — polling wastes bandwidth and can’t push.”

▲ Allow — say this

“I’ll keep presence off the message path. Heartbeats from every client are extremely high-frequency with weak durability needs — mixing them into message delivery would pollute the critical path.”

▼ Reject — never say this

“Clients poll the server every couple of seconds for new messages.” Polling wastes bandwidth at billions of clients and adds latency — it’s the protocol you reject, not propose.

Scripted exchange
Interviewer

Design WhatsApp.

You

Core flow is send, receive, receipts, offline, presence, groups. The thing that makes it hard isn’t any one feature — it’s holding millions of persistent connections and routing a message to whichever server has the recipient’s socket. I’ll use WebSockets for the live path, a registry to track user-to-server, and a push fallback for offline. Presence I’ll handle separately since it’s high-frequency and low-durability.

Interviewer

Good. Assume tens of millions of concurrent users.

You

Then cross-server routing is the centerpiece. Let me build the 1:1 path with explicit ACKs first, then offline, then groups.

02 Entities, API & Estimation · ~5 min
Grading this window: The connection registry named as a first-class component, a message id that enables ordering + dedup, and a sense of the connection count.

Entities: Message (id = time-ordered UUID, conversationId, senderId, ts), Connection registry (userId → chat-server), Conversation. The registry is the keystone:

// persistent WebSocket; stateful chat servers
registry: userId → chatServerId   (Redis, updated on connect/disconnect)
message:  { id: TIMEUUID, convId, from, ts }   // id gives ordering + dedup

The estimate that matters

With tens of millions of concurrent connections and each chat server holding roughly 100K–1M sockets, you need tens to hundreds of stateful chat servers. That is exactly why a sender and recipient are usually on different servers, making cross-server routing the central problem.
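The arithmetic behind that estimate can be sketched in a few lines. The 50M connection figure and the helper names are illustrative assumptions, not measurements; the per-server socket counts are the range quoted above.

```python
import math


def chat_servers_needed(concurrent_connections: int, sockets_per_server: int) -> int:
    """Servers required to hold every open socket, rounded up."""
    return math.ceil(concurrent_connections / sockets_per_server)


def same_server_probability(num_servers: int) -> float:
    """Chance two users share a server if sockets are spread uniformly."""
    return 1 / num_servers


# Illustrative input only: 50M concurrent connections.
CONCURRENT = 50_000_000
for per_server in (100_000, 250_000, 1_000_000):
    servers = chat_servers_needed(CONCURRENT, per_server)
    print(f"{per_server:>9,} sockets/server -> {servers:>3} servers, "
          f"P(same server) = {same_server_probability(servers):.1%}")
```

Even at the generous end of the range, the chance that sender and recipient share a server is a few percent at best, so the cross-server path is the common case rather than the exception.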

▲ Allow — say this

“The message id is a time-ordered UUID, which gives me two things for free: per-conversation ordering and a stable key for client-side dedup, so an at-least-once retry never shows a duplicate bubble.”
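One way to picture a time-ordered id is a timestamp in the high bits and a tie-breaking counter in the low bits. The sketch below is a hypothetical stand-in for a TIMEUUID, not its actual layout: the class name, the 12-bit counter, and the injectable clock are all assumptions made for illustration.

```python
import time


class TimeOrderedIdGenerator:
    """Hypothetical generator whose integer ids sort by creation order.

    Assumed layout: milliseconds since the epoch in the high bits and a
    12-bit counter in the low bits to break ties within one millisecond.
    """

    COUNTER_BITS = 12

    def __init__(self, clock=lambda: int(time.time() * 1000)):
        self._clock = clock
        self._last_ms = -1
        self._counter = 0

    def next_id(self) -> int:
        now = self._clock()
        if now <= self._last_ms:
            # Same millisecond (or the clock stepped back): reuse the last
            # timestamp and bump the counter so ids keep increasing.
            now = self._last_ms
            self._counter += 1
            if self._counter >= (1 << self.COUNTER_BITS):
                # Counter exhausted: borrow the next millisecond.
                now += 1
                self._counter = 0
        else:
            self._counter = 0
        self._last_ms = now
        return (now << self.COUNTER_BITS) | self._counter


gen = TimeOrderedIdGenerator()
first, second = gen.next_id(), gen.next_id()
print(first < second)
```

A single generator gives strictly increasing ids. Ids minted on different servers are only as ordered as those servers' clocks, which is one reason to pair the id with a per-conversation sequence number when strict ordering matters.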

03 High-Level Design (the MVP) · ~13 min
Grading this window: WebSocket + connection registry + persist-before-ack + offline push. Right components, clear cross-server flow.

The 1:1 message flow

A holds a WebSocket to Chat Server 1. On send: S1 persists the message (Cassandra) before ACKing A, then routes it — it looks up B in the registry. If B is online on Chat Server 2, S1 forwards to S2 (directly or via a pub/sub backbone), S2 delivers over B’s socket, and a delivered receipt flows back to A. If B is offline, the message sits in B’s durable inbox and a push notification (APNs/FCM) wakes the device, which fetches missed messages on reconnect.

A → S1: persist (Cassandra) → ACK A → lookup B in registry
B online  → forward to S2 → deliver → delivered receipt → A
B offline → durable inbox + push (APNs/FCM) → fetch on reconnect
dedup:    message id (TIMEUUID)
ordering: per-conversation sequence
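The flow above can be sketched as a toy in-memory model. Everything here is a stand-in: `MessagePath`, `ChatServer`, and the lists playing the role of the durable store, sockets, inbox, and push channel are hypothetical names for illustration, with no real Cassandra, Redis, or APNs/FCM calls.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    id: int
    conv_id: str
    sender: str
    recipient: str
    body: str


@dataclass
class ChatServer:
    """Stand-in for a stateful chat server; a list plays the role of a socket."""
    name: str
    sockets: dict = field(default_factory=dict)  # user_id -> messages pushed

    def push(self, msg: Message) -> None:
        self.sockets[msg.recipient].append(msg)


class MessagePath:
    def __init__(self):
        self.durable_store = []       # stand-in for the message table
        self.registry = {}            # user_id -> ChatServer holding the socket
        self.inbox = {}               # user_id -> messages awaiting reconnect
        self.push_notifications = []  # users we would wake via APNs/FCM
        self.events = []              # ordered trace of each hop

    def connect(self, user_id: str, server: ChatServer) -> None:
        server.sockets[user_id] = []
        self.registry[user_id] = server

    def send(self, msg: Message) -> str:
        # 1. Persist first, so a crash after the ACK cannot lose the message.
        self.durable_store.append(msg)
        self.events.append("persist")
        # 2. Only now acknowledge the sender.
        self.events.append("ack_sender")
        # 3. Route: find whichever server holds the recipient's socket.
        target = self.registry.get(msg.recipient)
        if target is not None:
            target.push(msg)
            self.events.append(f"deliver_via_{target.name}")
            return "delivered"
        # 4. Offline: park in the inbox and wake the device.
        self.inbox.setdefault(msg.recipient, []).append(msg)
        self.push_notifications.append(msg.recipient)
        self.events.append("inbox_and_push")
        return "queued"


path = MessagePath()
s1, s2 = ChatServer("S1"), ChatServer("S2")
path.connect("alice", s1)
path.connect("bob", s2)
print(path.send(Message(1, "c1", "alice", "bob", "hi")))     # bob is on S2
print(path.send(Message(2, "c2", "alice", "carol", "hey")))  # carol is offline
print(path.events)
```

The event trace makes the ordering explicit: "persist" always precedes "ack_sender", and routing happens only after both.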
The trap door the interviewer opens here. “A and B are on different chat servers — how does the message get across?” That’s the heart of it. A connection registry (Redis: user→server) lets S1 find B’s server and forward; at larger scale a pub/sub backbone decouples servers so any server can route to any other. Naming this routing problem yourself is the senior signal.
▲ Allow — say this

“I persist the message before I ACK the sender. That ordering matters — if I ACK first and crash, the sender thinks it’s delivered and it’s gone. Persist-then-ack is how I guarantee no silent loss.”

◆ Throttle — only with a reason

A single message broker every server publishes to. Fine conceptually, but call out the routing cost — you still need the registry (or topic-per-user) so a message reaches the one server holding the recipient, not all of them.

▼ Reject — never say this

“All users connect to one server.” One server can’t hold tens of millions of sockets — the whole problem is that connections are spread across hundreds of stateful servers.

04 Deep Dives: the stress test · ~15 min · where staff is decided
Grading this window: Lead toward connection scaling, delivery semantics, presence, and group fan-out tiers. Staff volunteers these; 30%+ of the score.

Connection management at scale

Chat servers are stateful — each holds hundreds of thousands of live sockets. The registry (user→server) must update on every connect/disconnect; a pub/sub backbone lets any server route to any other without knowing topology. A server dying drops its connections; clients reconnect (to a new server, updating the registry) and fetch anything missed.
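One subtlety in keeping the registry current is the reconnect race: a client may register on a new server before the old server notices the dead socket, and the old server's late cleanup must not erase the fresh mapping. A minimal in-memory sketch follows, with hypothetical class and method names. A shared store such as Redis would need the compare-and-delete to be atomic, which this sketch does not attempt to show.

```python
class ConnectionRegistry:
    """Hypothetical user -> server map with ownership-checked removal."""

    def __init__(self):
        self._owner = {}  # user_id -> server_id

    def register(self, user_id: str, server_id: str) -> None:
        # Last writer wins: a reconnect simply overwrites the old mapping.
        self._owner[user_id] = server_id

    def unregister(self, user_id: str, server_id: str) -> bool:
        """Remove the mapping only if this server still owns it."""
        if self._owner.get(user_id) == server_id:
            del self._owner[user_id]
            return True
        return False

    def lookup(self, user_id: str):
        return self._owner.get(user_id)

    def drop_server(self, server_id: str) -> list:
        """Clear every mapping that points at a dead server."""
        orphaned = [u for u, s in self._owner.items() if s == server_id]
        for user_id in orphaned:
            del self._owner[user_id]
        return orphaned


registry = ConnectionRegistry()
registry.register("bob", "S2")
registry.register("bob", "S7")            # bob reconnects elsewhere
stale = registry.unregister("bob", "S2")  # S2's late cleanup is ignored
print(stale, registry.lookup("bob"))
```

Users cleared by `drop_server` simply look offline until they reconnect, so messages for them fall through to the inbox path in the meantime.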

Delivery semantics

Aim for at-least-once delivery with client-side dedup on the message id — true exactly-once over a flaky network is impractical, so dedup makes retries safe. Ordering is per-conversation via the time-ordered id / sequence number, not global.
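To make the at-least-once plus dedup pairing concrete, here is a small deterministic sketch. `FlakyLink` is a hypothetical test double that always delivers but drops the first few acknowledgements, which is the case that forces retries and therefore duplicates. The class and function names are assumptions for illustration.

```python
class DedupingClient:
    """Receiver that renders each message id at most once."""

    def __init__(self):
        self._seen = set()
        self._bubbles = {}       # msg_id -> body
        self.raw_deliveries = 0  # counts every arrival, duplicates included

    def on_message(self, msg_id: int, body: str) -> None:
        self.raw_deliveries += 1
        if msg_id in self._seen:
            return               # duplicate from a retry: ignore it
        self._seen.add(msg_id)
        self._bubbles[msg_id] = body

    def ordered_bubbles(self) -> list:
        # Render by id, not arrival order, within one conversation.
        return [self._bubbles[k] for k in sorted(self._bubbles)]


class FlakyLink:
    """Delivers every message but loses the first `acks_to_drop` acks."""

    def __init__(self, receiver: DedupingClient, acks_to_drop: int):
        self.receiver = receiver
        self.acks_to_drop = acks_to_drop

    def send(self, msg_id: int, body: str) -> bool:
        self.receiver.on_message(msg_id, body)
        if self.acks_to_drop > 0:
            self.acks_to_drop -= 1
            return False         # ack lost: the sender cannot tell it arrived
        return True


def send_at_least_once(link: FlakyLink, msg_id: int, body: str,
                       max_attempts: int = 5) -> int:
    """Retry until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if link.send(msg_id, body):
            return attempt
    raise TimeoutError(f"message {msg_id} not acknowledged")


client = DedupingClient()
link = FlakyLink(client, acks_to_drop=2)
attempts = send_at_least_once(link, 2, "second")
send_at_least_once(link, 1, "first")  # arrives later, sorts earlier
print(attempts, client.raw_deliveries, client.ordered_bubbles())
```

The retries are what make delivery reliable, and the id check is what makes them invisible to the user.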

Presence without a storm

Presence is a heartbeat with a short TTL in Redis (e.g. ~30s); no heartbeat → offline. Critically, push presence updates only to users actively viewing that contact’s chat — broadcasting every user’s status to everyone would be a notification storm at 2B users.
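A rough in-memory sketch of heartbeat-with-TTL presence plus viewer-scoped updates is below. The class and method names are hypothetical, and timestamps are passed in explicitly so the behaviour is deterministic. In the design described here, the TTL would live in the store's key expiry rather than in application code.

```python
class PresenceService:
    """Hypothetical presence tracker: online means a heartbeat within the TTL."""

    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl = ttl_seconds
        self._last_beat = {}  # user_id -> time of the last heartbeat
        self._viewers = {}    # contact_id -> users with that chat open

    def heartbeat(self, user_id: str, now: float) -> None:
        self._last_beat[user_id] = now

    def is_online(self, user_id: str, now: float) -> bool:
        last = self._last_beat.get(user_id)
        return last is not None and (now - last) < self.ttl

    def start_viewing(self, viewer_id: str, contact_id: str) -> None:
        self._viewers.setdefault(contact_id, set()).add(viewer_id)

    def stop_viewing(self, viewer_id: str, contact_id: str) -> None:
        self._viewers.get(contact_id, set()).discard(viewer_id)

    def recipients_of_update(self, user_id: str) -> set:
        """Only users currently looking at this contact get a presence push."""
        return set(self._viewers.get(user_id, ()))


presence = PresenceService(ttl_seconds=30.0)
presence.heartbeat("bob", now=0.0)
presence.start_viewing("alice", "bob")
print(presence.is_online("bob", now=29.0),
      presence.is_online("bob", now=30.0),
      presence.recipients_of_update("bob"))
```

The fan-out of a status change is bounded by how many people have that chat open, not by the size of the user's contact list.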

Group fan-out by size

  • Small groups (up to ~hundreds): direct fan-out — the chat service pushes to each member’s socket in a loop.
  • Large groups: asynchronous fan-out via Kafka partitioned by group, with worker consumers handling delivery so the sender isn’t blocked.
  • Massive channels: a pull model — store once, members fetch on open — and aggregate read receipts rather than emitting one per member to avoid a receipt storm.
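The three tiers above can be sketched as a simple strategy selector. The thresholds are placeholders: the text only says "up to ~hundreds" for the small tier, so `SMALL_GROUP_MAX` and `LARGE_GROUP_MAX` are assumed values you would tune, and the function names are hypothetical.

```python
from enum import Enum


class FanOut(Enum):
    DIRECT_LOOP = "direct_loop"    # push to each member's socket inline
    ASYNC_QUEUE = "async_queue"    # enqueue once; workers deliver
    PULL_ON_OPEN = "pull_on_open"  # store once; members fetch on open


# Assumed cut-offs for illustration only.
SMALL_GROUP_MAX = 500
LARGE_GROUP_MAX = 50_000


def choose_fan_out(member_count: int) -> FanOut:
    if member_count <= SMALL_GROUP_MAX:
        return FanOut.DIRECT_LOOP
    if member_count <= LARGE_GROUP_MAX:
        return FanOut.ASYNC_QUEUE
    return FanOut.PULL_ON_OPEN


def aggregate_read_receipts(reader_ids: list, member_count: int) -> dict:
    """Collapse per-member receipts into one summary for the sender."""
    return {"read_count": len(set(reader_ids)), "member_count": member_count}


for size in (8, 5_000, 100_000):
    print(size, choose_fan_out(size).value)
print(aggregate_read_receipts(["u1", "u2", "u2"], member_count=100_000))
```

The sender's path does the same small amount of work in every tier above the first: one enqueue or one write, regardless of group size.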

Media & encryption

Media uploads via presigned storage URLs, bypassing chat servers entirely. E2E encryption (Signal protocol) means the server routes ciphertext — and the per-recipient key encryption is exactly why very large E2E groups are hard.
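The idea behind a presigned URL is that the API server signs a short-lived permission and the client then talks to storage directly. The sketch below is a generic HMAC illustration under assumed names (`media.example.com`, `sign_upload_url`, `verify_upload`, the demo secret); real object stores define their own signing schemes, which this does not reproduce.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by the signer and storage


def _signature(object_key: str, expires_at: int, secret: bytes) -> str:
    payload = f"{object_key}:{expires_at}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_upload_url(object_key: str, expires_at: int,
                    secret: bytes = SECRET) -> str:
    """Build a URL that authorises one object key until `expires_at`."""
    sig = _signature(object_key, expires_at, secret)
    return (f"https://media.example.com/{object_key}"
            f"?expires={expires_at}&sig={sig}")


def verify_upload(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Storage-side check: signature must match and must not be expired."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    object_key = parsed.path.lstrip("/")
    expires_at = int(query["expires"][0])
    if now >= expires_at:
        return False
    expected = _signature(object_key, expires_at, secret)
    return hmac.compare_digest(expected, query["sig"][0])


url = sign_upload_url("videos/abc123.mp4", expires_at=1_000)
print(verify_upload(url, now=999), verify_upload(url, now=1_000))
```

The chat server only ever handles the small signed URL and the later message that references the object; the media bytes never cross a connection server.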

▲ Allow — say this (staff move)

“Group fan-out is tiered: a loop for small groups, async Kafka fan-out for large ones, and a pull model with aggregated receipts for massive channels — because pushing every message and receipt to a hundred-thousand-member group synchronously would melt the sender’s path.”

▼ Reject — never say this

“For a huge group we just loop over all members and push.” A synchronous loop over a hundred thousand members blocks the send and creates a fan-out storm — the exact thing the tiered approach avoids.

Scripted stress-test exchange
Interviewer

A and B are on different chat servers. Trace the message.

You

A’s server persists the message and ACKs A, then looks up B in the connection registry — say B is on Server 2. It forwards the message over the pub/sub backbone to S2, which pushes it down B’s socket and sends a delivered receipt back to A. If the registry says B is offline, the message stays in B’s durable inbox and I fire a push via APNs/FCM; B fetches it on reconnect. The registry is what makes cross-server routing work.

Interviewer

B’s server crashes right as the message arrives.

You

The message was already persisted before A was ACKed, so it’s not lost. B’s client detects the dropped socket, reconnects — to a different server, which updates the registry — and fetches anything undelivered from its inbox. Delivery is at-least-once, so if a duplicate slips through, the client dedups on the message id and B never sees two bubbles.

05 Wrap-up: operability & recap · ~6 min
Grading this window: Prove you could run it. Volunteer observability and rollout; recap; name what you deferred.

Observability

  • Concurrent connection count per server and rebalancing on server loss.
  • Delivery latency and undelivered/queued message depth.
  • Presence-update lag and heartbeat-store load.

Rollout

Drain connections gracefully on deploy so clients reconnect to healthy servers; canary protocol changes on a small connection slice. Keep the offline-fetch path robust — it’s the safety net during any disruption.

▲ Allow — say this

“With more time I’d detail multi-device sync, voice/video signaling, and the full Signal-protocol key management. I scoped them out deliberately — each is its own subsystem.”

05 The follow-up gauntlet

The probes you'll get — and the answer that holds.

Interviewers push on cross-server routing, delivery guarantees, and group scale. Name the registry, defend persist-before-ack, tier the fan-out.

"A and B are on different chat servers — how does the message get there?"

A connection registry maps each user to the server holding their socket. A's server looks up B, finds B's server, and forwards the message — directly or over a pub/sub backbone that lets any server route to any other. Without the registry, you'd have to broadcast to every server, which doesn't scale.

"WebSocket or long-polling?"

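WebSocket. It's a persistent bidirectional channel, so the server can push instantly with sub-100ms latency. Periodic polling wastes bandwidth at billions of clients and adds up to the poll interval of latency on every message. Long-polling narrows that latency gap, but every delivery costs a new HTTP request and sends still need their own requests. Either way, polling is the option I reject.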

"Guarantee ordering and no duplicates?"

Per-conversation ordering via a time-ordered message id or sequence number — not a global order, which isn't needed. Delivery is at-least-once, so the client dedups on the message id; a retry can't produce a duplicate bubble. That combination is practical where true exactly-once over a flaky network isn't.

"The recipient is offline."

The message is persisted to a durable inbox before I ACK the sender, and a push notification via APNs/FCM wakes the device. On reconnect the client fetches all missed messages. Persist-before-ack is what guarantees the offline message isn't lost.

"A 100,000-member group — how do you fan out?"

Not a synchronous loop. For large groups I fan out asynchronously via Kafka partitioned by group so the sender isn't blocked; for truly massive channels I switch to a pull model — store once, members fetch on open — and aggregate read receipts instead of emitting one per member to avoid a receipt storm.

"Presence for 2B users without melting?"

Heartbeats with a short TTL in Redis — no heartbeat means offline — kept entirely off the message path. And I only push presence updates to users actively viewing that contact's chat; broadcasting everyone's status to everyone would be a constant storm.

Handling a probe you can’t fully answer: anchor to the confirmation principle. “I haven’t tuned the exact heartbeat interval, but the trade is freshness against load — shorter means snappier presence and more chatter. Here’s how I’d pick it from the connection count and the staleness users tolerate.”
06 What gets you downleveled

The flags that quietly tank an otherwise solid loop.

A clean design with one of these undercurrents still scores below the bar at senior+. None are about getting an answer wrong — they're about how you operate.

Drawing before scoping

Jumping to architecture without bounding the problem or confirming scale. Reads as template-matching.

Hedging without committing

"It depends" with no decision behind it. Name the trade-off, then pick.

Polling instead of WebSocket

Proposing periodic polling for a real-time messenger — wasteful and laggy at scale. WebSocket with push is the expected answer.

One server for all connections

Ignoring that millions of sockets span hundreds of stateful servers, which makes cross-server routing the core problem.

ACK before persist

Acknowledging the sender before durably storing the message — a crash then silently loses a message the user saw confirmed.

Skipping operations entirely

No observability, no rollout, no failure-mode plan. In 2026 this reads as "has never carried a pager."

Bluffing under a probe

Confident wrong answers when pushed. Far worse than an honest "here's what I'd verify."

Not driving

Waiting to be asked the next question. At staff you own the 45 minutes.

07 Your pre-loop scorecard

Self-grade before you walk in.

Run a mock and score yourself honestly against the dimensions the interviewer uses. If you can't hit "strong" on depth and operability, that's your signal on where to drill.

Dimension | Weak (downlevel) | Strong (at level)
Scoping | Started drawing sockets; skipped the scaling crux. | Named connection scale + cross-server routing as the crux; chose WebSocket with a reason; separated presence.
Routing | Hand-waved 'servers talk to each other'. | Connection registry (user→server) plus a pub/sub backbone for any-to-any routing.
Delivery semantics | No ordering or dedup story. | Persist-before-ack, per-conversation ordering, at-least-once + client dedup on message id.
Offline | Forgot offline users. | Durable inbox + APNs/FCM push, fetch-missed on reconnect.
Group fan-out | One loop for all group sizes. | Tiered: loop for small, async Kafka for large, pull + aggregated receipts for massive.
Operability | Never mentioned it. | Connection-count and delivery-latency SLOs, graceful connection draining, presence lag.
The 60-second recap that lands the level
Quick recap: WebSockets for the live path because polling can't push; the crux is connection scale, so a registry maps user→server and a pub/sub backbone routes across hundreds of stateful chat servers; messages persist to Cassandra before the sender is ACKed; delivery is at-least-once with client dedup on a time-ordered id and per-conversation ordering; offline users get a durable inbox plus APNs/FCM push; group fan-out is tiered by size; presence is heartbeat-with-TTL kept off the message path. With more time: multi-device sync, voice/video, and Signal key management.
The one mental model: realtime messaging is a connection-routing problem with reliability built from explicit confirmation at every hop — persist before ACK, dedup on id, heartbeat for presence. Say “the crux is managing millions of connections and routing across servers” in the first two minutes, and tier everything (delivery, fan-out, presence) by scale.