Two people type in the same sentence at the same millisecond. Both edits must land, both intentions must survive, and every screen must converge on one identical document. The make-or-break of this interview is the conflict-resolution algorithm — Operational Transform or CRDT — and bluffing it is instantly obvious. Everything else (WebSockets, storage, presence) exists to make that convergence correct, fast, and durable.
Every FAANG company runs a rubric. The dimensions are roughly the same; the weights differ by company and level. At senior+ the boxes-and-arrows are table stakes — what gets graded hardest is the quality of your decisions: the questions you asked first, the trade-offs you surfaced and defended, and the production reality you volunteered without being asked.
| Dimension | Weight | What earns the signal |
|---|---|---|
| Requirements & scoping | 10–15% | You scoped before drawing, asked enough to bound the problem, pinned the scale number, and stated assumptions out loud. |
| High-level architecture | 20–25% | The right components, a clear data flow, and a reason every box exists. The design satisfies each functional requirement. |
| Technical depth / deep dives | ~30% | You go three questions deep on the hard part without being rescued. This is where staff is won or lost. |
| Trade-offs & judgment | highest effective | Two viable options, what each costs, and a committed pick for this system. Simplicity over flash when flash isn't warranted. |
| Communication / driving | cross-cutting | You drive the 45 minutes; the interviewer never has to rescue you. You narrate, checkpoint, and narrow when the design sprawls. |
| Operational maturity | ↑ in 2026 | The newest weight: observability, rollout, failure modes, on-call reality — volunteered, not pried out. |
A solid design with reasonable trade-offs is a strong score for a mid-level candidate and a downlevel flag for staff. The questions can be identical; the depth expectation is not. As you climb, the balance tips from breadth toward depth, proactivity, and production reality.
You don't recite the AWS Well-Architected pillars; you anchor each decision to one of them. It signals you evaluate systems across competing concerns rather than optimizing one axis. Each pillar below is mapped to a move you can make in this exact design.
Operational excellence: measure convergence, not just delivery.
Security: per-document access control.
Reliability: never lose an accepted edit.
Performance efficiency: send deltas, hit the latency floor.
Cost optimization: snapshot to avoid replay.
Sustainability: compact the log.
The make-or-break is the conflict-resolution algorithm (Operational Transform or CRDT) — and bluffing it is obvious. Architecturally, the trick is to funnel every client’s edits through one Doc Session that owns the algorithm, so concurrent edits are transformed into a single order and every screen converges. Everything else exists to feed that.
Framing: a browser-based collaborative editor with many documents, up to ~50–100 concurrent editors on a hot doc, edit echo under 100ms, guaranteed convergence, durability of accepted edits, and high availability.
Latency target: < 100ms. Above ~100–200ms, editing stops feeling simultaneous.

“The crux isn’t storing the document; it’s convergence. If two people edit the same position at once, fast delivery alone gives them different documents. The whole architecture is organized around resolving that correctly.”
“Two scoping questions: linear rich text or a spatial canvas, and do we need offline editing? Both push me toward or away from CRDTs, so I want to know before I commit to OT.”
“We’ll save the document and last-write-wins on conflict.” For collaborative editing, last-write-wins silently destroys other users’ work — it’s the anti-answer.
Entities: Document (id, content, and current revision number), Operation (type insert/delete, position, payload, clientId, baseRevision), Presence (cursor position per user). The transport is the first real decision.
Send the operation delta, never the whole file — a keystroke is a few bytes, and broadcasting the full document on every edit is both slow and unmergeable.
“WebSocket, because collaborative editing needs the server to push to clients, not just respond — true bidirectional. And I send small op deltas tagged with the revision they were based on, so the server can transform them against anything that landed since.”
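The entities and the "op delta tagged with its base revision" idea can be sketched in a few lines of Python. This is a minimal illustration that assumes plain-text content and character offsets; the class names, field names, and the `apply` helper are hypothetical, not taken from any real editor.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Operation:
    """One small edit delta sent over the wire (illustrative field names)."""
    kind: Literal["insert", "delete"]
    position: int        # character offset in the document
    payload: str         # text to insert, or the text being deleted
    client_id: str
    base_revision: int   # revision the client had applied when it made the edit


@dataclass
class Document:
    doc_id: str
    content: str = ""
    revision: int = 0


def apply(doc: Document, op: Operation) -> None:
    """Apply an already-ordered op to the document and bump the revision."""
    if op.kind == "insert":
        doc.content = doc.content[:op.position] + op.payload + doc.content[op.position:]
    else:
        end = op.position + len(op.payload)
        doc.content = doc.content[:op.position] + doc.content[end:]
    doc.revision += 1
```

The point of the shape is that a keystroke serializes to a handful of bytes, and `base_revision` tells the server exactly which ops the client had not yet seen.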
Clients connect over WebSocket to a real-time gateway. Edits flow to a document/operation service that does three things: assigns a monotonically increasing revision number (a total order), transforms each incoming op against any ops that landed since its base revision (OT), and broadcasts the transformed op to all other clients on that document. Each op is durably appended to an op log before acknowledgment, so nothing accepted is ever lost.
A single server-assigned revision sequence gives every client the same order to apply ops in, and that total order is what makes OT tractable. A message queue in front of the op service, partitioned by docId so each document's ops stay in order, buffers and serializes concurrent ops and adds durability.
“The server is the single ordering authority: it stamps each op with the next revision number and transforms concurrent ops against each other before broadcasting, so every client converges on the same sequence. The op log makes it durable and replayable.”
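The ordering authority described above can be sketched as a small in-memory class. This is a simplified illustration, not a production design: it handles inserts only, an in-memory list stands in for the durable op log, and it assumes each client has at most one unacknowledged op in flight. `DocSession`, `Insert`, and `submit` are hypothetical names.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client_id: str
    base_revision: int  # revision the client had seen when it made the edit


def transform(op: Insert, earlier: Insert) -> Insert:
    """Shift `op` right if an earlier-ordered insert landed at or before it."""
    if earlier.pos <= op.pos:
        return replace(op, pos=op.pos + len(earlier.text))
    return op


@dataclass
class DocSession:
    content: str = ""
    log: list[Insert] = field(default_factory=list)  # stand-in for a durable op log

    @property
    def revision(self) -> int:
        return len(self.log)

    def submit(self, op: Insert) -> tuple[int, Insert]:
        """Order, transform, persist, and apply one op; return what to broadcast."""
        if not 0 <= op.base_revision <= self.revision:
            raise ValueError("unknown base revision")
        # Transform against every op accepted since the client's base revision.
        for earlier in self.log[op.base_revision:]:
            op = transform(op, earlier)
        self.log.append(op)  # append before acknowledging the client
        self.content = self.content[:op.pos] + op.text + self.content[op.pos:]
        return self.revision, op


session = DocSession(content="ac")
session.submit(Insert(1, "X", "A", base_revision=0))
session.submit(Insert(1, "Y", "B", base_revision=0))  # concurrent with A's op
```

After both calls the session holds `"aXYc"`: B's insert was shifted from offset 1 to offset 2 because A's op was ordered first.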
Leading with CRDTs. They’re the right call for offline-first or a spatial/structured data model — say which. For server-mediated linear text, OT is simpler and is what Google actually ships; reach for CRDT when the constraint demands it.
“The client sends the whole updated document and the server saves it.” That can’t merge concurrent edits and throws away everyone else’s changes — the exact failure OT/CRDT exist to prevent.
OT treats edits as operations transformed against context; it needs a central server to order them, is mature, and is what Google Docs uses — but the central ordering becomes a scaling constraint. CRDTs are data types that mathematically converge without a coordinator; they shine for offline and decentralized cases but carry tombstone/metadata and compaction cost. The senior answer is not “OT” or “CRDT” — it’s: OT for the real-time hot path where a central server already exists; CRDT for offline reconciliation and non-text structured data where distributed merge is genuinely required.
With many WebSocket servers, all editors of one document must still see each other. Two options: route a document’s editors to the same server (consistent hashing on docId), or put a pub/sub broker between the op service and the WebSocket layer so any server can rebroadcast a document’s ops to its connected clients. Pub/sub decouples connections from document affinity and scales more cleanly.
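For the first option, a consistent-hash ring keyed on docId might look like the sketch below. The server names, the `HashRing` class, and the choice of 64 virtual nodes per server are illustrative assumptions; only the standard-library `hashlib` and `bisect` modules are used.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers: list[str], vnodes: int = 64) -> None:
        if not servers:
            raise ValueError("need at least one server")
        # Each server owns several points on the ring to even out the load.
        self._ring = sorted((_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def server_for(self, doc_id: str) -> str:
        """Return the first server clockwise from the document's hash."""
        idx = bisect.bisect_right(self._keys, _hash(doc_id)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["ws-1", "ws-2", "ws-3"])
target = ring.server_for("doc-42")  # every editor of doc-42 lands on this server
```

The property worth naming in the interview: when a server leaves the ring, only the documents it owned move, so most sessions keep their affinity.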
A client editing offline buffers ops against its last-known base revision. On reconnect, those ops are transformed against everything the server accepted in the interim (OT) — or merged via CRDT if offline is a first-class requirement. This is exactly the layer where OT’s central-ordering assumption breaks and CRDT earns its keep.
Don’t replay a million ops on open. Snapshot the materialized document periodically; loading = latest snapshot + the few ops since. Compact the op log behind the snapshot.
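The "latest snapshot plus the ops since" load path can be sketched as follows. This is a minimal in-memory illustration; `Snapshot`, `LoggedOp`, `load`, and `compact` are hypothetical names, and a real system would read both from durable storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedOp:
    revision: int
    kind: str      # "insert" or "delete"
    pos: int
    payload: str   # inserted text, or the text that was deleted


@dataclass(frozen=True)
class Snapshot:
    revision: int  # the snapshot includes every op up to and including this revision
    content: str


def apply(content: str, op: LoggedOp) -> str:
    if op.kind == "insert":
        return content[:op.pos] + op.payload + content[op.pos:]
    return content[:op.pos] + content[op.pos + len(op.payload):]


def load(snapshot: Snapshot, log: list[LoggedOp]) -> str:
    """Materialize the document: start from the snapshot, replay only newer ops."""
    content = snapshot.content
    for op in log:
        if op.revision > snapshot.revision:
            content = apply(content, op)
    return content


def compact(snapshot: Snapshot, log: list[LoggedOp]) -> list[LoggedOp]:
    """Drop ops already folded into the snapshot."""
    return [op for op in log if op.revision > snapshot.revision]
```

Loading from a snapshot at revision 2 replays one op instead of three in a three-op log; the same ratio is what keeps a million-op document cheap to open.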
“I’d use OT on the hot path and reach for CRDTs only where central ordering stops being free: offline edits and structured non-text data. That hybrid is the pragmatic answer, and the data model drives the split; Figma, for instance, chose a CRDT-inspired design for its canvas rather than OT.”
“OT and CRDT are basically the same.” They solve the same problem with opposite assumptions about central authority — conflating them tells the interviewer you’ve only read the headline.
Two users edit the same position in the same millisecond. What happens?
The server orders them by revision — say A’s op gets revision N, B’s gets N+1. B’s op is then transformed against A’s before it’s applied and broadcast, so B’s intended insertion shifts to account for A’s. Both clients end up applying the same two ops in the same effective order and converge on an identical document. It’s emphatically not last-write-wins — both edits survive.
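Walking that answer through concrete values makes it easier to defend. The sketch below handles inserts only and uses one assumed tie-break rule: when two inserts target the same offset, the op the server ordered first keeps the position. The `Insert`, `apply`, and `transform` names are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert, against_wins_ties: bool) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against_wins_ties):
        return replace(op, pos=op.pos + len(against.text))
    return op


base = "ac"
a = Insert(1, "X")  # server assigns revision N
b = Insert(1, "Y")  # server assigns revision N+1

# Server and client A: apply a, then b transformed against a (a was ordered first).
b_prime = transform(b, a, against_wins_ties=True)
server_doc = apply(apply(base, a), b_prime)

# Client B already applied its own b locally; it receives a and transforms it
# against b. Because a was ordered first, a keeps the contested position.
a_prime = transform(a, b, against_wins_ties=False)
client_b_doc = apply(apply(base, b), a_prime)

assert server_doc == client_b_doc == "aXYc"
```

Both paths reach `"aXYc"`: neither edit is dropped, and B's insert is the one that moves.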
Now one of them was offline for an hour.
That client buffered its ops against the base revision it last saw. On reconnect, I transform each buffered op against every op the server accepted in the interim and apply them in order — same OT machinery, just a longer transform chain. If offline editing were a primary product requirement rather than an edge case, I’d move that path to a CRDT so the merge needs no central replay.
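The "longer transform chain" can be sketched as a rebase of the buffered ops over the interim server ops. This is an insert-only illustration under the assumption that server-ordered ops win position ties; `rebase` and the other names are hypothetical. Note that each later buffered op must be transformed against the server ops as they look after the earlier buffered ops, which is why the loop carries a rewritten copy of the interim list forward.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert, against_wins_ties: bool) -> Insert:
    if against.pos < op.pos or (against.pos == op.pos and against_wins_ties):
        return replace(op, pos=op.pos + len(against.text))
    return op


def rebase(buffered: list[Insert], interim: list[Insert]):
    """Return (buffered ops rewritten for the server, interim ops rewritten for the client)."""
    rebased = []
    for local in buffered:
        shifted_interim = []
        for srv in interim:
            # The server op as the offline client must apply it, after `local`.
            shifted_interim.append(transform(srv, local, against_wins_ties=False))
            # The local op as the server must apply it, after `srv`.
            local = transform(local, srv, against_wins_ties=True)
        rebased.append(local)
        interim = shifted_interim
    return rebased, interim


base = "hello"
interim = [Insert(5, " world")]            # accepted while the client was offline
buffered = [Insert(5, "!"), Insert(6, "?")]  # typed offline against `base`
for_server, for_client = rebase(buffered, interim)
```

Applying `for_server` on top of the server's document and `for_client` on top of the offline client's document should leave both sides with the same text.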
Roll transform-logic changes out carefully behind a flag with a fast rollback — a subtle OT bug corrupts documents, so canary on low-traffic docs first and watch divergence before widening.
“With more time I’d detail rich-text formatting as structured ops and the full offline/CRDT layer. I scoped them out deliberately — I didn’t miss them.”
Interviewers probe the algorithm hard because bluffing is obvious. Reason through one transform, commit to a model, name the hybrid.
The server assigns each a revision number, giving a total order, then transforms the later op against the earlier one before applying and broadcasting. Both intentions are preserved and every client converges on the same document. It is not last-write-wins — both edits land, just position-adjusted.
OT for the real-time hot path where a central server already orders ops — simpler, mature, what Google Docs uses. CRDT for offline reconciliation and non-text structured data where you need distributed merge without a coordinator. The senior answer is the hybrid: OT hot path, CRDT for the offline/structured layers.
Either route all editors of a document to the same server via consistent hashing on docId, or — cleaner — put a pub/sub broker between the op service and the WebSocket layer so any server can rebroadcast a document's ops to the clients it holds. Pub/sub decouples connection placement from document affinity.
Their client buffered ops against the last base revision it saw. On reconnect I transform each buffered op against every op the server accepted in the interim and apply in order — same OT machinery. If offline is a first-class requirement, I'd use a CRDT for that path so merge needs no central replay.
Periodic snapshots of the materialized document plus op-log compaction. Opening loads the latest snapshot and the handful of ops since it — not the entire edit history.
Log every transform and track convergence time — how long until all clients on a doc agree. Alert on convergence-time spikes and on any client/server divergence event; those are the early signal of a transform bug before users report scrambled text.
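One way to detect divergence is to have clients periodically report a checksum of their document at a given revision and compare it with the server's checksum at that revision. The sketch below assumes that reporting scheme; `ConvergenceMonitor`, its method names, and `convergence_ms` are hypothetical, and a real deployment would emit metrics and alerts instead of appending to a list.

```python
import hashlib


def checksum(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class ConvergenceMonitor:
    def __init__(self) -> None:
        self._server_checksums: dict[int, str] = {}
        self.divergences: list[tuple[str, int]] = []  # (client_id, revision)

    def record_server_state(self, revision: int, content: str) -> None:
        self._server_checksums[revision] = checksum(content)

    def report_client_state(self, client_id: str, revision: int, client_checksum: str) -> bool:
        """Return False and record a divergence event if the client disagrees."""
        expected = self._server_checksums.get(revision)
        if expected is None:
            return True  # nothing recorded for this revision, so nothing to compare
        if client_checksum != expected:
            self.divergences.append((client_id, revision))
            return False
        return True


def convergence_ms(accepted_at_ms: float, applied_at_ms: dict[str, float]) -> float:
    """Time from server acceptance until the slowest client applied the op."""
    return max(applied_at_ms.values()) - accepted_at_ms
```

Any entry in `divergences` is a page-worthy event, while `convergence_ms` is the value to chart and alert on when it spikes.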
A clean design with one of these undercurrents still scores below the bar at senior+. None are about getting an answer wrong — they're about how you operate.
Jumping to architecture without bounding the problem or confirming scale. Reads as template-matching.
"It depends" with no decision behind it. Name the trade-off, then pick.
Stalling at the core deep dive until the interviewer feeds you the answer. Depth is the senior+ bar.
Last-write-wins is the anti-answer for collaborative editing: it silently destroys concurrent work. Proposing it shows you don't grasp the core requirement.
Hand-waving the algorithm names without being able to reason through a single transform. Interviewers can tell instantly.
No observability, no rollout, no failure-mode plan. In 2026 this reads as "has never carried a pager."
Confident wrong answers when pushed. Far worse than an honest "here's what I'd verify."
Waiting to be asked the next question. At staff you own the 45 minutes.
Run a mock and score yourself honestly against the dimensions the interviewer uses. If you can't hit "strong" on depth and operability, that's your signal on where to drill.
| Dimension | Weak (downlevel) | Strong (at level) |
|---|---|---|
| Scoping | Framed it as a storage problem. | Named convergence under concurrent edits as the crux; pinned the sub-100ms budget; asked text-vs-canvas and offline. |
| Transport | HTTP polling or vague. | WebSocket with a one-line reason; sends op deltas tagged with base revision, not whole files. |
| OT/CRDT depth | Named them; couldn't reason about a transform. | Walked an insert-vs-delete transform; committed to OT and framed the CRDT hybrid for offline/structured data. |
| Central ordering | No notion of ordering or durability. | Server-assigned revisions for total order; durable op log before ack; periodic snapshots. |
| WebSocket scaling | One server holds everyone. | docId affinity via consistent hashing or a pub/sub rebroadcast layer so co-editors stay in sync. |
| Operability | Never mentioned it. | Tracked convergence time and transform-conflict rate; flag-gated, canaried transform changes. |