Strong-Consistency Playbook — Designing Ticketmaster

01 How you're actually graded

Six buckets — and judgment outweighs the diagram.

Every FAANG company runs a rubric. The dimensions are roughly the same; the weights differ by company and level. At senior+ the boxes-and-arrows are table stakes — what gets graded hardest is the quality of your decisions: the questions you asked first, the trade-offs you surfaced and defended, and the production reality you volunteered without being asked.

Dimension	Weight	What earns the signal
Requirements & scoping	10–15%	You scoped before drawing, asked enough to bound the problem, pinned the scale number, and stated assumptions out loud.
High-level architecture	20–25%	The right components, a clear data flow, and a reason every box exists. The design satisfies each functional requirement.
Technical depth / deep dives	~30%	You go three questions deep on the hard part without being rescued. This is where staff is won or lost.
Trade-offs & judgment	highest effective weight	Two viable options, what each costs, and a committed pick for this system. Simplicity over flash when flash isn't warranted.
Communication / driving	cross-cutting	You drive the 45 minutes; the interviewer never has to rescue you. You narrate, checkpoint, and narrow when the design sprawls.
Operational maturity	rising in 2026	The newest weight: observability, rollout, failure modes, and on-call reality, all volunteered rather than pried out.

The 2026 shift, in one line. Operational concerns are now a first-class graded dimension, and "it depends" without a committed answer reads as evasion rather than nuance. Name the trade-off, then pick.

02 The same answer is scored differently at each level

It's a sliding scale, not a pass/fail bar.

A solid design with reasonable trade-offs is a strong score for a mid-level candidate and a downlevel flag for staff. The questions can be identical; the depth expectation is not. As you climb, the balance tips from breadth toward depth, proactivity, and production reality.

Mid-level

Meta E4 · Google L4 · Amazon SDE-II


Models seats with a status and prevents double-booking with a database transaction or lock when prompted.
Describes the reserve-then-pay flow; may not handle the abandoned-checkout release cleanly.
Recognizes the on-sale spike is a problem but needs guidance toward a waiting room.
Interviewer confirms the consistency story; not expected to spot every failure mode alone.

Senior

Meta E5 · Google L5 · Amazon SDE-III


Commits to CP for booking and a hybrid model (eventual consistency for browse/search) without prompting.
Reaches for a distributed lock with TTL and explains why native expiry beats a held DB transaction.
Makes purchase idempotent against webhook retries and surfaces the lock-failure fallback.
Proposes a virtual waiting room to gate concurrent users during the spike.

Staff+

Meta E6 · Google L6 · Amazon Principal


Establishes the locking model fast, then spends time on the spike, idempotency, and failure fallbacks.
Experience-backed take on hold-window tuning, hold→purchase conversion, and oversell-rate-zero as an invariant.
Treats the waiting-room queue, OCC backstop at confirm, and PCI tokenization as routine.
Frames the consistency split per subsystem and the cross-team seams (payments, fraud, inventory).

03 The lens senior engineers narrate through

Borrow AWS's Well-Architected pillars as your trade-off vocabulary.

You don't recite AWS — you anchor each decision to one of these. It signals you evaluate systems across competing concerns rather than optimizing one axis. Each pillar below is mapped to a move you can make in this exact design.

PILLAR 01

Operational Excellence

Watch the invariant.

Hook: “My headline metric is oversell rate — it must be exactly zero — plus lock contention and hold→purchase conversion per event.”

PILLAR 02

Security

Never touch raw card data.

Hook: “Payment is tokenized client-side via the processor’s SDK; our servers never see card numbers, which keeps us out of most of PCI scope.”

PILLAR 03

Reliability

Degrade without overselling.

Hook: “If the lock store fails, I fail closed on booking — a conditional DB update as a backstop — because a rejected purchase is recoverable; a double-sold seat is not.”

PILLAR 04

Performance Efficiency

Hold cheaply.

Hook: “Holds live in an in-memory store with native TTL, so acquire/release is sub-millisecond even when a million users contend at on-sale.”

PILLAR 05

Cost Optimization

Right consistency, right place.

Hook: “Only booking needs strong consistency; the seat map and search can be eventually consistent and served from cache, which is far cheaper at spike scale.”

PILLAR 06

Sustainability

Shed load early.

Hook: “The virtual waiting room caps how many users reach the booking path at once, so I provision for a controlled rate instead of the raw stampede.”

How to use it without sounding like a checklist. Don't list the pillars. Weave one in when you commit: name a trade-off, name the pillar it serves, and make the call. One sentence that does all three reads as senior.

03·5 The architecture you draw on the whiteboard

Hold, then sell — the seat lock is the whole interview.

One rule cannot bend: no two people get the same seat. That makes this the rare design where you trade availability for consistency on purpose. The interview turns on two things: how you hold a seat during checkout, and how you keep the backend standing during a Taylor-Swift-scale stampede.

Diagram: Buyer path → Reserve + checkout → Commit

Hold, then sell. A virtual waiting room throttles the stampede; the Booking Service takes a TTL lock on the seat (HELD) before payment, and only a committed ACID transaction flips it to SOLD. Say it: “I trade availability for consistency on purpose — no two buyers ever hold the same seat.”

How to narrate it in the room

Name the one rule. “No two people get the same seat. That single invariant means I pick consistency over availability for the seat-state store.”
Hold with a TTL. “On ‘select seat’ I take a lock with a single atomic Redis SET using NX and an expiry (plain SETNX cannot attach the TTL in the same step), marking the seat HELD for the hold window. If checkout stalls, the TTL releases it automatically.”
Survive the stampede. “A virtual waiting room admits buyers at a controlled rate, so the on-sale spike never hits the database all at once.”
Commit atomically. “Once payment succeeds, the HELD→SOLD flip and the order record commit in one transaction; if payment fails, the hold expires and the seat returns to the pool.”

04 The interview, minute by minute

Five phases. Drive every one of them.

The simulation. Framing: a high-demand ticketing platform running a hot on-sale, with ~1M concurrent users competing for a few thousand seats, a ~10-minute checkout hold window, zero tolerance for double-booking, and eventual consistency acceptable for browsing.

01 Requirements & Scoping · ~6 min · don't draw yet

Grading this window: Do you name no-double-booking as a correctness invariant and choose CP deliberately — while keeping browse/search eventually consistent? That hybrid framing is the senior tell.

Functional requirements to land

Browse events and view a live seat map.
Reserve / hold a seat while the user checks out (a temporary, expiring claim).
Purchase the held seat; release the hold automatically if checkout is abandoned.

Non-functional requirements to land

No double-booking — ever. This is a correctness invariant, not a target. Oversell rate = 0.
Strong consistency for booking (CP by choice); eventual consistency for browse/search is fine.
Survive extreme concurrency spikes at on-sale without melting the backend.
PCI compliance on the payment path.

▲ Allow — say this

“This is a consistency problem, not a throughput problem. The invariant is that two people never get the same seat, so for booking I’ll choose CP — I’ll reject under partition rather than risk a double sale. Browsing the seat map, though, can be eventually consistent.”

▲ Allow — say this

“Two questions: how long do we hold a seat during checkout, and do we want a waiting room for big on-sales? Those shape the locking and the spike strategy more than raw QPS does.”

▼ Reject — never say this

“We’ll keep it eventually consistent for speed.” For seat inventory that’s an oversell waiting to happen. Naming the wrong consistency model here is an instant flag.

Scripted exchange

Interviewer

Design Ticketmaster.

You

The defining constraint is correctness under contention: no seat is ever sold twice. So booking is strongly consistent — CP — while browsing and the seat map can be eventually consistent and cached. The two hard parts are how I hold a seat during checkout and how I survive a million people arriving at the same second. Let me confirm the hold window and whether we want a waiting room.

Interviewer

10-minute hold. Assume a massive on-sale.

You

Then I’ll use an expiring lock for the hold and a virtual waiting room to throttle entry into the booking path. Let me build the core booking flow first, then layer the spike handling.

02 Entities, API & Estimation · ~5 min

Grading this window: Clean model with an explicit seat state machine, and a sense of the spike concurrency that justifies a waiting room.

Entities: Event, Seat / Ticket (the state machine is the design: available → held → booked), Booking (id, userId, status). Interface:

getSeatMap(eventId) → seats[] (eventually consistent, cached)
reserveSeat(eventId, seatId, userId) → bookingId (acquires hold)
confirmPurchase(bookingId, paymentToken) → confirmed

The estimate that matters

At on-sale, contention is wildly uneven: a million users converge on a few thousand seats in seconds. You don’t need a precise QPS — you need to recognize that most requests will lose, so the system must reject losers cheaply and protect the booking path. That recognition is what justifies the waiting room.
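The arithmetic behind that recognition can be sketched in a few lines. The seat count of 5,000 is an assumed stand-in for "a few thousand", and one seat per buyer is a simplification; only the order of magnitude matters.

```python
# Rough on-sale contention estimate. All inputs are illustrative assumptions.
CONCURRENT_USERS = 1_000_000  # from the framing: ~1M concurrent users
SEATS = 5_000                 # assumed stand-in for "a few thousand seats"


def loser_fraction(users: int, seats: int) -> float:
    """Fraction of users who cannot get a seat, assuming one seat per buyer."""
    if users <= 0:
        return 0.0
    winners = min(users, seats)
    return 1 - winners / users


users_per_seat = CONCURRENT_USERS / SEATS
print(f"{users_per_seat:.0f} users per seat")
print(f"{loser_fraction(CONCURRENT_USERS, SEATS):.1%} of users must be turned away")
```

Under these assumptions roughly 200 users compete for each seat, so almost every request is a loss that has to be rejected cheaply.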

▲ Allow — say this

“The seat state machine is the whole model: available, held, booked. Every concurrency question is really ‘who is allowed to transition this seat, and atomically.’”
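As a minimal sketch, the state machine can be written as an explicit table of allowed transitions. The names below are illustrative, and the sketch only validates transitions; making them atomic is the job of the lock or the conditional database update.

```python
from enum import Enum


class SeatState(Enum):
    AVAILABLE = "available"
    HELD = "held"
    BOOKED = "booked"


# Legal moves: take a hold, complete a purchase, or let a hold lapse.
ALLOWED_TRANSITIONS = {
    (SeatState.AVAILABLE, SeatState.HELD),
    (SeatState.HELD, SeatState.BOOKED),
    (SeatState.HELD, SeatState.AVAILABLE),
}


def transition(current: SeatState, target: SeatState) -> SeatState:
    """Return the new state, or raise if the move is not in the table."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Note that booked has no outgoing edge here; refunds or exchanges would be a deliberate extension rather than something the booking path does implicitly.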

03 High-Level Design (the MVP) · ~13 min

Grading this window: The hold-with-expiry mechanism and a justified consistency story. Right components, clear flow.

The reserve-then-pay flow

User taps a seat → POST /bookings hits the Booking Service. It acquires a distributed lock on that seat in Redis with a 10-minute TTL in one atomic command (SET with NX and EX; SETNX followed by a separate EXPIRE is not atomic and can leave a lock with no expiry if the process dies in between), writes a booking row with status in-progress, and returns a bookingId, routing the user to payment. On successful payment, the seat flips to booked and the lock releases. If the user abandons checkout, the TTL expires and the seat returns to available automatically, with no cleanup job required for the hold.

reserve: SET seat:{id} held NX EX 600 (TTL 10m) → booking in-progress → payment page
purchase: payment ok → seat = booked, release lock
abandon: TTL expires → seat = available (automatic)
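The hold semantics can be sketched without a running Redis. The class below is a hypothetical in-memory stand-in that mimics set-if-absent with an expiry, using an injectable clock so expiry can be exercised in a test; in production the same contract would come from a single Redis SET with the NX and EX options.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class HoldStore:
    """In-memory stand-in for a key-value store with set-if-absent plus TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._mutex = threading.Lock()  # makes check-and-set atomic in-process
        self._holds: Dict[str, Tuple[str, float]] = {}  # seat -> (owner, expires_at)

    def try_hold(self, seat_id: str, user_id: str, ttl_seconds: float) -> bool:
        """Take the hold only if no live hold exists on the seat."""
        with self._mutex:
            now = self._clock()
            current = self._holds.get(seat_id)
            if current is not None and current[1] > now:
                return False  # someone else owns a live hold
            self._holds[seat_id] = (user_id, now + ttl_seconds)
            return True

    def holder(self, seat_id: str) -> Optional[str]:
        """Return the current owner, or None if the seat is free or the hold expired."""
        with self._mutex:
            current = self._holds.get(seat_id)
            if current is None or current[1] <= self._clock():
                return None
            return current[0]

    def release(self, seat_id: str, user_id: str) -> bool:
        """Release only if the caller still owns a live hold."""
        with self._mutex:
            current = self._holds.get(seat_id)
            if current is None or current[0] != user_id or current[1] <= self._clock():
                return False
            del self._holds[seat_id]
            return True
```

The owner check in release matters: a buyer whose hold already expired must not delete the hold that a later buyer acquired on the same seat.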

Live seat map

Push availability changes to clients with Server-Sent Events or long-polling so a seat turning red appears in near real time — this is the eventually-consistent read path, deliberately separate from the strongly-consistent booking path.
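For the SSE variant, each availability change becomes one text frame on a long-lived HTTP response. The helper below is a small sketch of that wire format; the event name and payload fields are hypothetical, chosen only for illustration.

```python
import json


def sse_frame(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Events message.

    A frame is an optional event name, a data line, and a blank line that
    terminates the message. Compact JSON keeps the data on a single line.
    """
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


frame = sse_frame("seat-update", {"eventId": "evt-1", "seatId": "A1", "status": "held"})
print(frame, end="")
```

Because this path is eventually consistent, a client that misses a frame only sees a stale colour for a moment; the booking path still rejects any attempt to hold a seat that is already taken.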

The trap door the interviewer opens here. “Why a Redis lock with a TTL — why not just a Postgres transaction or row lock?” The answer: we need a temporary reservation that auto-expires. Relational DBs have no native row TTL, so you’d bolt on cron-based expiry; Redis gives automatic key expiry and sub-millisecond acquire/release under heavy concurrency. Knowing why is the senior signal.

▲ Allow — say this

“The hold is a Redis lock with a TTL. The TTL is the elegant part — an abandoned checkout self-heals when the key expires, so I never need a sweeper job hunting for stale holds.”

◆ Throttle — only with a reason

Pure optimistic concurrency (version check at write). It’s efficient under low contention but produces a storm of failed checkouts under a hot on-sale — name the contention assumption. Best used as a final-confirm backstop, not the primary hold.
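A version check of this kind might look like the sketch below. The row type and function are hypothetical, and the compare-then-write here is plain Python; in a real database the comparison and the write would be one conditional UPDATE statement so they are applied atomically.

```python
from dataclasses import dataclass


@dataclass
class SeatRow:
    status: str
    version: int


def confirm_if_unchanged(row: SeatRow, expected_version: int) -> bool:
    """Flip held -> booked only if the row is unchanged since it was read."""
    if row.version != expected_version or row.status != "held":
        return False  # someone else wrote first; the caller must re-read
    row.status = "booked"
    row.version += 1
    return True


seat = SeatRow(status="held", version=7)
read_by_a = seat.version
read_by_b = seat.version
print(confirm_if_unchanged(seat, read_by_a))  # first writer wins
print(confirm_if_unchanged(seat, read_by_b))  # second writer sees a stale version
```

The second call failing is the behavior that turns into a storm under a hot on-sale: every loser only finds out at write time, after already going through checkout.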

▼ Reject — never say this

“We’ll hold a database transaction open while the user pays.” Holding a transaction across a multi-minute human checkout pins connections and locks rows for the entire on-sale — it collapses immediately.

04 Deep Dives (the stress test) · ~15 min · where staff is decided

Grading this window: Lead toward locking trade-offs, the on-sale spike, idempotency, and lock-failure fallback. Staff volunteers these; 30%+ of the score.

Surviving the on-sale spike — the virtual waiting room

A million users against a few thousand seats will flatten any backend if you let them all in. Put a virtual waiting room in front: admit users into the seat-selection / booking path at a controlled rate, hold the rest in a fair FIFO queue, and show their position. This converts an uncontrollable stampede into a steady, provisioned flow — you size the backend for the admission rate, not the mob.
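A minimal single-process sketch of that admission control follows. The class and its per-tick rate are hypothetical; a real waiting room would live in a shared store and issue signed admission tokens, which this sketch leaves out.

```python
from collections import deque
from typing import Deque, List, Optional


class WaitingRoom:
    """FIFO queue that admits a fixed number of users per tick."""

    def __init__(self, admit_per_tick: int) -> None:
        if admit_per_tick <= 0:
            raise ValueError("admit_per_tick must be positive")
        self._admit_per_tick = admit_per_tick
        self._queue: Deque[str] = deque()

    def join(self, user_id: str) -> int:
        """Enqueue a user and return their 1-based position."""
        self._queue.append(user_id)
        return len(self._queue)

    def position(self, user_id: str) -> Optional[int]:
        """Return the user's 1-based position, or None if not queued."""
        try:
            return self._queue.index(user_id) + 1
        except ValueError:
            return None

    def tick(self) -> List[str]:
        """Admit up to admit_per_tick users from the front, in arrival order."""
        admitted: List[str] = []
        while self._queue and len(admitted) < self._admit_per_tick:
            admitted.append(self._queue.popleft())
        return admitted
```

The backend is then sized for admit_per_tick divided by the tick interval, not for the number of users in the queue.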

Idempotency on purchase

Payment processors retry webhooks; users double-click “Purchase.” Make order creation idempotent with a key (the bookingId or a client token) so a duplicate confirmation never produces a second charge or a second ticket. Non-negotiable wherever money moves.
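One common way to get this property is a uniqueness constraint on the idempotency key, sketched here with SQLite from the Python standard library. The table and column names are assumptions for illustration, and the bookingId is used as the key.

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    """Create an in-memory database with one order row allowed per booking."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (booking_id TEXT PRIMARY KEY, payment_token TEXT NOT NULL)"
    )
    return db


def confirm_purchase(db: sqlite3.Connection, booking_id: str, payment_token: str) -> bool:
    """Return True if this call created the order, False if it was a duplicate."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO orders (booking_id, payment_token) VALUES (?, ?)",
                (booking_id, payment_token),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the key already exists: treat the retry as a no-op


db = make_orders_db()
print(confirm_purchase(db, "booking-1", "tok_abc"))  # first webhook creates the order
print(confirm_purchase(db, "booking-1", "tok_abc"))  # retry is recognized as a duplicate
```

Letting the database enforce uniqueness avoids a read-then-write race between two concurrent retries of the same confirmation.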

Lock-failure fallback (the failure mode)

If the lock store is unavailable, you must fail closed on booking — fall back to a conditional update in the strongly-consistent DB (UPDATE … WHERE status = 'available', an OCC check), and if even that’s uncertain, reject the purchase. A rejected purchase is recoverable; a double-sold seat is a refund, an angry fan, and a support nightmare. As a final backstop, re-verify availability with an OCC check at the moment of confirmation.
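The control flow can be sketched as follows, with SQLite standing in for the source-of-truth database. LockStoreDown, the seats schema, and the try_lock callable are hypothetical names; the point is only that every uncertain branch ends in a rejection.

```python
import sqlite3
from typing import Callable


class LockStoreDown(Exception):
    """Raised by the (hypothetical) lock client when the lock store is unreachable."""


def make_seat_db(seat_id: str) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE seats (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    db.execute("INSERT INTO seats (id, status) VALUES (?, 'available')", (seat_id,))
    db.commit()
    return db


def hold_via_db(db: sqlite3.Connection, seat_id: str) -> bool:
    """Conditional update: succeeds only if the seat is still available."""
    with db:
        cur = db.execute(
            "UPDATE seats SET status = 'held' WHERE id = ? AND status = 'available'",
            (seat_id,),
        )
        return cur.rowcount == 1


def reserve(db: sqlite3.Connection, seat_id: str, try_lock: Callable[[str], bool]) -> str:
    """Return 'held' or 'rejected'; never grant a hold that cannot be proven."""
    try:
        return "held" if try_lock(seat_id) else "rejected"
    except LockStoreDown:
        pass  # lock store unavailable: fall through to the database backstop
    try:
        return "held" if hold_via_db(db, seat_id) else "rejected"
    except sqlite3.Error:
        return "rejected"  # fail closed when even the backstop is uncertain
```

A hold taken through the database has no native expiry, so while the fallback is active some other mechanism (not shown) has to release abandoned holds.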

Consistency split

Be explicit: strong for the seat transition and the order; eventual for search, the seat map, and analytics. Stating which subsystem gets which model is a senior signal in itself.

▲ Allow — say this (staff move)

“The waiting room is load-shedding by design: I’d rather admit users at a rate I’ve provisioned for and queue the rest fairly than let the full stampede hit the booking path and take everyone down.”

▼ Reject — never say this

“We’ll fail open if the lock store dies so users can still buy.” Failing open on seat inventory is the oversell. For this system, failing closed is the only defensible call.

Scripted stress-test exchange

Interviewer

Two users click the same seat in the same millisecond. Trace it.

You

Both attempt an atomic SETNX on seat:{id}. Redis serializes them, so exactly one succeeds and gets the hold plus a booking in-progress; the other’s SETNX fails, it sees the seat is held, and the client is told to pick another seat — in real time over SSE the seat already shows red. No transaction held open, no double-hold.

Interviewer

The Redis lock layer goes down mid-sale.

You

I fail closed on booking. New holds fall back to a conditional update in the source-of-truth DB — update the seat to held only where it’s currently available, which is an atomic OCC check. If that path is degraded too, I’d rather reject purchases briefly than risk overselling. The invariant wins over availability here, every time — and I’d be alerting on the fail-closed state immediately.

05 Wrap-up (operability & recap) · ~6 min

Grading this window: Prove you could run it. Volunteer the invariant metric, observability, and rollout; recap; name what you deferred.

Observability — lead with the invariant

Oversell rate — must be exactly zero; alert on any non-zero value as a sev-1.
Lock contention and hold-acquire latency during on-sale.
Hold→purchase conversion and abandoned-hold rate (informs hold-window tuning).
A loud alert when the system enters the fail-closed fallback path.
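The invariant metric is cheap to compute from confirmed orders. The sketch below assumes a hypothetical feed of (seat_id, order_id) pairs for a single event and reports any seat that appears under more than one confirmed order.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def oversold_seats(confirmed: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each seat with more than one confirmed order to its order count."""
    counts = Counter(seat_id for seat_id, _order_id in confirmed)
    return {seat: n for seat, n in counts.items() if n > 1}


orders = [("A1", "o-1"), ("A2", "o-2"), ("A1", "o-3")]
violations = oversold_seats(orders)
print(len(violations))  # any non-zero value should page as a sev-1
```

An empty result is the only healthy state; there is no acceptable threshold above zero to tune.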

Rollout

Load-test against synthetic on-sales before real ones; canary changes to the locking/waiting-room logic on smaller events first. Keep the OCC backstop independently deployable.

▲ Allow — say this

“With more time I’d detail the payments and fraud paths and dynamic pricing. I scoped them out deliberately — payments need the same strong consistency I built for seats, just on money instead of inventory.”

05 The follow-up gauntlet

The probes you'll get — and the answer that holds.

Interviewers push on the locking model and the failure modes. Commit to CP, name the trade-off, protect the invariant.

"Two users click the same seat at once — prevent it?"

Atomic SETNX (or equivalent) on the seat key. Redis serializes the attempts, exactly one wins the hold, the loser sees the seat is held and picks another. No held transaction, no race — the atomicity is the whole guarantee.

"A user holds a seat then abandons checkout."

The hold is a lock with a TTL, so it auto-releases when the key expires — the seat returns to available with no sweeper job. That automatic expiry is exactly why I use Redis rather than a DB row lock.

"Why a Redis lock instead of a Postgres transaction or row lock?"

I need a temporary reservation that expires on its own. Relational DBs have no native row TTL, so I'd bolt on cron-based cleanup; Redis gives automatic key expiry and sub-millisecond acquire/release under the heavy concurrency of an on-sale. The DB stays the source of truth for the final booked state.

"A million users, a few thousand seats — how do you not melt?"

A virtual waiting room in front of the booking path. It admits users at a rate I've provisioned for and holds the rest in a fair FIFO queue with a visible position. I size the backend for the admission rate, not the raw stampede — it's deliberate load-shedding.

"The payment processor sends the confirmation webhook twice."

Idempotent order creation keyed on the bookingId (or a client idempotency token). The second webhook is recognized as a duplicate and is a no-op — no second charge, no second ticket. Mandatory anywhere money moves.

"The lock store goes down mid-sale."

Fail closed on booking. Holds fall back to a conditional update in the source-of-truth DB — set the seat to held only where it's available, an atomic OCC check — and if that's uncertain, reject the purchase. A rejected sale is recoverable; an oversold seat isn't. And I alert the instant we enter that fallback.

Handling a probe you can’t fully answer: anchor to the invariant. “I haven’t tuned the exact admission rate for the waiting room, but the principle is fixed — admit only what the booking path is provisioned for, queue the rest fairly. Here’s how I’d derive the rate from a load test.”

06 What gets you downleveled

The flags that quietly tank an otherwise solid loop.

A clean design with one of these undercurrents still scores below the bar at senior+. None are about getting an answer wrong — they're about how you operate.

Drawing before scoping

Jumping to architecture without bounding the problem or confirming scale. Reads as template-matching.

Hedging without committing

"It depends" with no decision behind it. Name the trade-off, then pick.

Wrong consistency model

Choosing eventual consistency for seat inventory ‘for speed.’ It's an oversell waiting to happen and an instant flag.

Holding a DB transaction across checkout

Pinning a transaction open while a human pays for minutes — it collapses the connection pool the moment the on-sale starts.

Failing open on the lock

Allowing bookings through when the lock store is down. For seat inventory, failing open is the double-sale.

Skipping operations entirely

No observability, no rollout, no failure-mode plan. In 2026 this reads as "has never carried a pager."

Bluffing under a probe

Confident wrong answers when pushed. Far worse than an honest "here's what I'd verify."

Not driving

Waiting to be asked the next question. At staff you own the 45 minutes.

07 Your pre-loop scorecard

Self-grade before you walk in.

Run a mock and score yourself honestly against the dimensions the interviewer uses. If you can't hit "strong" on depth and operability, that's your signal on where to drill.

Dimension	Weak (downlevel)	Strong (at level)
Scoping	Picked eventual consistency or skipped the invariant.	Named no-double-booking as a correctness invariant; chose CP for booking, eventual for browse.
Hold mechanism	Held a DB transaction or had no expiry.	Atomic Redis lock with TTL; abandoned holds self-heal; DB is source of truth for booked state.
Spike handling	Let the stampede hit the backend.	Virtual waiting room admitting at a provisioned rate with a fair queue.
Idempotency	Forgot duplicate webhooks / double-clicks.	Idempotent order creation keyed on bookingId; no double charge or double ticket.
Failure fallback	Failed open or had no plan.	Failed closed; OCC conditional-update backstop; alert on entering fallback.
Operability	Never mentioned it.	Oversell-rate-zero invariant metric, lock contention, conversion, fail-closed alerting.

The 60-second recap that lands the level

Quick recap: the invariant is no double-booking, so booking is CP and browse/search is eventual; the hold is an atomic Redis lock with a 10-minute TTL so abandoned checkouts self-heal; the source-of-truth DB confirms the booked state; a virtual waiting room sheds the on-sale stampede into a fair queue; purchase is idempotent against webhook retries; and on lock-store failure I fail closed with an OCC conditional-update backstop. Headline metric: oversell rate, which must be zero. With more time: payments, fraud, and dynamic pricing.

★

The one mental model: this is a contention problem where correctness beats availability. Every move protects one invariant — a seat transitions atomically, exactly once. Say “the crux here is strong consistency under contention” in the first two minutes, choose CP out loud, and never let a single answer drift toward the oversell.

Design Ticketmaster like two fans are reaching for the same seat.

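Time budget (how the 45 min should split): Requirements & Scoping ~6 · Entities, API & Estimation ~5 · High-Level Design ~13 · Deep Dives ~15 · Wrap-up ~6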
