Geospatial Playbook — Designing Uber

01 How you're actually graded

Six buckets — and judgment outweighs the diagram.

Every FAANG company runs a rubric. The dimensions are roughly the same; the weights differ by company and level. At senior+ the boxes-and-arrows are table stakes — what gets graded hardest is the quality of your decisions: the questions you asked first, the trade-offs you surfaced and defended, and the production reality you volunteered without being asked.

Dimension	Weight	What earns the signal
Requirements & scoping	10–15%	You scoped before drawing, asked enough to bound the problem, pinned the scale number, and stated assumptions out loud.
High-level architecture	20–25%	The right components, a clear data flow, and a reason every box exists. The design satisfies each functional requirement.
Technical depth / deep dives	~30%	You go three questions deep on the hard part without being rescued. This is where staff is won or lost.
Trade-offs & judgment	highest effective	Two viable options, what each costs, and a committed pick for this system. Simplicity over flash when flash isn't warranted.
Communication / driving	cross-cutting	You drive the 45 minutes; the interviewer never has to rescue you. You narrate, checkpoint, and narrow when the design sprawls.
Operational maturity	↑ in 2026	The newest weight: observability, rollout, failure modes, on-call reality — volunteered, not pried out.

The 2026 shift, in one line. Operational concerns are now a first-class graded dimension, and "it depends" without a committed answer reads as evasion rather than nuance. Name the trade-off, then pick.

02 The same answer is scored differently at each level

It's a sliding scale, not a pass/fail bar.

A solid design with reasonable trade-offs is a strong score for a mid-level candidate and a downlevel flag for staff. The questions can be identical; the depth expectation is not. As you climb, the balance tips from breadth toward depth, proactivity, and production reality.

Mid-level

Meta E4 · Google L4 · Amazon SDE-II

breadthdepth

Describes the basic flow (rider requests → find nearby → assign) and a naive lookup.
Names a spatial index (geohash) when prompted and stores driver locations in memory.
Recognizes location updates are high-volume but may not separate write and read paths cleanly.
Handles the happy path; needs guidance on the double-dispatch race.

Senior

Meta E5 · Google L5 · Amazon SDE-III

breadthdepth

Articulates the geospatial index choice (geohash vs quadtree vs S2/H3) with trade-offs, unprompted.
Separates the location write path from the matching read path; uses Redis GEO and dynamic radius expansion.
Surfaces the double-dispatch race and the hot-cell problem without a hint.
Picks consistency per subsystem: stale location is fine, dispatch assignment must be atomic.

Staff+

Meta E6 · Google L6 · Amazon Principal

breadthdepth

Establishes the index fast, then spends time on supply/demand consistency, surge data flow, and multi-city scaling.
Experience-backed take on ingestion sampling, GPS-drop handling, and matching as a continuous vs batched process.
Treats per-cell sharding, regional failover, and dispatch-success dashboards as routine.
Frames CAP per subsystem and the cross-team seams (pricing, maps/ETA, payments).

03 The lens senior engineers narrate through

Borrow AWS's Well-Architected pillars as your trade-off vocabulary.

You don't recite AWS — you anchor each decision to one of these. It signals you evaluate systems across competing concerns rather than optimizing one axis. Each pillar below is mapped to a move you can make in this exact design.

PILLAR 01

Operational Excellence

Observe matching and ingestion.

Hook: “I’d watch match latency p99, dispatch-success rate, and location-ingestion lag per region, alerting when a city’s match time creeps up.”

PILLAR 02

Security

Protect rider/driver PII.

Hook: “Live location is sensitive; I’d encrypt in transit, keep precise history short-lived, and expose only coarse location to the matching layer.”

PILLAR 03

Reliability

Survive a node or GPS loss.

Hook: “Driver positions live in a replicated in-memory store; if a shard fails, the region’s drivers re-publish within seconds and the index self-heals.”

PILLAR 04

Performance Efficiency

Turn a global scan into a few cell lookups.

Hook: “Indexing by S2 cell means a match reads the rider’s cell plus neighbors — a handful of lookups instead of scanning millions of drivers.”

PILLAR 05

Cost Optimization

Don’t durably store every ping.

Hook: “Location pings stay in memory and are sampled to durable storage; persisting 1M writes/sec to a transactional DB would be pure waste.”

PILLAR 06

Sustainability

Right-size the firehose.

Hook: “Adaptive ping intervals — slower when a driver is idle or stationary — cut ingestion volume without hurting match quality.”

How to use it without sounding like a checklist. Don't list the pillars. Weave one in when you commit: name a trade-off, name the pillar it serves, and make the call. One sentence that does all three reads as senior.

03·5 The architecture you draw on the whiteboard

Index, don’t scan — the spatial lookup, end to end.

Two problems hide in one prompt: a write-heavy location firehose (every car reporting GPS every few seconds) and a latency-critical proximity search (find the nearest driver in under two seconds). The answer to both is a spatial index — never a distance scan over a table. Draw the two paths and the design defends itself.

Location writes Match reads Index update

Index, don’t scan. Drivers stream GPS into Redis and a sharded spatial index (QuadTree / S2); the Matching Service answers “who’s near?” with a bounded cell query, not a row-by-row distance scan. Say it: “a write-heavy firehose and a latency-critical proximity search — both solved by the index.”

How to narrate it in the room

Start with the firehose. “Every driver pushes GPS every few seconds — that’s a write-heavy stream, so live locations sit in Redis with a TTL and feed a spatial index, not a relational table.”
Name the index. “I shard the map into cells with a QuadTree or Google S2 / H3. A lookup becomes ‘scan a handful of nearby cells’ — O(log n), not O(n).”
Then the match. “A ride request hits the Matching Service, which queries the index for drivers in the rider’s cell and its neighbours, ranks by ETA, and returns candidates in well under two seconds.”
Close on the trade-off. “Locations are ephemeral and approximate, so I favour availability and low latency over strict consistency — a slightly stale position is fine; a slow query is not.”

04 The interview, minute by minute

Five phases. Drive every one of them.

The simulation. Framing: city-scale ride-hailing — ~5M active drivers publishing location every ~4s (on the order of 1M+ writes/sec), riders expecting a match in < 2s, location allowed to be slightly stale, payments strongly consistent (noted, scoped out).

01Requirements & Scoping~6 min · don't draw yet

Grading this window: Do you see the two distinct problems — high-write ingestion and low-latency proximity match — and set consistency per subsystem? That framing is the senior tell.

Functional requirements to land

Drivers publish location continuously while online.
Rider requests a ride → system finds and offers the ride to the nearest available driver.
Driver accepts / declines; on decline or timeout, try the next candidate.

Non-functional requirements to land

Match latency: a driver offered within 1–2s.
Write-heavy ingestion: millions of location updates per second.
Consistency per subsystem: driver location can be eventually consistent (a few seconds stale is fine); a dispatch assignment must be strongly consistent (no two riders get the same driver).
High availability, multi-region.

▲ Allow — say this

“There are really two systems here: a write-heavy pipeline ingesting location pings, and a read-latency-critical matching service. I’ll design them separately because their scaling pressures are opposite.”

▲ Allow — say this

“I’ll set consistency per subsystem: slightly stale driver locations are acceptable, but assignment must be atomic — double-dispatching one driver to two riders is a correctness bug, not a UX nit.”

▼ Reject — never say this

“We’ll query the database for drivers within X kilometers.” A WHERE distance < r scan over all drivers can’t use an index and signals you’ve never handled spatial data.

02Entities, API & Estimation~5 min

Grading this window: Clean model, crisp interface, and a write-QPS estimate that justifies keeping locations in memory.

Entities: Driver (id, location, status: available/on-trip/offline), Rider, Trip (state machine: requested → matched → ongoing → complete). Interface:

updateLocation(driverId, lat, lng) // fire-hose, write path requestRide(riderId, lat, lng) → tripId getNearbyDrivers(lat, lng, radius) → [driverId…]

The estimate that matters

5M drivers × a ping every 4s ≈ 1.25M writes/sec. Persisting that to a durable transactional DB is hopeless and pointless. State it plainly: locations live in an in-memory geo store, sampled asynchronously to durable storage only if history is needed.

▲ Allow — say this

“At over a million location writes a second, I won’t durably persist every ping. Current position lives in memory; I sample to cold storage for analytics. The freshest data being in RAM is exactly what the match needs anyway.”

03High-Level Design (the MVP)~13 min

Grading this window: Spatial index + separated paths + dynamic radius. The right components with a reason each exists.

The spatial index

Encode every location into a cell. Geohash turns lat/long into a string whose prefix length controls precision (a 6-char cell is roughly a kilometer; 7-char is ~150m) and nearby points share prefixes — simple and the right first answer. Quadtree subdivides denser regions into finer cells, adapting to driver density. S2 / H3 (what Uber actually uses) give uniform cells on the sphere with cleaner geometry. Store drivers indexed by cell in an in-memory store (e.g. Redis with geo commands).

Two paths, one index

WRITE: driver ping → Location Service → update cell index (in-memory) READ: ride request → Matching Service → read rider cell + neighbors → rank candidates → offer via WebSocket (15s timeout) → next on decline

Dynamic radius expansion

Start with the rider’s cell and immediate neighbors. If too few candidates (common in suburbs), expand to a coarser precision / wider radius and retry until enough drivers are found or a max radius is hit — then return “no drivers available.”

The trap door the interviewer opens here. “Two riders request at the same instant and the system picks the same nearest driver — what happens?” That’s the double-dispatch race, the consistency heart of the problem. Raising it before being asked is the senior signal.

▲ Allow — say this

“Matching reads the rider’s cell plus neighbors, turning a global scan into a handful of cell lookups. If the cell is sparse I widen the radius progressively rather than scanning everything from the start.”

◆ Throttle — only with a reason

Quadtree or H3 as your first answer. They’re correct for density-adaptive surge cells, but they’re harder to implement; lead with geohash and bring up quadtree/H3 when density or surge granularity demands it.

▼ Reject — never say this

A single matching server holding all drivers in one process. It can’t survive a hot city and it’s a single point of failure — the index must shard by region/cell.

04Deep Dives — the stress test~15 min · where staff is decided

Grading this window: Lead toward index trade-offs, the double-dispatch race, hot cells, and GPS handling. Staff volunteers these; they're 30%+ of the score.

The double-dispatch race (the correctness crux)

Two matches must never assign the same driver. Make the assignment atomic: a compare-and-set on the driver’s status (available → reserved) or a short-lived distributed lock per driver. The first match wins the CAS; the second sees the driver is no longer available and moves to the next candidate. Pair it with the WebSocket offer flow: offer to the top candidate with a ~15s timeout; on accept, confirm the reservation; on decline/timeout, release and try the next.

Hot cells

Dense areas (a stadium, midtown Manhattan) pack thousands of drivers into one cell, creating a read/write hot shard. Use a finer precision there (or a quadtree that subdivides automatically) so dense regions get more, smaller cells while rural areas stay coarse. Shard the index by region so one busy city doesn’t overload a single node.

GPS drop & staleness

Mark a driver’s position stale after a missed-ping threshold and exclude stale drivers from matching; use last-known position with light interpolation for display only. Don’t offer rides based on a position that’s minutes old.

Surge (if it comes up)

Compute demand:supply per cell on a rolling window; a coarser cell (geohash-6) is the right granularity for pricing. This is a separate read of the same cell structure — a clean place to mention cross-team seams.

▲ Allow — say this (staff move)

“Assignment is the one strongly-consistent operation in an otherwise eventually-consistent system. I’d guard it with an atomic CAS on driver status so concurrent matches can’t both win the same driver — everything else, including location, can be stale.”

▼ Reject — never say this

“We’ll just lock the whole driver table.” A global lock kills throughput. The contention is per-driver; the lock must be too.

Scripted stress-test exchange

Interviewer

Two riders, same instant, both matched to driver D. What happens?

You

Both matching attempts try to flip D’s status from available to reserved with an atomic compare-and-set. Exactly one succeeds. The winner sends D the ride offer over WebSocket; the loser immediately sees D is no longer available and moves to the next candidate in its ranked list. No double-dispatch, and neither rider waits on a global lock.

Interviewer

D doesn’t respond within 15 seconds.

You

The offer times out, I release D’s reservation back to available, and offer the next-nearest candidate. If everyone in the radius declines, I expand the radius and retry, and eventually tell the rider no drivers are available rather than spinning forever.

05Wrap-up — operability & recap~6 min

Grading this window: Prove you could run it. Volunteer observability and rollout; recap the trade-offs; name what you deferred.

Observability

Match latency p99 and dispatch-success rate per city — the core SLOs.
Location-ingestion lag and dropped-ping rate per region.
Hot-cell alerts when a cell’s driver count or query rate spikes.

Rollout & resilience

Multi-region with regional failover; a city is independently operable. Roll out index or matching changes per region as canaries. Keep a coarse fallback matcher if the primary index degrades.

▲ Allow — say this

“With more time I’d detail ETA prediction and the payments path — payments need the strong consistency I deliberately kept out of the location layer. I scoped them out, I didn’t miss them.”

05 The follow-up gauntlet

The probes you'll get — and the answer that holds.

Interviewers push on the index choice and the consistency seams. Commit, name the trade-off, and stop.

"Geohash or quadtree — which, and why?"

Geohash is the right first answer: simple, prefix-based, and supported directly by Redis geo commands, so a match is a cell lookup plus neighbors. I reach for a quadtree or H3 when density varies wildly and I need adaptive cell sizes — for example, surge pricing where one Manhattan cell shouldn't hold 10,000 drivers. Lead with geohash, escalate when density demands it.

"Manhattan has 10,000 drivers in one cell. Problem?"

Yes — a hot cell, both for writes (pings) and reads (matches). Fix it with finer precision in dense regions (or a quadtree that subdivides automatically), and shard the index by region so one city's load doesn't fall on a single node.

"Prevent two riders matching the same driver?"

Make assignment atomic: a compare-and-set on the driver's status, or a short per-driver lock. First match wins, second sees the driver is reserved and tries the next candidate. The lock is per-driver, never global, so throughput stays high.

"A suburb has no drivers in the initial radius."

Dynamic radius expansion: start with the rider's cell and neighbors, and if too few candidates, step to a coarser precision / wider radius and retry until enough drivers appear or a max radius is reached — then return 'no drivers available' rather than hanging.

"1M location writes/sec — won't that crush the database?"

I never persist every ping durably. Current position lives in an in-memory geo store that all matching nodes read; I sample asynchronously to cold storage only for analytics or history. The freshest data being in RAM is exactly what matching wants.

"A driver's GPS drops mid-search."

I mark the position stale after a missed-ping threshold and exclude stale drivers from matching — offering a ride based on a minutes-old position is worse than not offering. For display I show last-known with light interpolation, but I don't dispatch on it.

Handling a probe you can’t fully answer: reason from the seams. “I haven’t tuned the exact cell precision per city, but the knobs are driver density and match recall — too coarse and matches are slow, too fine and I miss nearby drivers across cell borders. Here’s how I’d measure it.”

06 What gets you downleveled

The flags that quietly tank an otherwise solid loop.

A clean design with one of these undercurrents still scores below the bar at senior+. None are about getting an answer wrong — they're about how you operate.

Drawing before scoping

Jumping to architecture without bounding the problem or confirming scale. Reads as template-matching.

Hedging without committing

"It depends" with no decision behind it. Name the trade-off, then pick.

Needing rescue on the hard part

Stalling at the core deep dive until the interviewer feeds you the answer. Depth is the senior+ bar.

Distance-scan over a table

Proposing WHERE distance < r without a spatial index. The single fastest way to signal you've never indexed location data.

Ignoring the double-dispatch race

Designing the happy path and forgetting that two concurrent matches can grab the same driver. It's the correctness crux.

Skipping operations entirely

No observability, no rollout, no failure-mode plan. In 2026 this reads as "has never carried a pager."

Bluffing under a probe

Confident wrong answers when pushed. Far worse than an honest "here's what I'd verify."

Not driving

Waiting to be asked the next question. At staff you own the 45 minutes.

07 Your pre-loop scorecard

Self-grade before you walk in.

Run a mock and score yourself honestly against the dimensions the interviewer uses. If you can't hit "strong" on depth and operability, that's your signal on where to drill.

Dimension	Weak (downlevel)	Strong (at level)
Scoping	One undifferentiated system.	Split write-heavy ingestion from latency-critical matching; set consistency per subsystem.
Spatial index	Distance scan or vague 'use a map API'.	Committed to geohash, named quadtree/H3 and when they win, used cell + neighbor lookups.
Path separation	Pings and matches share one store/path.	In-memory geo store for live position; durable sampling separate; matching reads the index.
Double-dispatch	Needed a hint; global lock.	Raised it unprompted; per-driver atomic CAS with WebSocket offer + timeout + next-candidate.
Hot cells	Single node holds all drivers.	Finer precision in dense areas, region sharding, hot-cell alerting.
Operability	Never mentioned it.	Match-latency and dispatch-success SLOs, ingestion lag, regional failover, canary rollout.

The 60-second recap that lands the level

Quick recap: two subsystems — a write-heavy location firehose kept in memory, and a latency-critical matcher; index by S2/geohash cell so a match is cell-plus-neighbors, not a scan; dynamic radius expansion for sparse areas; the one strongly-consistent operation is assignment, guarded by per-driver atomic CAS to kill double-dispatch; finer cells and region sharding for hot cities; stale-position exclusion for GPS drops; SLOs on match latency and dispatch success with regional failover. With more time: ETA prediction and the strongly-consistent payments path.

★

The one mental model: any “find things near me” question reduces to index the world into cells, read the rider’s cell and its neighbors. Say “this is a geospatial-indexing problem” in the first two minutes, separate the write firehose from the read match, and protect the single atomic operation — assignment — from the race that defines the question.

Design Uber like the nearest driver is already pinging.

Time Budget · how the 45 min should split

The shape of the problem

Six buckets — and judgment outweighs the diagram.

It's a sliding scale, not a pass/fail bar.

Borrow AWS's Well-Architected pillars as your trade-off vocabulary.

Operational Excellence

Security

Reliability

Performance Efficiency

Cost Optimization

Sustainability

Index, don’t scan — the spatial lookup, end to end.

How to narrate it in the room

Five phases. Drive every one of them.

Functional requirements to land

Non-functional requirements to land

The estimate that matters

The spatial index

Two paths, one index

Dynamic radius expansion

The double-dispatch race (the correctness crux)

Hot cells

GPS drop & staleness

Surge (if it comes up)

Observability

Rollout & resilience

The probes you'll get — and the answer that holds.

"Geohash or quadtree — which, and why?"

"Manhattan has 10,000 drivers in one cell. Problem?"

"Prevent two riders matching the same driver?"

"A suburb has no drivers in the initial radius."

"1M location writes/sec — won't that crush the database?"

"A driver's GPS drops mid-search."

The flags that quietly tank an otherwise solid loop.

Self-grade before you walk in.