The payload is huge and binary, and that changes everything: this is a bandwidth problem wearing a software costume. The bytes never touch your application servers — they go straight to object storage and out through a CDN. The depth lives in the transcoding pipeline and adaptive bitrate streaming, the machinery that turns one uploaded file into smooth playback on a phone on a train and a TV on fiber.
Every FAANG company runs a rubric. The dimensions are roughly the same; the weights differ by company and level. At senior+ the boxes-and-arrows are table stakes — what gets graded hardest is the quality of your decisions: the questions you asked first, the trade-offs you surfaced and defended, and the production reality you volunteered without being asked.
| Dimension | Weight | What earns the signal |
|---|---|---|
| Requirements & scoping | 10–15% | You scoped before drawing, asked enough to bound the problem, pinned the scale number, and stated assumptions out loud. |
| High-level architecture | 20–25% | The right components, a clear data flow, and a reason every box exists. The design satisfies each functional requirement. |
| Technical depth / deep dives | ~30% | You go three questions deep on the hard part without being rescued. This is where staff is won or lost. |
| Trade-offs & judgment | highest in practice | Two viable options, what each costs, and a committed pick for this system. Simplicity over flash when flash isn't warranted. |
| Communication / driving | cross-cutting | You drive the 45 minutes; the interviewer never has to rescue you. You narrate, checkpoint, and narrow when the design sprawls. |
| Operational maturity | ↑ in 2026 | The newest weight: observability, rollout, failure modes, on-call reality — volunteered, not pried out. |
A solid design with reasonable trade-offs is a strong score for a mid-level candidate and a downlevel flag for staff. The questions can be identical; the depth expectation is not. As you climb, the balance tips from breadth toward depth, proactivity, and production reality.
You don't recite AWS services; you anchor each decision to one of the six pillars of the AWS Well-Architected Framework. It signals that you evaluate systems across competing concerns rather than optimizing one axis. Each pillar below maps to a move you can make in this exact design:
- **Operational excellence:** watch playback quality, not just uptime.
- **Security:** presigned, scoped uploads.
- **Reliability:** durable storage, resilient delivery.
- **Performance efficiency:** adapt to the network.
- **Cost optimization:** tier the petabytes.
- **Sustainability:** encode once, serve forever.
Framing for the simulation: a video-on-demand platform with billions of users, petabytes of storage, smooth playback across every device and network, durability of around 11 nines, and start-up latency under 2s. Live streaming is a separate system; scope it out.
“The core philosophy: app servers handle lightweight metadata; the heavy video bytes go straight to object storage and out through a CDN. The servers are almost untouched by the payload — that’s what makes this scale.”
“I’ll scope to video-on-demand. Live streaming has real-time encoding and a much tighter latency budget — it’s a separate system, and I’ll say so rather than blur the two.”
“We’ll store the video files in the database.” Blobs in a relational DB is the canonical media anti-pattern — it signals you’ve never handled large media.
Interviewer: Design YouTube.
Candidate: I'll scope to VOD (upload, process, watch) and treat live as separate. The defining constraint is that video is huge and binary, so my whole design keeps bytes off the app servers: direct-to-object-storage uploads, an async transcoding pipeline, and CDN delivery with adaptive bitrate. Metadata is the only thing my services handle directly. Let me confirm we care about smooth cross-network playback, since that drives the ABR work.
Interviewer: Yes, smooth playback everywhere.
Candidate: Then transcoding into a bitrate ladder plus adaptive streaming is the heart of it. Let me lay out upload first, then the pipeline, then the watch path.
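Entities: Video (metadata + processing status), segment files (one set per bitrate), and manifest files (listing the available renditions and segments). Interface: an upload call that reserves a video id and returns a presigned URL, and getWatchInfo, which returns video metadata plus the manifest URL.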
The presigned URL is the key: the server reserves an id and hands back a temporary link so the client uploads directly to object storage, bypassing the app server entirely. The manifest tells the player which segments and resolutions exist.
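Object stores such as S3 and GCS issue presigned URLs through their own SDKs and signing schemes. To show the idea without depending on either, here is a minimal stdlib sketch assuming a hypothetical HMAC scheme: `make_presigned_url`, `verify_presigned_url`, `SECRET`, and `uploads.example.com` are illustrative names, not any provider's API. The "scoped" part is that the signature binds one object key, one HTTP method, and one expiry.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical key shared by the API server (which signs) and the storage
# front end (which verifies). A real deployment keeps this in a secrets manager.
SECRET = b"demo-signing-key"
BASE = "https://uploads.example.com"  # hypothetical storage endpoint


def _sign(key: str, method: str, expires: int) -> str:
    # Binding key, method, and expiry means the link cannot be reused for
    # another object, another verb, or after its time window closes.
    msg = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_presigned_url(video_id: str, ttl_s: int = 900,
                       now: Optional[float] = None) -> str:
    """API server: reserve raw/<video_id> and return a time-limited PUT link."""
    now = time.time() if now is None else now
    key = f"raw/{video_id}"
    expires = int(now) + ttl_s
    query = urlencode({"expires": expires, "sig": _sign(key, "PUT", expires)})
    return f"{BASE}/{key}?{query}"


def verify_presigned_url(url: str, method: str = "PUT",
                         now: Optional[float] = None) -> bool:
    """Storage side: accept only if the signature matches and the link is unexpired."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    key = parsed.path.lstrip("/")
    expected = _sign(key, method, expires)
    return hmac.compare_digest(sig, expected) and now <= expires
```

The app server only runs `make_presigned_url`; the gigabytes then flow straight to the storage endpoint, which runs the equivalent of `verify_presigned_url` on each request.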
Compute storage (raw plus every rendition), but emphasize that bandwidth dominates: serving billions of views is the cost driver, which is exactly why the CDN and a high cache-hit rate are non-negotiable, not nice-to-haves.
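A back-of-envelope sketch makes the point concrete. Every input below is a hypothetical round number chosen for a whiteboard, not a measured figure; the comparison is in bytes, not dollars, because prices vary by provider.

```python
# All inputs are hypothetical round numbers for a whiteboard estimate.
RAW_MBPS = 20.0                        # uploaded source bitrate
LADDER_MBPS = [5.0, 2.5, 1.2, 0.6]     # one rung per rendition
DURATION_S = 600                       # a 10-minute video
VIEWS = 1_000_000
AVG_WATCH_S = 300                      # viewers watch half on average
AVG_DELIVERED_MBPS = 2.5               # mix of rungs actually streamed
CDN_HIT_RATE = 0.95


def gb(mbps: float, seconds: float) -> float:
    """Megabits per second sustained for `seconds`, converted to decimal gigabytes."""
    return mbps * seconds / 8 / 1000


storage_gb = gb(RAW_MBPS + sum(LADDER_MBPS), DURATION_S)  # stored once
egress_gb = VIEWS * gb(AVG_DELIVERED_MBPS, AVG_WATCH_S)    # paid on every view
origin_egress_gb = egress_gb * (1 - CDN_HIT_RATE)          # what misses the edge

print(f"stored:        {storage_gb:,.1f} GB")
print(f"delivered:     {egress_gb:,.0f} GB")
print(f"origin egress: {origin_egress_gb:,.0f} GB")
```

Under these assumptions the video stores about 2.2 GB but delivers about 94 TB, and roughly 4.7 TB of that still reaches origin even at a 95% hit rate. Bandwidth outweighs storage by four orders of magnitude, which is the argument for the CDN.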
“The upload returns a presigned URL so bytes flow client-to-storage directly. My app server reserves the id and records metadata — it never proxies gigabytes, which would saturate its bandwidth instantly.”
Client chunks the file (10–50MB pieces) for resumable, parallel upload, and writes directly to object storage via the presigned URL. On completion, an event kicks off processing. The app server only ever touched metadata.
An asynchronous pipeline of parallel workers (using a tool like ffmpeg) splits the original into segments and encodes each segment into multiple bitrates/formats, then packages them as HLS/DASH segment files plus manifest files, all stored back in object storage. This is CPU-heavy and embarrassingly parallel — model it as a DAG of tasks.
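The DAG can be sketched with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The task names (`split`, `encode:<chunk>:<rendition>`, `package:<rendition>`, `manifest`) are hypothetical stand-ins for real ffmpeg jobs; the sketch only shows which tasks can run in parallel.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set


def build_transcode_dag(num_chunks: int, renditions: List[str]) -> Dict[str, Set[str]]:
    """Map each task to the set of tasks that must finish before it can run."""
    dag: Dict[str, Set[str]] = {"split": set()}
    for r in renditions:
        encodes = set()
        for c in range(num_chunks):
            node = f"encode:{c}:{r}"
            dag[node] = {"split"}      # every encode needs the source split first
            encodes.add(node)
        dag[f"package:{r}"] = encodes  # packaging a rendition needs all its chunks
    dag["manifest"] = {f"package:{r}" for r in renditions}
    return dag


def parallel_waves(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; every task in a wave could run on a separate worker."""
    ts = TopologicalSorter(dag)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        # A real scheduler would dispatch these to workers and call done() as
        # each one finishes; here the whole wave is marked complete at once.
        ts.done(*ready)
    return waves


for wave in parallel_waves(build_transcode_dag(3, ["1080p", "720p", "480p"])):
    print(len(wave), wave[:2])
```

The middle wave holds chunks times renditions encode tasks, which is why adding workers shortens a long video's processing time almost linearly.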
Client calls getWatchInfo → gets metadata + manifest URL → the player reads the manifest and pulls segments from the CDN, choosing a rendition based on measured bandwidth (adaptive bitrate). Popular content is cached at edge servers worldwide.
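For HLS, the top-level manifest is a master playlist (RFC 8216): an `#EXTM3U` header followed by one `#EXT-X-STREAM-INF` line per rendition, carrying its `BANDWIDTH` in bits per second, then the URI of that rendition's media playlist. A minimal sketch, assuming a hypothetical ladder and directory layout (DASH expresses the same idea as an XML MPD):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    bandwidth_bps: int  # HLS BANDWIDTH: peak bits per second for this variant


# Hypothetical ladder; real ladders are tuned per title and codec.
LADDER = [
    Rendition("1080p", 1920, 1080, 5_000_000),
    Rendition("720p", 1280, 720, 2_800_000),
    Rendition("480p", 854, 480, 1_400_000),
]


def master_playlist(renditions: List[Rendition]) -> str:
    """Build an HLS master playlist: one EXT-X-STREAM-INF entry per rendition."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda x: x.bandwidth_bps, reverse=True):
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
                     f"RESOLUTION={r.width}x{r.height}")
        lines.append(f"{r.name}/index.m3u8")  # per-rendition media playlist
    return "\n".join(lines) + "\n"


def parse_variants(text: str) -> List[Tuple[int, str]]:
    """Player side: list (bandwidth, uri) pairs. Naive: quoted attributes such as
    CODECS="..." contain commas and would need a real attribute parser."""
    lines = text.splitlines()
    variants = []
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = dict(kv.split("=", 1) for kv in line.split(":", 1)[1].split(","))
            variants.append((int(attrs["BANDWIDTH"]), lines[i + 1]))
    return variants


print(master_playlist(LADDER))
```

Each `index.m3u8` it points to is a media playlist listing that rendition's segment files; the player reads the master once, then fetches segments from whichever rendition it picks.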
“Transcoding is async and parallel — a DAG of workers producing renditions and manifests. The watch path is then pure CDN: the player reads the manifest and pulls segments at whatever bitrate the network supports.”
Transcoding synchronously on upload. Defensible only for tiny clips — say so. For real video it blocks the upload for minutes; transcoding must be an async pipeline off a queue.
“The client streams the raw MP4 from our server.” No ABR, no CDN, app server bandwidth as the bottleneck — three failures in one sentence.
Model it as a DAG: split the source into chunks, encode each chunk into every rendition in parallel, then package into HLS/DASH segments and generate the manifests. Parallelism across chunks and renditions is what makes processing a long video tractable. The bitrate ladder matters — steps roughly 50% apart (e.g. 1080p, 720p, 480p) so quality changes are smooth, not jarring. Codec is a trade-off: H.264 for compatibility, VP9/AV1 for ~30% better compression at much higher encode cost — worth it for popular videos where lifetime bandwidth savings dwarf the one-time encode.
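The codec argument is a break-even calculation: the extra encode is a one-time cost, the bandwidth saving recurs on every view. A sketch with hypothetical prices (only the ~30% saving comes from the text above):

```python
def break_even_views(extra_encode_cost: float, gb_per_view: float,
                     bandwidth_savings: float, egress_cost_per_gb: float) -> float:
    """Views after which the one-time extra encode pays for itself in egress."""
    saved_per_view = gb_per_view * bandwidth_savings * egress_cost_per_gb
    return extra_encode_cost / saved_per_view


def worth_av1(expected_views: int, **costs: float) -> bool:
    """Re-encode in the costlier codec only if the video will clear break-even."""
    return expected_views > break_even_views(**costs)


# Hypothetical inputs: $2 of extra CPU per video, 0.1 GB per view,
# ~30% smaller files, $0.02 per GB of egress.
COSTS = dict(extra_encode_cost=2.0, gb_per_view=0.1,
             bandwidth_savings=0.3, egress_cost_per_gb=0.02)

print(round(break_even_views(**COSTS)))
print(worth_av1(10_000, **COSTS), worth_av1(1_000, **COSTS))
```

With these numbers break-even lands around 3,300 views, so a policy of "H.264 for everything, AV1 once a video shows traction" falls out naturally.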
Chunk the file with a fingerprint per chunk and track progress; a dropped connection at 90% resumes by re-sending only the missing chunks. (Same machinery as a file-sync system like Dropbox.)
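The resume logic reduces to diffing fingerprints. A minimal sketch, assuming the server can report which chunk indexes it holds and their fingerprints (comparable to listing the parts of an S3 multipart upload); the function names are hypothetical:

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MiB, the low end of the 10–50MB range


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """SHA-256 of each fixed-size chunk, in upload order."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def missing_chunks(local: List[str], server_has: Dict[int, str]) -> List[int]:
    """Indexes to (re)send: absent on the server, or stored with a mismatched
    fingerprint (for example, a chunk cut off mid-transfer)."""
    return [i for i, fp in enumerate(local) if server_has.get(i) != fp]


# Demo with a small chunk size: the connection drops after 9 of 11 chunks.
data = bytes(range(256)) * 4000                 # 1,024,000 bytes of stand-in video
fps = chunk_fingerprints(data, chunk_size=100_000)
server = {i: fps[i] for i in range(9)}          # what the server acknowledged
print(missing_chunks(fps, server))
```

Only chunks 9 and 10 are re-sent. Because each chunk is independent, the client can also upload several in parallel.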
Target a 95%+ edge cache hit rate; most views are popular content served from the edge. A viral video is a hot key — handle the thundering herd with origin shielding (a mid-tier cache absorbing edge misses) and pre-warming popular content. The CDN is what keeps origin bandwidth and cost sane.
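The core of an origin shield is request collapsing: concurrent misses for the same segment wait on one origin fetch instead of each issuing their own. A single-process sketch with threads; `OriginShield` and `fetch_from_origin` are hypothetical names, and a real shield is a CDN tier, not a Python object.

```python
import threading
from typing import Callable, Dict


class OriginShield:
    """Mid-tier cache that collapses concurrent misses for a key into one origin fetch."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> bytes:
        with self._lock:
            if key in self._cache:
                return self._cache[key]          # hit: origin never sees it
            event = self._inflight.get(key)
            leader = event is None
            if leader:                           # first miss becomes the leader
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            try:
                value = self._fetch(key)         # the only origin request
                with self._lock:
                    self._cache[key] = value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                      # wake every waiting follower
            return value
        event.wait()                             # follower: wait for the leader
        with self._lock:
            value = self._cache.get(key)
        # If the leader's fetch failed, retry (this caller may become the leader).
        return value if value is not None else self.get(key)
```

Fifty simultaneous requests for a viral video's first segment produce one origin fetch; pre-warming is the same call made before the traffic arrives.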
Tier storage — hot (recent/popular) on fast storage, cold long-tail on cheap storage. Use erasure coding for durability at lower cost than 3x replication, replicated across regions. Metadata lives in a sharded DB, completely separate from the bytes.
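Production stores typically use Reed-Solomon codes: k data shards plus m parity shards, surviving any m losses at a storage overhead of (k+m)/k, so 10+4 costs 1.4x versus 3x for triple replication. Reed-Solomon is too long for a sketch, so this one swaps in the simplest erasure code, a single XOR parity shard (m = 1), to show the recover-from-survivors idea. All names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)               # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild one missing shard: XOR of all k+1 shards is zero, so the
    missing one equals the XOR of the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost shard")
    shards = list(shards)
    if missing:
        shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return shards


def decode(shards: List[bytes], k: int, original_len: int) -> bytes:
    return b"".join(shards[:k])[:original_len]


def storage_overhead(k: int, m: int) -> float:
    return (k + m) / k
```

Spreading the k+1 shards across failure domains (disks, racks, regions) is what turns the math into durability.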
“At petabyte scale, cost is a first-class design axis: tiered storage, erasure coding instead of triple replication, and a high CDN hit rate so I’m not paying origin egress on every view. For hot videos I’d spend extra encode CPU on AV1 to claw back lifetime bandwidth.”
“The CDN handles everything, I don’t need to think about origin load.” A viral video’s cold first-segment requests still stampede the origin — you need shielding and pre-warming.
Interviewer: A 4-hour 4K upload drops at 90%. What happens?
Candidate: The client chunked the file and uploaded the chunks directly to storage, each with a fingerprint and tracked progress. On reconnect it asks which chunks the server already has and re-sends only the missing ~10%. Nothing routed through the app server, so a long upload never ties up application capacity, and transcoding only starts once all chunks land.
Interviewer: Now that video goes viral globally.
Candidate: Its segments are served from CDN edges, so the vast majority of requests never reach origin; I'd expect a 95%+ hit rate. The risk is the cold start: thousands of edges missing the first segment at once and stampeding origin. I'd put an origin shield (a mid-tier cache that collapses those misses) in front, and pre-warm edges for content I can predict will spike.
Roll out codec/ladder changes to a fraction of traffic and watch rebuffer ratio before widening. Keep the prior renditions available so a bad encode profile can be rolled back without re-uploading.
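The rollout gate can be sketched as a comparison between a canary cohort on the new encode profile and a control cohort on the old one. The stage fractions, the 10% regression budget, and the rebuffer definition (stall time over total session time) are all hypothetical choices; a real gate would also require enough sessions for statistical significance.

```python
from dataclasses import dataclass
from typing import List

STAGES: List[float] = [0.01, 0.05, 0.25, 1.0]  # hypothetical traffic fractions


@dataclass
class CohortStats:
    play_s: float   # seconds of video actually played
    stall_s: float  # seconds spent rebuffering

    @property
    def rebuffer_ratio(self) -> float:
        total = self.play_s + self.stall_s
        return self.stall_s / total if total else 0.0


def next_step(current_fraction: float, control: CohortStats, canary: CohortStats,
              max_regression: float = 0.10) -> float:
    """Return the next traffic fraction, or 0.0 to roll back to the prior profile."""
    if canary.rebuffer_ratio > control.rebuffer_ratio * (1 + max_regression):
        return 0.0  # prior renditions are still stored, so rollback is a config flip
    later = [s for s in STAGES if s > current_fraction]
    return later[0] if later else current_fraction
```

Keeping the prior renditions in storage is what makes the `0.0` branch cheap: rolling back changes which manifest is served, not what has to be re-encoded.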
“With more time I’d detail search, recommendations, and view counting — view counting in particular is its own sharded-counter streaming problem. I scoped them out deliberately.”
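If pushed on view counting, the sharded-counter idea fits in a few lines: increments for one hot video land on a random shard so no single row or lock becomes the bottleneck, and reads sum the shards. This in-memory sketch is hypothetical; in a real system each shard would be a separate key (for example `views:<video_id>:<shard>`) fed by a stream of batched view events.

```python
import random
import threading
from typing import List


class ShardedCounter:
    """Spread writes for one hot counter across N shards; reads sum them."""

    def __init__(self, num_shards: int = 16):
        self._shards: List[int] = [0] * num_shards
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self, n: int = 1) -> None:
        i = random.randrange(len(self._shards))  # random shard spreads contention
        with self._locks[i]:
            self._shards[i] += n

    def value(self) -> int:
        # Unlocked read: may be slightly stale under concurrent writes, which
        # is acceptable for a displayed view count.
        return sum(self._shards)
```

More shards mean cheaper writes and costlier reads, a trade-off that suits a counter written far more often than it needs to be exact.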
Interviewers push on the pipeline, ABR, and CDN economics. Keep bytes off the app servers, defend the bitrate ladder, protect the origin.
The client chunked the file (10–50MB) with a fingerprint per chunk and uploaded directly to object storage. On reconnect it queries which chunks landed and re-sends only the missing ones. The app server only reserved the id and tracked progress — the bytes never went through it.
One format and bitrate can't serve every device and network. I transcode into a bitrate ladder and generate a manifest so the player adapts mid-stream — stepping down on a weak connection instead of buffering. Adaptive bitrate is the whole reason for the pipeline.
The manifest lists the available renditions. The ABR client measures bandwidth and buffer health and requests the segment at the bitrate it can sustain, stepping up or down as the network changes. The server doesn't decide — the client adapts.
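A simple hybrid of throughput and buffer rules illustrates the client-side decision. Real players such as hls.js, dash.js, and ExoPlayer use more elaborate algorithms; the thresholds, the safety factor, and the EWMA smoothing below are hypothetical.

```python
from typing import List


def estimate_throughput(samples_bps: List[float], alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of per-segment download throughput."""
    estimate = samples_bps[0]
    for s in samples_bps[1:]:
        estimate = alpha * s + (1 - alpha) * estimate
    return estimate


def choose_bitrate(ladder_bps: List[int], throughput_bps: float, buffer_s: float,
                   safety: float = 0.8, low_buffer_s: float = 5.0,
                   high_buffer_s: float = 20.0) -> int:
    """Pick the highest rung the network can sustain, biased by buffer health."""
    ladder = sorted(ladder_bps)
    if buffer_s < low_buffer_s:
        return ladder[0]  # close to a stall: refill the buffer at the lowest rung
    # With a thin buffer keep headroom below measured throughput; with a deep
    # buffer the player can afford to spend all of it.
    budget = throughput_bps if buffer_s > high_buffer_s else throughput_bps * safety
    fits = [b for b in ladder if b <= budget]
    return fits[-1] if fits else ladder[0]
```

The player calls this before each segment request, so a train entering a tunnel steps down within one segment instead of stalling.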
Edge caching handles most of it at a 95%+ hit rate, but the cold start stampedes origin when many edges miss the first segment at once. I'd add origin shielding — a mid-tier cache that collapses concurrent misses — and pre-warm edges for predictable spikes.
Bytes — raw upload, transcoded segments, manifests — live in object storage fronted by a CDN. Metadata lives in a sharded database. App servers only ever touch metadata; routing gigabytes through them would saturate their bandwidth.
Tiered storage — hot/popular on fast storage and edges, cold long-tail on cheap tiers — plus erasure coding for durability at lower cost than 3x replication. And a high CDN hit rate so I'm not paying origin egress on every view.
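A tiering policy can be as simple as a rule over age and recent views, re-evaluated by a periodic job. The three tiers and every threshold here are hypothetical; real thresholds come from the access-frequency curve and the provider's retrieval pricing.

```python
from enum import Enum


class Tier(Enum):
    HOT = "hot"    # fast storage, likely cached at edges
    WARM = "warm"  # cheaper storage, still quick to read
    COLD = "cold"  # cheapest storage for the long tail


def choose_tier(age_days: float, views_last_30d: int) -> Tier:
    """Hypothetical rule: new or popular videos stay hot; the long tail goes cold."""
    if age_days <= 7 or views_last_30d >= 10_000:
        return Tier.HOT
    if views_last_30d >= 100:
        return Tier.WARM
    return Tier.COLD
```

Because popularity can return (an old video going viral), the job should move objects in both directions, not only demote them.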
A clean design with one of these undercurrents still scores below the bar at senior+. None are about getting an answer wrong — they're about how you operate.
- Jumping to architecture without bounding the problem or confirming scale. Reads as template-matching.
- "It depends" with no decision behind it. Name the trade-off, then pick.
- Storing video files in a relational DB: the canonical media anti-pattern. Bytes belong in object storage.
- Proxying gigabytes of upload/download through application servers, saturating their bandwidth. Use presigned direct-to-storage uploads and a CDN.
- Serving a single raw file with no adaptive streaming or edge caching: it won't play smoothly and won't scale.
- No observability, no rollout, no failure-mode plan. In 2026 this reads as "has never carried a pager."
- Confident wrong answers when pushed. Far worse than an honest "here's what I'd verify."
- Waiting to be asked the next question. At staff you own the 45 minutes.
Run a mock and score yourself honestly against the dimensions the interviewer uses. If you can't hit "strong" on depth and operability, that's your signal on where to drill.
| Dimension | Weak (downlevel) | Strong (at level) |
|---|---|---|
| Scoping | One blob of a system; mixed in live. | Split upload/watch paths, scoped to VOD, named ABR + direct-to-storage. |
| Upload | Routed bytes through app servers. | Presigned URLs, client chunking, resumable, bytes bypass the app server. |
| Transcoding | One synchronous black box. | Async parallel DAG producing a bitrate ladder + manifests; codec trade-offs named. |
| Delivery | Served raw file, no CDN. | CDN with high hit rate, ABR from manifest, origin shielding for hot keys. |
| Durability & cost | Ignored tiers and cost. | Erasure coding over replication, storage tiers, CDN hit rate as a cost lever. |
| Operability | Never mentioned it. | Rebuffer ratio + start latency SLOs, transcode-queue depth, CDN hit/egress. |