Media Playbook — Designing YouTube

01 How you're actually graded

Six buckets — and judgment outweighs the diagram.

Every FAANG company runs a rubric. The dimensions are roughly the same; the weights differ by company and level. At senior+ the boxes-and-arrows are table stakes — what gets graded hardest is the quality of your decisions: the questions you asked first, the trade-offs you surfaced and defended, and the production reality you volunteered without being asked.

Dimension	Weight	What earns the signal
Requirements & scoping	10–15%	You scoped before drawing, asked enough to bound the problem, pinned the scale number, and stated assumptions out loud.
High-level architecture	20–25%	The right components, a clear data flow, and a reason every box exists. The design satisfies each functional requirement.
Technical depth / deep dives	~30%	You go three questions deep on the hard part without being rescued. This is where staff is won or lost.
Trade-offs & judgment	highest effective	Two viable options, what each costs, and a committed pick for this system. Simplicity over flash when flash isn't warranted.
Communication / driving	cross-cutting	You drive the 45 minutes; the interviewer never has to rescue you. You narrate, checkpoint, and narrow when the design sprawls.
Operational maturity	↑ in 2026	The newest weight: observability, rollout, failure modes, on-call reality — volunteered, not pried out.

The 2026 shift, in one line. Operational concerns are now a first-class graded dimension, and "it depends" without a committed answer reads as evasion rather than nuance. Name the trade-off, then pick.

02 The same answer is scored differently at each level

It's a sliding scale, not a pass/fail bar.

A solid design with reasonable trade-offs is a strong score for a mid-level candidate and a downlevel flag for staff. The questions can be identical; the depth expectation is not. As you climb, the balance tips from breadth toward depth, proactivity, and production reality.

Mid-level

Meta E4 · Google L4 · Amazon SDE-II

Separates upload from watch and stores video in object storage when prompted.
Knows transcoding is needed but treats it as one black box; mentions a CDN.
Recognizes video is large but may route bytes through app servers.
Interviewer confirms the pipeline; not expected to detail ABR or storage tiers alone.

Senior

Meta E5 · Google L5 · Amazon SDE-III

Routes uploads directly to object storage via presigned URLs so app servers never touch the bytes — unprompted.
Describes the transcoding pipeline as parallel workers producing multiple renditions + manifest for ABR.
Designs resumable chunked uploads and a CDN strategy with a target hit rate.
Has opinions on the bitrate ladder, codecs, and storage tiers.

Staff+

Meta E6 · Google L6 · Amazon Principal

Establishes upload/watch paths fast, then spends time on the transcoding DAG, CDN hot keys, and cost.
Experience-backed take on codec trade-offs, origin shielding, and storage-tier economics at petabyte scale.
Treats rebuffer-ratio SLOs, transcode-queue backpressure, and multi-region CDN as routine.
Frames the bandwidth-and-cost reality and the cross-team seams (search, recommendations, view counting).

03 The lens senior engineers narrate through

Borrow AWS's Well-Architected pillars as your trade-off vocabulary.

You don't recite AWS — you anchor each decision to one of these. It signals you evaluate systems across competing concerns rather than optimizing one axis. Each pillar below is mapped to a move you can make in this exact design.

PILLAR 01

Operational Excellence

Watch playback quality, not just uptime.

Hook: “My headline metrics are playback start latency and rebuffer ratio, plus transcode-queue depth so I see backpressure before users do.”

PILLAR 02

Security

Presigned, scoped uploads.

Hook: “Uploads use short-lived presigned URLs scoped to one object, so clients write directly to storage without credentials and the app server never proxies bytes.”

PILLAR 03

Reliability

Durable storage, resilient delivery.

Hook: “Segments are stored with erasure coding across regions for durability, and the CDN absorbs delivery so an origin hiccup doesn’t stop playback.”

PILLAR 04

Performance Efficiency

Adapt to the network.

Hook: “Adaptive bitrate lets the client step down quality on a weak connection instead of buffering — smooth playback beats maximum resolution.”

PILLAR 05

Cost Optimization

Tier the petabytes.

Hook: “Hot videos stay on fast storage and edge caches; cold long-tail content moves to cheaper tiers, and erasure coding beats 3x replication on cost.”

PILLAR 06

Sustainability

Encode once, serve forever.

Hook: “Transcoding is the expensive one-time cost; I’d weigh a heavier codec like AV1 — better compression, more CPU — against lifetime bandwidth saved for popular videos.”

How to use it without sounding like a checklist. Don't list the pillars. Weave one in when you commit: name a trade-off, name the pillar it serves, and make the call. One sentence that does all three reads as senior.

03·5 The architecture you draw on the whiteboard

Bytes bypass your servers — blob, transcode, edge.

The payload is huge and binary, and that changes everything: this is a bandwidth problem wearing a software costume. The bytes never touch your application servers — they go straight to object storage and out through a CDN. The depth lives in the transcoding pipeline and adaptive-bitrate streaming.

Diagram: Upload + control · Transcode · Playback

A bandwidth problem in a software costume. The uploader PUTs bytes straight to object storage; a transcoding pipeline produces an adaptive-bitrate ladder; playback streams from the CDN. Say it: “the bytes never touch my app servers — they orchestrate; storage and the CDN carry the load.”

How to narrate it in the room

Get the bytes off the hot path. “The client asks the API for a signed URL and uploads directly to object storage. App servers never proxy gigabytes of video.”
Make transcoding the deep dive. “An upload event enqueues transcode jobs; workers produce multiple resolutions and bitrates — the ABR ladder — and write them back to storage.”
Serve from the edge. “Playback pulls an HLS/DASH manifest and segments through a CDN; the player switches bitrate by bandwidth, so it’s smooth on a phone or a TV.”
Bound the cost. “Popular videos stay hot in the CDN; cold ones fall back to origin. Transcoding is the expensive part, so it’s async and prioritized by expected views.”
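The "prioritized by expected views" idea in the last point can be sketched as a priority queue. This is a minimal illustration, not a production scheduler: the `TranscodeQueue` class and the `expected_views` input are hypothetical names, and how that estimate is produced is out of scope here.

```python
import heapq
import itertools
from typing import List, Tuple


class TranscodeQueue:
    """Hypothetical in-memory queue: highest expected views is transcoded first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # Tie-breaker so equal-priority jobs come out in arrival order.
        self._arrival = itertools.count()

    def enqueue(self, video_id: str, expected_views: int) -> None:
        # heapq is a min-heap, so negate the score to pop the largest first.
        heapq.heappush(self._heap, (-expected_views, next(self._arrival), video_id))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

In a real deployment the queue would be a durable broker rather than process memory; the ordering rule is the part worth saying out loud.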

04 The interview, minute by minute

Five phases. Drive every one of them.

The simulation. Framing: a video-on-demand platform — billions of users, petabytes of storage, smooth playback across every device and network, durability around 11 nines, start-up latency < 2s. Live streaming is a separate system — scope it out.

01 Requirements & Scoping · ~6 min · don't draw yet

Grading this window: Do you separate the upload path from the watch path, scope to VOD, and name ABR + direct-to-storage as the defining moves? That framing is the senior tell.

Functional requirements to land

Upload a video (large file, resumable).
Process it into streamable form (transcode to multiple formats/bitrates).
Watch with smooth playback across devices and networks (adaptive streaming).

Non-functional requirements to land

Smooth playback under fluctuating bandwidth — the reason ABR exists.
High durability (~11 nines) and availability for stored video.
Low start latency (<2s) and global low-latency delivery.
Massive scale and cost-awareness — storage + bandwidth dominate the bill.

▲ Allow — say this

“The core philosophy: app servers handle lightweight metadata; the heavy video bytes go straight to object storage and out through a CDN. The servers are almost untouched by the payload — that’s what makes this scale.”

▲ Allow — say this

“I’ll scope to video-on-demand. Live streaming has real-time encoding and a much tighter latency budget — it’s a separate system, and I’ll say so rather than blur the two.”

▼ Reject — never say this

“We’ll store the video files in the database.” Storing blobs in a relational DB is the canonical media anti-pattern; it signals you’ve never handled large media.

Scripted exchange

Interviewer

Design YouTube.

You

I’ll scope to VOD — upload, process, watch — and treat live as separate. The defining constraint is that video is huge and binary, so my whole design keeps bytes off the app servers: direct-to-object-storage uploads, an async transcoding pipeline, and CDN delivery with adaptive bitrate. Metadata is the only thing my services handle directly. Let me confirm we care about smooth cross-network playback — that drives the ABR work.

Interviewer

Yes, smooth playback everywhere.

You

Then transcoding into a bitrate ladder plus adaptive streaming is the heart of it. Let me lay out upload first, then the pipeline, then the watch path.

02 Entities, API & Estimation · ~5 min

Grading this window: Clean separation of metadata from bytes, the presigned-URL upload, and a sense that bandwidth dominates.

Entities: Video (metadata + processing status), Segment files (per bitrate), Manifest files (list the available renditions/segments). Interface:

initiateUpload(metadata) → videoId + presigned URL (client uploads direct to storage)
getWatchInfo(videoId) → metadata + manifest URL (client begins ABR from CDN)

The presigned URL is the key: the server reserves an id and hands back a temporary link so the client uploads directly to object storage, bypassing the app server entirely. The manifest tells the player which segments and resolutions exist.
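To make the presigned-URL handshake concrete, here is a minimal sketch using only the standard library. It is not any real object store's signing scheme: `SIGNING_KEY`, `STORAGE_HOST`, the `raw/<videoId>` key layout, and the `verify` helper are all assumptions made for illustration.

```python
import hashlib
import hmac
import time
import uuid
from typing import Optional
from urllib.parse import urlencode

# Hypothetical values; a real system uses its object store's own signing scheme.
SIGNING_KEY = b"demo-secret"
STORAGE_HOST = "https://storage.example.com"


def sign(method: str, object_key: str, expires_at: int) -> str:
    """HMAC over method, object key, and expiry, so the URL is scoped to one object."""
    message = f"{method}\n{object_key}\n{expires_at}".encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def initiate_upload(metadata: dict, ttl_seconds: int = 900,
                    now: Optional[float] = None) -> dict:
    """Reserve a video id and return a short-lived upload URL.

    Persisting `metadata` is omitted; the app server never sees the video bytes.
    """
    video_id = uuid.uuid4().hex
    object_key = f"raw/{video_id}"
    issued_at = time.time() if now is None else now
    expires_at = int(issued_at + ttl_seconds)
    query = urlencode({"expires": expires_at,
                       "signature": sign("PUT", object_key, expires_at)})
    return {"videoId": video_id, "uploadUrl": f"{STORAGE_HOST}/{object_key}?{query}"}


def verify(method: str, object_key: str, expires_at: int,
           signature: str, now: float) -> bool:
    """What the storage side would check before accepting the bytes."""
    if now > expires_at:
        return False
    return hmac.compare_digest(sign(method, object_key, expires_at), signature)
```

The point of the sketch is the scoping: the signature covers one method, one object, and one expiry, so a leaked URL cannot be reused for anything else.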

The estimate that matters

Compute storage (raw + every rendition) but emphasize that bandwidth dominates — serving billions of views is the cost driver, which is exactly why the CDN and a high cache-hit rate are non-negotiable, not nice-to-haves.
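A back-of-envelope version of that estimate might look like the sketch below. Every input is an illustrative assumption chosen for round numbers, not a measured figure for any real platform; the function name and parameters are hypothetical.

```python
PB = 10**15  # decimal petabyte


def daily_estimate(uploads_per_day: int, avg_raw_bytes: int, rendition_overhead: float,
                   views_per_day: int, avg_bytes_per_view: int,
                   cdn_hit_rate: float) -> dict:
    """Return daily storage growth, total egress, and origin egress, all in PB."""
    stored = uploads_per_day * avg_raw_bytes * (1 + rendition_overhead)
    egress = views_per_day * avg_bytes_per_view
    origin_egress = egress * (1 - cdn_hit_rate)
    return {
        "stored_pb": stored / PB,
        "egress_pb": egress / PB,
        "origin_egress_pb": origin_egress / PB,
    }


# Assumed inputs for illustration only.
estimate = daily_estimate(
    uploads_per_day=1_000_000,
    avg_raw_bytes=500 * 10**6,        # 500 MB raw upload
    rendition_overhead=1.0,           # renditions together roughly equal the raw size
    views_per_day=1_000_000_000,
    avg_bytes_per_view=100 * 10**6,   # 100 MB streamed per view
    cdn_hit_rate=0.95,
)
```

Under these assumed inputs, storage grows by about 1 PB per day while egress is about 100 PB per day, and a 95% hit rate cuts origin egress to about 5 PB. The exact numbers matter less than the ratio.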

▲ Allow — say this

“The upload returns a presigned URL so bytes flow client-to-storage directly. My app server reserves the id and records metadata — it never proxies gigabytes, which would saturate its bandwidth instantly.”

03 High-Level Design (the MVP) · ~13 min

Grading this window: Direct-to-storage upload, an async transcoding pipeline, and a CDN-backed watch path with ABR. Right components, clear flow.

Upload path

Client chunks the file (10–50MB pieces) for resumable, parallel upload, and writes directly to object storage via the presigned URL. On completion, an event kicks off processing. The app server only ever touched metadata.
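The client-side chunking step can be sketched as a simple byte-range plan. The 16 MB default is an assumed value inside the 10–50MB range above, and `ChunkRange` and `plan_chunks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkRange:
    index: int
    start: int  # inclusive byte offset
    end: int    # exclusive byte offset


def plan_chunks(file_size: int, chunk_size: int = 16 * 1024 * 1024) -> List[ChunkRange]:
    """Split a file of `file_size` bytes into fixed-size ranges; the last may be shorter."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    return [
        ChunkRange(index, start, min(start + chunk_size, file_size))
        for index, start in enumerate(range(0, file_size, chunk_size))
    ]
```

Because each range is independent, the client can upload several in parallel and retry any one of them without touching the rest.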

Transcoding pipeline

An asynchronous pipeline of parallel workers (using a tool like ffmpeg) splits the original into segments and encodes each segment into multiple bitrates/formats, then packages them as HLS/DASH segment files plus manifest files, all stored back in object storage. This is CPU-heavy and embarrassingly parallel — model it as a DAG of tasks.
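One way to picture the DAG is as a dependency map that a scheduler drains in waves. The sketch below uses Python's standard `graphlib` module; the task naming (`split`, `encode:<rendition>:<chunk>`, `package:<rendition>`, `manifest`) is an assumption for illustration, and no actual encoding happens.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def build_transcode_dag(num_chunks: int, renditions: List[str]) -> Dict[str, Set[str]]:
    """Map each task to the set of tasks it depends on."""
    graph: Dict[str, Set[str]] = {"split": set()}
    for rendition in renditions:
        encode_tasks = set()
        for chunk in range(num_chunks):
            task = f"encode:{rendition}:{chunk}"
            graph[task] = {"split"}          # every encode waits only for the split
            encode_tasks.add(task)
        graph[f"package:{rendition}"] = encode_tasks
    graph["manifest"] = {f"package:{r}" for r in renditions}
    return graph


def execution_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; everything inside one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For two chunks and two renditions this yields four waves: the split, four parallel encodes, two packaging tasks, then the manifest. The second wave is where the parallelism lives.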

Watch path

Client calls getWatchInfo → gets metadata + manifest URL → the player reads the manifest and pulls segments from the CDN, choosing a rendition based on measured bandwidth (adaptive bitrate). Popular content is cached at edge servers worldwide.
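For HLS, the manifest the player reads first is a master playlist listing each rendition. A minimal sketch of generating one follows; the `Rendition` fields and the URIs are hypothetical, and real playlists usually carry more attributes (for example codecs) than shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rendition:
    bandwidth_bps: int
    width: int
    height: int
    uri: str  # hypothetical path to this rendition's media playlist


def master_playlist(renditions: List[Rendition]) -> str:
    """Build a minimal HLS master playlist, lowest bandwidth first."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda item: item.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},RESOLUTION={r.width}x{r.height}"
        )
        lines.append(r.uri)
    return "\n".join(lines) + "\n"
```

The player picks among the listed variants on its own, which is why the server side of the watch path can stay a plain CDN.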

UPLOAD: client chunks → presigned URL → object storage → event
PROCESS: transcode DAG (split → encode renditions → package HLS/DASH) → segments + manifest in storage
WATCH: client → metadata API (manifest URL) → CDN → ABR pulls segments

The trap door the interviewer opens here. “Why not just serve the original uploaded file?” One format and bitrate can’t serve a phone on cellular and a smart TV on fiber. You need the file post-processed into a bitrate ladder with a manifest so the client can adapt mid-playback. Naming ABR as the reason is the senior signal.

▲ Allow — say this

“Transcoding is async and parallel — a DAG of workers producing renditions and manifests. The watch path is then pure CDN: the player reads the manifest and pulls segments at whatever bitrate the network supports.”

◆ Throttle — only with a reason

Transcoding synchronously on upload. Defensible only for tiny clips — say so. For real video it blocks the upload for minutes; transcoding must be an async pipeline off a queue.

▼ Reject — never say this

“The client streams the raw MP4 from our server.” No ABR, no CDN, app server bandwidth as the bottleneck — three failures in one sentence.

04 Deep Dives — the stress test · ~15 min · where staff is decided

Grading this window: Lead toward the transcoding DAG, ABR/codecs, CDN hot keys, and storage/cost tiers. Staff candidates volunteer these, and this window carries roughly 30% of the score.

The transcoding pipeline, in detail

Model it as a DAG: split the source into chunks, encode each chunk into every rendition in parallel, then package into HLS/DASH segments and generate the manifests. Parallelism across chunks and renditions is what makes processing a long video tractable. The bitrate ladder matters: keep adjacent rungs roughly 50% apart in bitrate (the resolutions, e.g. 1080p, 720p, 480p, follow from the bitrates) so quality changes are smooth, not jarring. Codec is a trade-off: H.264 for compatibility, VP9/AV1 for roughly 30% better compression at much higher encode cost, which is worth it for popular videos where lifetime bandwidth savings dwarf the one-time encode.
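The ladder spacing can be sketched as a geometric series. This assumes "50% apart" means each rung's bitrate is about 1.5x the rung below it; the top and floor bitrates are made-up inputs, and real ladders are usually tuned per title rather than generated this mechanically.

```python
from typing import List


def build_ladder(top_kbps: int, floor_kbps: int, step_ratio: float = 1.5) -> List[int]:
    """Return bitrates from highest to lowest, each about `step_ratio` below the last."""
    if step_ratio <= 1 or floor_kbps <= 0 or top_kbps < floor_kbps:
        raise ValueError("need step_ratio > 1 and 0 < floor_kbps <= top_kbps")
    ladder = []
    rung = 0
    # Divide the top bitrate by step_ratio**rung until the floor is crossed.
    while top_kbps / step_ratio**rung >= floor_kbps:
        ladder.append(round(top_kbps / step_ratio**rung))
        rung += 1
    return ladder
```

For an assumed top of 4500 kbps and floor of 800 kbps this gives five rungs: 4500, 3000, 2000, 1333, and 889 kbps.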

Resumable uploads

Chunk the file with a fingerprint per chunk and track progress; a dropped connection at 90% resumes by re-sending only the missing chunks. (Same machinery as a file-sync system like Dropbox.)
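The resume logic reduces to comparing per-chunk fingerprints. A minimal sketch, assuming SHA-256 as the fingerprint and a hypothetical server response that maps chunk index to the fingerprint it stored:

```python
import hashlib
from typing import Dict, List


def split_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut the payload into fixed-size chunks; the last may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def fingerprint(chunk: bytes) -> str:
    """Assumed fingerprint: SHA-256 hex digest of the chunk bytes."""
    return hashlib.sha256(chunk).hexdigest()


def chunks_to_resend(local_chunks: List[bytes], received: Dict[int, str]) -> List[int]:
    """Indexes the client must (re)send: absent on the server or stored with a different hash."""
    return [
        index
        for index, chunk in enumerate(local_chunks)
        if received.get(index) != fingerprint(chunk)
    ]
```

Comparing fingerprints rather than just indexes also catches a chunk that landed but was corrupted in transit.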

CDN strategy & hot keys

Target a 95%+ edge cache hit rate; most views are popular content served from the edge. A viral video is a hot key — handle the thundering herd with origin shielding (a mid-tier cache absorbing edge misses) and pre-warming popular content. The CDN is what keeps origin bandwidth and cost sane.
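The "absorbing edge misses" behaviour of an origin shield is essentially request coalescing: many concurrent misses for one segment become a single origin fetch. The sketch below is an in-process illustration with threads, not how any particular CDN implements it; `OriginShield` and `fetch_from_origin` are hypothetical names, and eviction and TTLs are left out.

```python
import threading
from typing import Callable, Dict


class OriginShield:
    """Collapse concurrent misses for the same key into one origin fetch."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, str] = {}
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> str:
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    event = threading.Event()
                    self._inflight[key] = event
            if leader:
                try:
                    value = self._fetch(key)      # the only call that reaches origin
                    with self._lock:
                        self._cache[key] = value
                    return value
                finally:
                    with self._lock:
                        del self._inflight[key]
                    event.set()                   # wake the followers
            # Followers wait, then loop to read the cache (or retry if the leader failed).
            event.wait()
```

Twenty simultaneous requests for a cold segment cost the origin one fetch here, which is the property to name when the interviewer makes the video go viral.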

Storage tiers & durability

Tier storage: hot (recent/popular) on fast storage, cold long-tail on cheap storage. Use erasure coding, with fragments spread across regions, for durability at lower cost than 3x replication. Metadata lives in a sharded DB, completely separate from the bytes.
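The cost argument for erasure coding is simple arithmetic. In the sketch below, the 10 data + 4 parity layout is an assumed example, not a claim about any specific storage system.

```python
def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per logical byte for a k data + m parity scheme."""
    if data_shards <= 0 or parity_shards < 0:
        raise ValueError("data_shards must be > 0 and parity_shards >= 0")
    return (data_shards + parity_shards) / data_shards


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte when keeping full copies."""
    return float(copies)


# Assumed layout for illustration: 10 data shards + 4 parity shards.
ec = erasure_overhead(10, 4)    # 1.4x raw storage; survives the loss of any 4 shards
rep = replication_overhead(3)   # 3.0x raw storage; survives the loss of any 2 copies
savings = 1 - ec / rep          # fraction of raw storage saved versus 3x replication
```

With that assumed layout, erasure coding stores a bit under half the raw bytes of triple replication while tolerating more simultaneous losses; the price is reconstruction work on reads that hit a missing shard.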

▲ Allow — say this (staff move)

“At petabyte scale, cost is a first-class design axis: tiered storage, erasure coding instead of triple replication, and a high CDN hit rate so I’m not paying origin egress on every view. For hot videos I’d spend extra encode CPU on AV1 to claw back lifetime bandwidth.”

▼ Reject — never say this

“The CDN handles everything, I don’t need to think about origin load.” A viral video’s cold first-segment requests still stampede the origin — you need shielding and pre-warming.

Scripted stress-test exchange

Interviewer

A 4-hour 4K upload drops at 90%. What happens?

You

The client chunked the file and uploaded chunks directly to storage, each with a fingerprint and tracked progress. On reconnect it asks which chunks the server already has and re-sends only the missing ~10%. Nothing routed through the app server, so a long upload never ties up application capacity — and transcoding only starts once all chunks land.

Interviewer

Now that video goes viral globally.

You

Its segments are served from CDN edges, so the vast majority of requests never reach origin — I’d expect a 95%+ hit rate. The risk is the cold start: thousands of edges missing the first segment at once and stampeding origin. I’d put an origin shield — a mid-tier cache that collapses those misses — in front, and pre-warm edges for content I can predict will spike.

05 Wrap-up — operability & recap · ~6 min

Grading this window: Prove you could run it. Volunteer playback-quality observability and rollout; recap; name what you deferred.

Observability — measure the viewer experience

Playback start latency and rebuffer ratio — the true quality signals.
Transcode-queue depth and per-rendition processing time (backpressure).
CDN cache-hit rate and origin egress (cost + risk).
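The two viewer-experience metrics can be computed from player events. This sketch assumes one common definition of rebuffer ratio (stalled time divided by stalled plus playing time) and a hypothetical event schema with the states `requested`, `playing`, `stalled`, and `ended`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlaybackEvent:
    at_seconds: float
    state: str  # hypothetical states: "requested", "playing", "stalled", "ended"


def session_metrics(events: List[PlaybackEvent]) -> dict:
    """Compute start latency and rebuffer ratio for one playback session."""
    ordered = sorted(events, key=lambda e: e.at_seconds)
    playing = stalled = 0.0
    # Time spent in a state is the gap until the next event.
    for current, following in zip(ordered, ordered[1:]):
        duration = following.at_seconds - current.at_seconds
        if current.state == "playing":
            playing += duration
        elif current.state == "stalled":
            stalled += duration
    requested = next((e.at_seconds for e in ordered if e.state == "requested"), None)
    first_play = next((e.at_seconds for e in ordered if e.state == "playing"), None)
    start_latency = None
    if requested is not None and first_play is not None:
        start_latency = first_play - requested
    watched = playing + stalled
    return {
        "start_latency_s": start_latency,
        "rebuffer_ratio": stalled / watched if watched else 0.0,
    }
```

Aggregating these per region, device class, and rendition is what turns them into alertable SLOs.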

Rollout

Roll out codec/ladder changes to a fraction of traffic and watch rebuffer ratio before widening. Keep the prior renditions available so a bad encode profile can be rolled back without re-uploading.
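Routing "a fraction of traffic" to a new encode profile can be done with deterministic hash bucketing, sketched below. The `salt` value and the choice to bucket by an id string are assumptions; the useful property is that widening the fraction keeps everyone who was already in the rollout.

```python
import hashlib


def in_rollout(unit_id: str, fraction: float, salt: str = "ladder-v2") -> bool:
    """Deterministically place `unit_id` in the rollout for the given fraction (0.0 to 1.0).

    `salt` is a hypothetical experiment name, so separate rollouts bucket independently.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")   # uniform integer in [0, 2**64)
    return bucket < int(fraction * 2**64)
```

Rolling back is then just lowering the fraction, which works only because the prior renditions are still in storage.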

▲ Allow — say this

“With more time I’d detail search, recommendations, and view counting — view counting in particular is its own sharded-counter streaming problem. I scoped them out deliberately.”

05 The follow-up gauntlet

The probes you'll get — and the answer that holds.

Interviewers push on the pipeline, ABR, and CDN economics. Keep bytes off the app servers, defend the bitrate ladder, protect the origin.

"A long upload drops partway — how do you resume?"

The client chunked the file (10–50MB) with a fingerprint per chunk and uploaded directly to object storage. On reconnect it queries which chunks landed and re-sends only the missing ones. The app server only reserved the id and tracked progress — the bytes never went through it.

"Why not just serve the original uploaded file?"

One format and bitrate can't serve every device and network. I transcode into a bitrate ladder and generate a manifest so the player adapts mid-stream — stepping down on a weak connection instead of buffering. Adaptive bitrate is the whole reason for the pipeline.

"How does the client pick a quality level?"

The manifest lists the available renditions. The ABR client measures bandwidth and buffer health and requests the segment at the bitrate it can sustain, stepping up or down as the network changes. The server doesn't decide — the client adapts.
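A deliberately simple version of that client-side decision is sketched below. It is a throughput-plus-buffer heuristic written for illustration, not the algorithm of any real player; the `safety` margin and `low_buffer_seconds` threshold are assumed values.

```python
from typing import List


def choose_bitrate(ladder_kbps: List[int], throughput_kbps: float, buffer_seconds: float,
                   safety: float = 0.8, low_buffer_seconds: float = 5.0) -> int:
    """Pick the bitrate for the next segment request."""
    ladder = sorted(ladder_kbps)
    # Nearly empty buffer: drop to the lowest rung to avoid a stall.
    if buffer_seconds < low_buffer_seconds:
        return ladder[0]
    # Otherwise take the highest rung that fits within a safety margin of measured throughput.
    budget = throughput_kbps * safety
    affordable = [bitrate for bitrate in ladder if bitrate <= budget]
    return affordable[-1] if affordable else ladder[0]
```

With an assumed ladder of 800, 2000, and 4500 kbps, a measured 3000 kbps and a healthy buffer selects the 2000 kbps rung, while the same throughput with a 2-second buffer falls to 800 kbps.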

"A video goes viral and the CDN origin gets hammered."

Edge caching handles most of it at a 95%+ hit rate, but the cold start stampedes origin when many edges miss the first segment at once. I'd add origin shielding — a mid-tier cache that collapses concurrent misses — and pre-warm edges for predictable spikes.

"Where do bytes live versus metadata?"

Bytes — raw upload, transcoded segments, manifests — live in object storage fronted by a CDN. Metadata lives in a sharded database. App servers only ever touch metadata; routing gigabytes through them would saturate their bandwidth.

"How do you keep storage cost sane at petabyte scale?"

Tiered storage — hot/popular on fast storage and edges, cold long-tail on cheap tiers — plus erasure coding for durability at lower cost than 3x replication. And a high CDN hit rate so I'm not paying origin egress on every view.

Handling a probe you can’t fully answer: reason from the cost axis. “I haven’t profiled the exact AV1-vs-H.264 break-even, but it’s encode CPU once against bandwidth saved over the video’s lifetime — worth it for popular content, not for the cold long tail. Here’s how I’d decide per video.”
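The "how I'd decide per video" reasoning can be written as a break-even calculation. All prices and sizes below are made-up placeholders to show the shape of the decision, not real encode or egress costs.

```python
import math


def break_even_views(extra_encode_cost: float, bytes_saved_per_view: float,
                     egress_cost_per_byte: float) -> int:
    """Views needed before the bandwidth saved pays for the heavier encode."""
    saving_per_view = bytes_saved_per_view * egress_cost_per_byte
    if saving_per_view <= 0:
        raise ValueError("the heavier codec must save bandwidth to ever break even")
    return math.ceil(extra_encode_cost / saving_per_view)


def should_use_heavier_codec(expected_views: int, extra_encode_cost: float,
                             bytes_saved_per_view: float,
                             egress_cost_per_byte: float) -> bool:
    return expected_views >= break_even_views(
        extra_encode_cost, bytes_saved_per_view, egress_cost_per_byte
    )


# Placeholder inputs: 2.00 extra encode cost, 30 MB saved per view, 0.01 per GB of egress.
threshold = break_even_views(2.00, 30e6, 0.01 / 1e9)
```

With those placeholder inputs the threshold is 6667 views, so a video expected to reach a million views clears it easily and a long-tail video with a hundred views does not.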

06 What gets you downleveled

The flags that quietly tank an otherwise solid loop.

A clean design with one of these undercurrents still scores below the bar at senior+. None are about getting an answer wrong — they're about how you operate.

Drawing before scoping

Jumping to architecture without bounding the problem or confirming scale. Reads as template-matching.

Hedging without committing

"It depends" with no decision behind it. Name the trade-off, then pick.

Blobs in the database

Storing video files in a relational DB — the canonical media anti-pattern. Bytes belong in object storage.

Bytes through the app server

Proxying gigabytes of upload/download through application servers, saturating their bandwidth. Use presigned direct-to-storage and a CDN.

Forgetting the CDN / ABR

Serving a single raw file with no adaptive streaming or edge caching — it won't play smoothly and won't scale.

Skipping operations entirely

No observability, no rollout, no failure-mode plan. In 2026 this reads as "has never carried a pager."

Bluffing under a probe

Confident wrong answers when pushed. Far worse than an honest "here's what I'd verify."

Not driving

Waiting to be asked the next question. At staff you own the 45 minutes.

07 Your pre-loop scorecard

Self-grade before you walk in.

Run a mock and score yourself honestly against the dimensions the interviewer uses. If you can't hit "strong" on depth and operability, that's your signal on where to drill.

Dimension	Weak (downlevel)	Strong (at level)
Scoping	One blob of a system; mixed in live.	Split upload/watch paths, scoped to VOD, named ABR + direct-to-storage.
Upload	Routed bytes through app servers.	Presigned URLs, client chunking, resumable, bytes bypass the app server.
Transcoding	One synchronous black box.	Async parallel DAG producing a bitrate ladder + manifests; codec trade-offs named.
Delivery	Served raw file, no CDN.	CDN with high hit rate, ABR from manifest, origin shielding for hot keys.
Durability & cost	Ignored tiers and cost.	Erasure coding over replication, storage tiers, CDN hit rate as a cost lever.
Operability	Never mentioned it.	Rebuffer ratio + start latency SLOs, transcode-queue depth, CDN hit/egress.

The 60-second recap that lands the level

Quick recap: scoped to VOD; app servers handle only metadata while bytes go direct to object storage via presigned URLs and out through a CDN; uploads are chunked and resumable; an async parallel transcoding DAG produces a bitrate ladder of segments plus manifests for adaptive streaming; the CDN serves most views at a 95%+ hit rate with origin shielding for viral hot keys; storage is tiered with erasure coding for cheap durability. Headline metrics: rebuffer ratio and start latency. With more time: search, recommendations, and view counting.

The one mental model: media is a bandwidth-and-cost problem — keep the bytes off your servers, transcode once into a ladder the client can adapt across, and let the CDN carry the views. Say “this is a large-blob/media problem, so bytes never touch my app servers” in the first two minutes and the rest follows naturally.

Design YouTube like the upload is already transcoding into a dozen formats.
