A self-contained reference for how bytes actually move and why your p99 is what it is. Most engineers treat the network as a magic pipe; the ones who can reason about round trips, congestion control, head-of-line blocking, and load-balancing math design systems that stay fast under load instead of mysteriously stalling. This chapter is that reasoning, end to end.
How to use this: Part 1 is the reference card. Part 2 maps the territory. Part 3 is the full depth with pros/cons per mechanism. Part 4 is exhaustive interview prep with counter-question ladders. Part 5 is the teaching tutorial that makes it stick.
Key takeaways
- Network performance is governed by round trips and congestion control, not bandwidth alone — reduce round trips and keep queues short.
- HTTP/2 over a single TCP connection still suffers transport head-of-line blocking; QUIC/HTTP3 fixes it with independent per-stream loss recovery.
- Flow control is receiver-driven; congestion control is network-driven — both can throttle you.
- Bound retries with budgets, backoff, and jitter, or they amplify outages into retry storms.
PART 1 — CHEATSHEET (Reference Card)
Every concept in this document, condensed.
The one idea
Network performance is governed by round trips (RTT) and congestion control, not bandwidth alone. Latency = (propagation + transmission + queuing + processing); you cannot beat the speed of light, so the wins come from fewer round trips, avoiding head-of-line blocking, and keeping queues short.
Core vocabulary (one line each)
- OSI / TCP-IP layers — abstraction stack: link → internet (IP) → transport (TCP/UDP) → application.
- RTT — round-trip time; the fundamental latency unit; many protocols cost N×RTT.
- TCP handshake — 3-way (SYN, SYN-ACK, ACK) = 1 RTT before data.
- Flow control — receiver-driven (sliding window) — don’t overrun the receiver.
- Congestion control — network-driven — don’t overrun the network (slow start, AIMD).
- AIMD — additive-increase/multiplicative-decrease; TCP’s fairness/stability engine.
- CUBIC / BBR — modern congestion control: loss-based (CUBIC) vs model-based (BBR, rate/RTT).
- Nagle / delayed ACK — coalesce small sends / delay ACKs; together can add latency.
- Head-of-line (HoL) blocking — one stalled item blocks others behind it (TCP byte stream, HTTP/2 over TCP).
- UDP — connectionless, unreliable datagrams; you build reliability yourself.
- QUIC / HTTP/3 — reliable multiplexed transport over UDP; per-stream (no cross-stream HoL); 0–1 RTT.
- TLS handshake — TLS 1.3 = 1 RTT (0-RTT resumption); earlier = 2 RTT.
- L4 vs L7 load balancing — by connection/IP:port vs by request content (HTTP path, headers).
- Consistent hashing / power-of-two-choices — stable key→node mapping / pick less-loaded of 2 random.
- Anycast — one IP announced from many locations; routed to nearest.
- CDN / edge — cache/compute near users to cut RTT and origin load.
- Bufferbloat — oversized queues inflate latency under load.
- Backpressure — signal upstream to slow down when overloaded.
Transport comparison
| | TCP | UDP | QUIC (HTTP/3) |
|---|---|---|---|
| Reliable/ordered | Yes (single stream) | No | Yes, per stream |
| Connection setup | 1 RTT (+TLS) | 0 | 0–1 RTT (TLS built in) |
| HoL blocking | Yes (stream-wide) | N/A | No across streams |
| Congestion control | Kernel (CUBIC/BBR) | DIY | User-space, pluggable |
| Connection migration | No (5-tuple) | DIY | Yes (connection ID) |
L4 vs L7 load balancing
| | L4 (transport) | L7 (application) |
|---|---|---|
| Decides on | IP:port, connection | HTTP path/host/headers/cookies |
| Cost | Cheap, fast | More CPU (parses requests) |
| Features | NAT, pass-through | Routing, TLS termination, retries, sticky sessions |
Numbers to memorize
- Same-DC RTT ~0.5 ms; same-region ~1–2 ms; cross-US ~30–40 ms; US↔EU ~70–90 ms; US↔Asia ~120–180 ms.
- TCP connect = 1 RTT; +TLS 1.3 = +1 RTT (or 0-RTT resume); QUIC first connect ~1 RTT incl. crypto, resume 0-RTT.
- Typical MSS ~1460 bytes (1500 MTU − headers).
- Speed of light in fiber ≈ 5 µs/km → ~200 km per ms one way.
Quick decision rules
- Latency-sensitive web with many parallel objects → HTTP/3 (QUIC) to kill HoL blocking.
- Many short connections → reuse connections (pooling/keepalive) to amortize handshakes.
- Real-time/lossy-tolerant (games, voice, video) → UDP/QUIC with app-level recovery.
- Route by URL/host/cookie, terminate TLS, retries → L7 LB; raw throughput/pass-through → L4 LB.
- Spread keys to stateful nodes stably → consistent hashing; spread load to stateless → power-of-two-choices.
Top gotchas (litmus tests)
- Bandwidth doesn’t fix latency — a fat pipe still costs 1 RTT per round trip; reduce round trips.
- HTTP/2 multiplexes over one TCP connection → TCP HoL blocking; one lost packet stalls all streams. QUIC fixes this.
- Nagle + delayed ACK can add ~40 ms to small request/response; disable Nagle (TCP_NODELAY) for interactive RPC.
- TCP slow start means new connections start slow, so connection reuse matters a lot for short transfers.
- Bufferbloat: big buffers + loss-based CC = high latency under load; BBR/AQM (CoDel) help.
- Flow control ≠ congestion control — receiver window vs network capacity; both can throttle you.
- L7 retries can amplify outages (retry storms) — use budgets + backoff + jitter.
- Consistent hashing needs virtual nodes or load is uneven.
- A single TCP connection caps throughput at window/RTT; on high-BDP links (BDP = bandwidth × RTT) you need bigger windows or parallel connections.
- Connection setup dominates short requests; measure handshakes, not just transfer time.
PART 2 — OUTLINE (full map)
- Layering and how a packet moves end-to-end
- TCP: handshake, flow control, congestion control, Nagle/delayed ACK
- Head-of-line blocking (TCP and HTTP/2)
- UDP
- QUIC and HTTP/3
- TLS handshake cost
- RPC cost models and patterns
- Load balancing: L4 vs L7 and the algorithms
- DNS and anycast
- CDNs and edge
- Bufferbloat
- Application-level backpressure and flow control
- Decision guide
- Make it stick — the teaching tutorial (round-trip & HoL pictures, mnemonics, flashcards)
PART 3 — DEEP DIVE
1. Layering and how a packet moves
The TCP/IP stack (Cerf & Kahn’s internetworking design, 1974): the link layer moves frames on a local network; the internet (IP) layer routes packets across networks (best-effort, unreliable, unordered); the transport layer (TCP/UDP) provides end-to-end semantics (reliability/ordering or not); the application layer (HTTP, gRPC) gives meaning. A byte you send is wrapped with headers at each layer (encapsulation), routed hop-by-hop by IP (each router forwards toward the destination), and unwrapped on the other side. Key consequence: IP gives no guarantees, so every reliability/ordering property above it is built by TCP or your app, and each adds round trips and state.
Latency components: propagation (distance ÷ speed of light in fiber, ~5 µs/km), transmission (bytes ÷ link rate), queuing (time in router/NIC buffers — the variable, load-dependent part), and processing. Only queuing is really under your control at scale — which is why congestion control and queue management dominate tail latency.
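The four components can be added up directly. The sketch below is illustrative only: the function name and the example distance, payload, and link rates are assumptions chosen to match figures used elsewhere in this chapter (~5 µs/km in fiber, a 1460-byte segment).

```python
FIBER_US_PER_KM = 5.0  # ~5 µs/km in fiber, the figure used in this chapter


def one_way_latency_ms(distance_km, payload_bytes, link_bps,
                       queuing_ms=0.0, processing_ms=0.0):
    """Sum propagation + transmission + queuing + processing, in milliseconds."""
    propagation_ms = distance_km * FIBER_US_PER_KM / 1000.0
    transmission_ms = payload_bytes * 8 / link_bps * 1000.0
    return propagation_ms + transmission_ms + queuing_ms + processing_ms


# Hypothetical example: one 1460-byte segment over 5500 km of fiber.
at_1_gbps = one_way_latency_ms(5500, 1460, 1e9)
at_10_gbps = one_way_latency_ms(5500, 1460, 10e9)
saved_ms = at_1_gbps - at_10_gbps  # what a 10x fatter pipe buys for this packet
```

With these inputs propagation is 27.5 ms while transmission is about 0.01 ms, so a tenfold bandwidth increase saves roughly a hundredth of a millisecond on a small packet.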
2. TCP: handshake, flow control, congestion control
Three-way handshake: SYN → SYN-ACK → ACK establishes a connection in 1 RTT before any data flows (then +1 RTT for TLS unless resumed). Closing uses FIN/ACK exchanges.
Flow control (receiver-driven): the receiver advertises a window (how much it can buffer); the sender never has more than a window of unacknowledged data in flight. This prevents a fast sender from overwhelming a slow receiver. Max throughput ≈ window ÷ RTT, so filling a link requires a window of at least the bandwidth-delay product (BDP = bandwidth × RTT); on high-BDP links you need large windows (window scaling) or parallelism.
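A quick calculation makes the window limit concrete. This is a minimal sketch with assumed function names; the 65,535-byte figure is the largest window TCP can advertise without window scaling, and the link rate and RTT are example inputs.

```python
def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_s


def bdp_bytes(link_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


# Example: unscaled 65,535-byte window on an 80 ms path.
cap_bps = max_throughput_bps(65535, 0.080)   # about 6.55 Mbit/s
# Example: window needed to fill a 1 Gbit/s link at the same RTT.
needed_window = bdp_bytes(1e9, 0.080)        # about 10 MB
```

The gap between the two numbers is why window scaling or parallel connections are needed on long, fast paths.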
Congestion control (network-driven) keeps the network from collapsing (Jacobson, 1988):
- Slow start: start with a small congestion window, double each RTT (exponential) until a threshold or loss — so new connections ramp up, they don’t start fast.
- Congestion avoidance / AIMD: additive increase (grow window by ~1 MSS/RTT) until loss, then multiplicative decrease (halve) — the sawtooth that gives stability and fairness.
- Fast retransmit/recovery: triple-duplicate ACKs trigger retransmit without waiting for a timeout (TCP Reno).
- CUBIC (Ha, Rhee, Xu, 2008): loss-based but uses a cubic growth function of time since last loss — better utilization on high-BDP networks; the Linux default.
- BBR (Cardwell et al., 2016): model-based — estimates bottleneck bandwidth and RTT and paces to that, rather than treating loss as the only congestion signal — much better in the presence of shallow buffers/random loss and against bufferbloat.
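The slow-start and AIMD phases above can be sketched as a toy simulation. It is a deliberately simplified model, not a real TCP implementation: it counts the window in whole segments, ignores fast recovery and timeouts, and the function name, starting window, threshold, and loss schedule are all assumptions for illustration.

```python
def simulate_cwnd(rtts, ssthresh, loss_at, initial_cwnd=1):
    """Return the congestion window (in MSS) at the start of each RTT."""
    cwnd = initial_cwnd
    history = []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in loss_at:
            # Multiplicative decrease: halve the window on loss.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double every RTT until the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: additive increase of 1 MSS per RTT.
            cwnd += 1
    return history


trace = simulate_cwnd(rtts=10, ssthresh=16, loss_at={6})
```

Plotting `trace` shows the exponential ramp, the linear climb, and the drop on loss that together form the sawtooth.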
Nagle’s algorithm + delayed ACK: Nagle (RFC 896) coalesces small writes to avoid tiny packets; delayed ACK holds ACKs briefly to piggyback. Together they can cause a ~40 ms stall for small request/response patterns — so interactive RPC sets TCP_NODELAY (disable Nagle).
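As a minimal sketch, this is how Nagle's algorithm can be disabled on a TCP socket using Python's standard `socket` module. The helper name is an assumption; the socket is created but never connected, so the snippet only demonstrates setting and reading back the option.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
# Read the option back; a non-zero value means small writes go out immediately.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

Most RPC frameworks expose an equivalent setting, so in practice this is often a configuration flag and not hand-written socket code.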
Pros of TCP: reliable, ordered, congestion-friendly, ubiquitous, offloaded in kernel/NIC. Cons: per-connection HoL blocking, handshake + slow-start cost for short flows, ossified (hard to evolve), no connection migration across IP changes.
3. Head-of-line blocking (TCP and HTTP/2)
HoL blocking: when items must be delivered in order, one stalled item blocks everything behind it. TCP delivers a single ordered byte stream, so a lost segment stalls delivery of all later bytes until it’s retransmitted — even if those later bytes already arrived. HTTP/2 multiplexes many logical streams over one TCP connection; this fixed HTTP/1.1’s request-level HoL, but a single TCP packet loss now stalls all HTTP/2 streams (transport-level HoL). This is the core motivation for QUIC.
4. UDP
UDP is a thin layer over IP: connectionless, unreliable, unordered datagrams, no congestion control. You get low overhead and full control; you must build whatever reliability/ordering/flow control you need. Used by DNS, real-time media (with app-level recovery/FEC), and as the substrate for QUIC.
Pros: minimal latency/overhead, no HoL (each datagram independent), multicast, full control. Cons: you reimplement reliability/congestion control (and risk doing it badly — being a poor network citizen); no built-in security/ordering.
5. QUIC and HTTP/3
QUIC (RFC 9000, Iyengar & Thomson, 2021) is a reliable, multiplexed, secure transport built on top of UDP in user space. It fixes TCP’s structural problems:
- No cross-stream HoL blocking: independent streams have independent loss recovery — a lost packet only stalls its own stream.
- Fast setup: crypto (TLS 1.3) is integrated, so a connection + encryption is 1 RTT, and 0-RTT on resumption.
- Connection migration: connections are identified by a connection ID, not the IP:port 5-tuple, so a client can change networks (Wi-Fi→cellular) without dropping the connection.
- User-space + pluggable congestion control: evolves without kernel/OS upgrades (avoiding TCP’s ossification).
HTTP/3 is HTTP over QUIC.
Pros: eliminates transport HoL, faster setup, migration, evolvable. Cons: UDP is sometimes throttled/blocked by middleboxes; more CPU (user-space crypto/processing) than kernel TCP; newer, less ubiquitous tooling.
6. TLS handshake cost
TLS provides confidentiality/integrity/authentication but adds round trips. TLS 1.3 (RFC 8446, Rescorla, 2018) cut the handshake to 1 RTT (down from 2 in TLS 1.2) and supports 0-RTT resumption (send app data with the first message using a pre-shared key) — at the cost that 0-RTT data is replayable, so it must be used only for idempotent requests. With QUIC, TLS is folded into the transport handshake. Practical guidance: reuse connections (pooling, keepalive, session resumption) so you pay the handshake once, not per request.
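The setup costs quoted in this chapter can be tabulated to see what a first request pays. This is a rough model under stated assumptions: it counts whole round trips only, ignores server processing and loss, and the table keys and function name are made up for this sketch.

```python
# Round trips spent on setup before the request can be sent,
# keyed by (transport, tls_version, resumed_with_0rtt).
SETUP_RTTS = {
    ("tcp", "tls1.2", False): 1 + 2,   # TCP handshake + TLS 1.2
    ("tcp", "tls1.3", False): 1 + 1,   # TCP handshake + TLS 1.3
    ("tcp", "tls1.3", True): 1 + 0,    # 0-RTT data still waits for the TCP handshake
    ("quic", "tls1.3", False): 1,      # transport and crypto handshakes combined
    ("quic", "tls1.3", True): 0,       # request rides in the first flight
}


def time_to_first_response_ms(rtt_ms, transport, tls, resumed=False, reuse=False):
    """Setup round trips (zero on a reused connection) plus one for the request."""
    setup = 0 if reuse else SETUP_RTTS[(transport, tls, resumed)]
    return (setup + 1) * rtt_ms


cold_tcp = time_to_first_response_ms(80, "tcp", "tls1.3")
warm_tcp = time_to_first_response_ms(80, "tcp", "tls1.3", reuse=True)
quic_resumed = time_to_first_response_ms(80, "quic", "tls1.3", resumed=True)
```

On an 80 ms path the cold TCP + TLS 1.3 request costs three round trips, while a reused connection or a resumed QUIC connection costs one.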
7. RPC cost models and patterns
A remote call costs at minimum: connection setup (amortizable), 1 RTT for the request/response, serialization, and server processing. Patterns that matter:
- Connection reuse / pooling / keepalive: amortize handshake + slow-start across many calls — often the single biggest latency win.
- Multiplexing (HTTP/2, gRPC): many concurrent calls over one connection (mind transport HoL → HTTP/3 for many parallel streams).
- Batching / pipelining: combine calls to amortize per-call overhead; pipeline to hide RTT.
- Streaming: gRPC streams for continuous data instead of repeated unary calls.
- Timeouts + retries with budgets/backoff/jitter: essential, but uncontrolled retries cause retry storms that amplify outages — bound them.
Pros: clean abstraction, multiplexed/streamed efficiency. Cons: the network is not free or reliable (Deutsch’s fallacies of distributed computing); every RPC adds latency, partial-failure modes, and tail-latency exposure.
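A bounded retry loop with exponential backoff and jitter might look like the sketch below. `TransientError`, `backoff_delay`, and `call_with_retries` are hypothetical names, the base and cap values are arbitrary, and the jitter scheme (a uniform draw between zero and the exponential ceiling, sometimes called full jitter) is one common choice among several.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def backoff_delay(attempt, base_s=0.1, cap_s=5.0, rng=random):
    """Uniform random delay in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap_s, base_s * 2 ** attempt))


def call_with_retries(op, max_retries=3, sleep=time.sleep, rng=random):
    """Call op(); on TransientError retry at most max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_retries:
                raise  # out of attempts: surface the failure to the caller
            sleep(backoff_delay(attempt, rng=rng))


# Demo with a stubbed sleep so nothing actually blocks.
attempts = []

def flaky_op():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("temporary failure")
    return "ok"

recorded_delays = []
result = call_with_retries(flaky_op, sleep=recorded_delays.append)
```

The loop bounds attempts per call; it does not by itself cap retries across a whole fleet, which is what a retry budget is for.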
8. Load balancing: L4 vs L7 and the algorithms
- L4 (transport): balances by connection/IP:port, often via NAT or direct server return; cheap, high throughput, protocol-agnostic; can’t make content decisions.
- L7 (application): parses requests (HTTP path, host, headers, cookies) to route, terminate TLS, retry, rate-limit, do sticky sessions, canary. More CPU; far more capable.
Algorithms:
- Round-robin / weighted RR: simple; ignores actual load.
- Least-connections / least-load: send to the least busy; better under heterogeneous request costs.
- Consistent hashing: map keys (or clients) to nodes on a ring so that adding/removing a node moves only ~1/N of keys — essential for stateful/cache affinity. Use virtual nodes to even out load.
- Power-of-two-choices (P2C): pick two servers at random, send to the less loaded — provably near-optimal load distribution with O(1) state, avoiding the herd behavior of “always pick the least loaded.”
Pros/cons: L4 = speed, L7 = intelligence; RR = simplicity, least-conn = load-awareness (needs state), consistent hashing = affinity (needs vnodes), P2C = great balance with tiny state.
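To make the consistent-hashing claim concrete, here is a small ring with virtual nodes. It is a sketch under assumptions: the class name, the choice of SHA-256 truncated to 64 bits, the `node#i` vnode labels, and the node and key names are all illustrative, and a production ring would also handle node weights and replication.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node owns `vnodes` points scattered around the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(10_000)]
before = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
after = HashRing(["cache-a", "cache-b", "cache-c", "cache-d", "cache-e"])
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
moved_fraction = len(moved) / len(keys)
```

Adding a fifth node should move roughly one fifth of the keys, and every moved key lands on the new node; lowering `vnodes` makes the split noticeably less even.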
9. DNS and anycast
DNS resolves names to addresses hierarchically (root → TLD → authoritative), cached with TTLs. It’s also a coarse load-balancing/traffic-steering tool (return different IPs by geo/health). Anycast announces the same IP from many locations; BGP routes each client to the topologically nearest instance — used by DNS resolvers, CDNs, and DDoS absorption. Caveat: anycast routing can shift mid-session (fine for stateless/UDP request-response, trickier for long-lived TCP).
10. CDNs and edge
A CDN caches content (and increasingly runs compute) at points of presence near users, cutting RTT (the dominant latency) and offloading origin. Benefits: lower latency (shorter propagation), origin protection (cache absorbs reads, DDoS scrubbing), and edge compute for personalization/auth close to users. Cache strategy (TTLs, invalidation, cache keys) and the static/dynamic split are the design crux. Cons: cache invalidation is hard; dynamic/personalized content benefits less; consistency between edge and origin must be reasoned about.
11. Bufferbloat
Bufferbloat (Gettys & Nichols, 2011): oversized network buffers, combined with loss-based congestion control (which only backs off on loss), keep queues persistently full — so latency balloons under load even though throughput looks fine. Fixes: Active Queue Management (CoDel, which targets queue delay not occupancy) and model-based CC (BBR) that paces to the bottleneck instead of filling buffers. The lesson: a full buffer is latency you’re carrying on every packet.
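The cost of a standing queue is simple arithmetic: bytes waiting divided by the drain rate. The sketch below uses assumed function names and example numbers (a 1 MB buffer on a 10 Mbit/s link, and a 5 ms delay target picked purely for illustration).

```python
def standing_queue_delay_ms(queued_bytes, link_bps):
    """Delay added to every packet that arrives behind queued_bytes."""
    return queued_bytes * 8 / link_bps * 1000.0


def max_queue_bytes_for_target(link_bps, target_delay_ms):
    """Largest standing queue that keeps queuing delay at or under the target."""
    return link_bps * (target_delay_ms / 1000.0) / 8


# Example: a full 1 MB buffer in front of a 10 Mbit/s link.
bloated_ms = standing_queue_delay_ms(1_000_000, 10e6)
# Example: queue size that would hold delay to 5 ms on the same link.
small_queue = max_queue_bytes_for_target(10e6, 5.0)
```

With these inputs the full buffer adds 800 ms to every packet, while a 5 ms target corresponds to only a few kilobytes of queue.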
12. Application-level backpressure and flow control
Beyond TCP’s flow control, services need application backpressure: when a component is overloaded, it must signal upstream to slow down rather than building unbounded queues (which inflate latency and eventually OOM). Mechanisms: bounded queues, blocking/credit-based flow control (gRPC/HTTP/2 stream windows, Reactive Streams), load shedding (reject early — return 429/503 fast rather than queueing), and concurrency limits (e.g. adaptive limits à la TCP Vegas applied to RPC). The anti-pattern is an unbounded in-memory queue that turns overload into a latency spiral and then a crash.
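A bounded queue with early rejection is the simplest form of this. The sketch below uses Python's standard `queue.Queue`; the class name and the use of 202 for accepted work are assumptions, and a real service would add per-request deadlines and concurrency limits on top.

```python
import queue


class BoundedIntake:
    """Accept work until max_pending items are waiting, then shed load."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        """Return 202 if the request was queued, 503 if it was shed."""
        try:
            self._pending.put_nowait(request)  # never blocks the caller
            return 202
        except queue.Full:
            return 503  # reject fast so queuing delay cannot grow

    def take(self):
        """Remove and return the oldest pending request."""
        return self._pending.get_nowait()


intake = BoundedIntake(max_pending=2)
statuses = [intake.submit(f"req-{i}") for i in range(4)]
```

The first two requests are queued and the rest are rejected immediately, which keeps both memory and queuing delay bounded during overload.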
13. Decision guide
Transport choice:
├─ Many parallel objects, latency-sensitive web ─► HTTP/3 (QUIC) — no cross-stream HoL, 0–1 RTT
├─ Standard request/response, ubiquity needed ───► HTTP/2 over TCP (reuse connections!)
├─ Real-time, loss-tolerant (games/voice/video) ─► UDP/QUIC + app-level recovery
└─ Bulk reliable transfer ───────────────────────► TCP with big windows / parallel streams
Load balancing:
├─ Route by URL/host/cookie, TLS termination, retries ─► L7
├─ Raw throughput / pass-through / non-HTTP ──────────► L4
├─ Affinity to stateful nodes / caches ──────────────► CONSISTENT HASHING (+ virtual nodes)
└─ Spread load to stateless workers ─────────────────► POWER-OF-TWO-CHOICES
Latency hygiene:
Short interactive RPC ► TCP_NODELAY (disable Nagle) + connection reuse
High latency under load ► check BUFFERBLOAT → BBR / CoDel; add BACKPRESSURE + load shedding
Global users ► CDN/edge + anycast to cut RTT
Reach-for / avoid:
- QUIC/HTTP3 — for: many parallel streams, mobile (migration). Avoid when: UDP is blocked or CPU is the bottleneck.
- L7 LB — for: content routing, TLS, retries. Avoid when: you only need cheap pass-through (use L4).
- Consistent hashing — for: cache/stateful affinity. Avoid when: fully stateless (P2C is simpler/better-balanced).
- Retries — for: transient failures. Avoid when: unbounded (retry storms) — always budget + backoff + jitter.
PART 4 — INTERVIEW ARSENAL
How to wield this. Senior signals: (1) you reason in round trips and RTT, not bandwidth; (2) you name head-of-line blocking as the reason for HTTP/3 and bufferbloat/backpressure for latency-under-load; (3) you pick load-balancing strategy by stateful vs stateless and bound retries. Each question has a model answer and counter-ladder.
A. Fundamentals
Q1. Why doesn’t more bandwidth fix latency? Answer: Latency is propagation (speed of light) + queuing + per-round-trip protocol cost; bandwidth only affects transmission time of large payloads. A request that needs 3 round trips across an 80 ms link takes ~240 ms regardless of pipe size. Wins come from fewer round trips (connection reuse, 0-RTT), avoiding HoL blocking, and shorter queues — not a fatter pipe. Counter-ladder:
- “When does bandwidth matter?” → Bulk transfers (throughput = window/RTT, or link rate); high-BDP links need big windows/parallelism.
- “Speed-of-light floor US↔EU?” → ~5 µs/km in fiber; ~5500 km → ~27 ms one way, ~55+ ms RTT minimum.
Q2. Flow control vs congestion control? Answer: Flow control is receiver-driven (the advertised window stops a fast sender from overrunning a slow receiver). Congestion control is network-driven (slow start + AIMD stop senders from overrunning the shared network). Either can throttle you; they solve different problems. Counter-ladder:
- “What limits throughput on a single TCP connection?” → window ÷ RTT; to fill the link the window must reach the BDP (bandwidth × RTT), so enlarge the window or parallelize.
- “Why does a new connection start slow?” → TCP slow start ramps the congestion window from small; reuse connections to avoid paying it repeatedly.
B. HoL & modern transport
Q3. HTTP/2 fixed HoL blocking — but did it? Answer: It fixed request-level HoL (HTTP/1.1’s one-request-per-connection / pipelining stalls) by multiplexing streams over one connection. But all streams share one TCP byte stream, so a single lost packet stalls every stream (transport-level HoL). QUIC/HTTP3 fixes this with independent per-stream loss recovery over UDP. Counter-ladder:
- “Why can’t TCP just deliver streams independently?” → TCP is a single ordered byte stream by design; the OS can’t know your stream boundaries.
- “QUIC’s other wins?” → 0–1 RTT setup (integrated TLS 1.3), connection migration via connection ID, pluggable user-space congestion control.
- “QUIC downside?” → UDP throttling/blocking by middleboxes; higher CPU than kernel TCP.
Q4. A small request/response RPC shows ~40 ms latency on a fast LAN. Why? Answer: Classic Nagle + delayed ACK interaction: Nagle holds the small send waiting for an ACK; the peer’s delayed ACK waits ~40 ms to piggyback — a standoff. Fix: set TCP_NODELAY (disable Nagle) for interactive RPC. Also reuse the connection to avoid handshake/slow-start. Counter-ladder:
- “Tradeoff of disabling Nagle?” → More small packets / overhead; fine for latency-sensitive interactive traffic.
- “Other fixed latencies in RPC?” → Connection setup (1 RTT TCP + 1 RTT TLS) — amortize via pooling/keepalive/session resumption.
C. Load balancing & scaling
Q5. L4 vs L7 load balancing — when each, and the cost? Answer: L4 routes by connection/IP:port — cheap, fast, protocol-agnostic, good for raw throughput/pass-through. L7 parses requests to route by path/host/header/cookie, terminate TLS, retry, rate-limit, do canaries — more CPU but far more capable. Use L7 when you need content-based decisions; L4 when you just need to spread connections cheaply. Counter-ladder:
- “How balance load to stateful cache nodes?” → Consistent hashing (with virtual nodes for evenness) so only ~1/N keys move on membership change.
- “Best simple algorithm for stateless workers?” → Power-of-two-choices: pick 2 random, send to the less loaded — near-optimal with O(1) state, avoids herding on ‘the’ least-loaded.
- “Why not always least-connections?” → Needs accurate global load state; can herd new requests onto one ‘idle’ server.
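Power-of-two-choices is easy to try in a simulation. This sketch makes simplifying assumptions: requests never finish (so load only grows), load is a plain counter, and the function names, server count, and seed are arbitrary. It compares P2C against picking one server uniformly at random.

```python
import random


def pick_random(loads, rng):
    """Baseline: choose one server uniformly at random."""
    return rng.randrange(len(loads))


def pick_p2c(loads, rng):
    """Sample two distinct servers and choose the less loaded one."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b


def simulate(n_servers, n_requests, chooser, seed=0):
    """Assign requests one by one and return the final per-server load."""
    rng = random.Random(seed)
    loads = [0] * n_servers
    for _ in range(n_requests):
        loads[chooser(loads, rng)] += 1
    return loads


p2c_loads = simulate(100, 10_000, pick_p2c)
random_loads = simulate(100, 10_000, pick_random)
p2c_spread = max(p2c_loads) - min(p2c_loads)
random_spread = max(random_loads) - min(random_loads)
```

Comparing `p2c_spread` with `random_spread` shows how much tighter the distribution gets from looking at just one extra server per request.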
Q6. Design retries safely. Answer: Retries only for idempotent or idempotency-keyed operations; bound them with a retry budget (e.g. retries ≤ 10% of requests), exponential backoff with jitter, and per-try timeouts. Otherwise retries during an incident amplify load (retry storm) and turn a brownout into an outage. Combine with circuit breakers and load shedding. Counter-ladder:
- “Why jitter?” → Avoid synchronized retry waves (thundering herd) hitting the recovering service simultaneously.
- “Retry a non-idempotent write?” → Only with an idempotency key the server dedupes on; else risk double-apply.
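A retry budget can be as small as two counters. The sketch below is a simplification with hypothetical names: it counts over the whole process lifetime, whereas real implementations usually use a sliding window or token bucket, and it uses integer percentages to avoid floating-point edge cases.

```python
class RetryBudget:
    """Permit retries only while retries stay within percent% of requests."""

    def __init__(self, percent=10):
        self.percent = percent
        self.requests = 0
        self.retries = 0

    def record_request(self):
        """Count one first-attempt request."""
        self.requests += 1

    def try_acquire_retry(self):
        """Return True and count the retry if the budget allows it."""
        if (self.retries + 1) * 100 <= self.percent * self.requests:
            self.retries += 1
            return True
        return False  # budget exhausted: fail fast, do not retry


budget = RetryBudget(percent=10)
for _ in range(100):
    budget.record_request()
granted = sum(budget.try_acquire_retry() for _ in range(50))
```

Of the 50 retry attempts only 10 are granted, so even if every request fails the extra load on the struggling backend is capped at 10%.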
D. Latency under load
Q7. Throughput looks fine but latency spikes under load. Diagnose. Answer: Likely bufferbloat — full buffers (in the network or app queues) add standing delay; loss-based CC keeps them full. Network fix: BBR (paces to bottleneck) and AQM like CoDel (targets queue delay). App fix: bounded queues + backpressure + load shedding instead of unbounded in-memory queues. Measure queue delay, not just occupancy/throughput. Counter-ladder:
- “Why does an unbounded app queue make it worse?” → It absorbs overload into latency, then OOMs — a spiral; shed load early (429/503) instead.
- “How does BBR differ from CUBIC?” → BBR models bandwidth+RTT and paces; CUBIC fills the buffer until loss — BBR keeps queues shorter.
E. Worked drill — driving a design end-to-end
Watch round-trip reasoning, HoL avoidance, and backpressure drive the design.
Prompt: “Design the edge/transport layer for a global interactive web app: users worldwide load a dashboard with ~50 small API/asset fetches, on flaky mobile networks; it must feel fast and stay stable under traffic spikes.”
1 — Attack round trips first. “Latency is dominated by RTT × number of round trips, and users are global, so propagation is large. Step one: a CDN with anycast so static assets and cacheable API responses are served from a PoP near the user — turning an 80–150 ms origin RTT into a few ms. Dynamic calls still hit regional backends, ideally via the edge (terminate TLS at the edge, reuse warm origin connections).”
2 — Choose the transport for 50 parallel fetches. “Fifty small parallel objects is the textbook head-of-line-blocking scenario. Over HTTP/2/TCP, one lost packet on a flaky mobile link stalls all 50 streams. So HTTP/3 (QUIC): independent per-stream loss recovery means one lost packet stalls only its stream; plus 0–1 RTT setup and connection migration (Wi-Fi→cellular without reconnect) — exactly right for mobile.”
3 — Amortize setup. “Reuse a single QUIC connection (multiplexed) for all 50 fetches — pay the handshake once. Enable 0-RTT resumption for repeat visits, but only send 0-RTT data for idempotent GETs (0-RTT is replayable). TLS 1.3 throughout.”
4 — Interactive latency hygiene. “For the dynamic RPC path, disable Nagle (TCP_NODELAY) where TCP is still used, keep connection pools warm to backends, and avoid chatty sequential calls — batch or parallelize so we don’t stack RTTs.”
5 — Stability under spikes. “Protect against traffic surges with backpressure and load shedding: bounded queues at each tier, return 429/503 fast rather than queueing unboundedly, and concurrency limits at the edge. Retries are budgeted (≤10% of requests) with exponential backoff + jitter to prevent retry storms during a brownout. Circuit breakers trip on a failing backend so the edge serves cached/degraded responses instead of piling on.”
6 — Load balancing. “At the edge, L7 load balancing to route by path/host, terminate TLS, and do retries/canaries. To stateful caches behind it, consistent hashing with virtual nodes for affinity; to stateless API workers, power-of-two-choices for even load with minimal state.”
7 — Tradeoffs stated. “I optimized for fewer round trips (CDN/anycast + connection reuse + 0-RTT) and no HoL blocking (QUIC) — the two things that actually move global interactive latency — and bought stability with backpressure + bounded retries rather than big buffers (which would cause bufferbloat). Costs: QUIC’s higher CPU and middlebox/UDP-blocking risk (fall back to HTTP/2), and CDN cache-invalidation complexity for dynamic content. If UDP were widely blocked for our users, I’d fall back to HTTP/2 + aggressive connection reuse and accept transport HoL.”
Template: cut round trips (CDN/anycast/reuse/0-RTT) → kill HoL (QUIC) → latency hygiene (Nagle, batching) → stability (backpressure, bounded retries) → pick LB by stateful/stateless → state the tradeoff.
F. Consolidated gotchas & traps (rapid fire)
- Reduce round trips, not just bandwidth.
- HTTP/2 over TCP still has transport HoL — QUIC fixes it.
- Nagle + delayed ACK = ~40 ms stall; TCP_NODELAY for interactive RPC.
- Slow start penalizes new connections; reuse them.
- Bufferbloat = latency under load; BBR/CoDel + backpressure.
- Flow control ≠ congestion control.
- Unbounded retries → retry storms; budget + backoff + jitter.
- Consistent hashing needs virtual nodes.
- A single TCP connection caps at window/RTT; high-BDP links need bigger windows or parallelism.
- 0-RTT data is replayable — idempotent requests only.
G. Pros/cons master tables
Transports
| Transport | Pros | Cons |
|---|---|---|
| TCP | Reliable, ubiquitous, kernel/NIC offload | Per-connection HoL; handshake+slow-start; ossified; no migration |
| UDP | Minimal overhead/latency; full control | DIY reliability/CC; can be a bad citizen |
| QUIC/HTTP3 | No cross-stream HoL; 0–1 RTT; migration; evolvable | UDP blocking; higher CPU; newer |
Load balancing
| Strategy | Pros | Cons |
|---|---|---|
| L4 | Fast, cheap, protocol-agnostic | No content decisions |
| L7 | Routing, TLS, retries, canaries | More CPU |
| Round-robin | Trivial | Ignores load |
| Least-connections | Load-aware | Needs state; can herd |
| Consistent hashing | Affinity; minimal key movement | Needs virtual nodes for evenness |
| Power-of-two-choices | Near-optimal, O(1) state | Slightly worse than perfect knowledge |
Go deeper (primary sources)
- Cerf & Kahn, “A Protocol for Packet Network Intercommunication” (1974).
- Jacobson, “Congestion Avoidance and Control” (1988); Allman, Paxson, Blanton, “TCP Congestion Control” (RFC 5681).
- Ha, Rhee, Xu, “CUBIC: A New TCP-Friendly High-Speed TCP Variant” (2008).
- Cardwell, Cheng, Gunn, Yeganeh, Jacobson, “BBR: Congestion-Based Congestion Control” (2016).
- Iyengar & Thomson, “QUIC: A UDP-Based Multiplexed and Secure Transport” (RFC 9000, 2021).
- Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.3” (RFC 8446, 2018).
- Nagle, “Congestion Control in IP/TCP Internetworks” (RFC 896, 1984).
- Stevens, TCP/IP Illustrated, Volume 1.
- Gettys & Nichols, “Bufferbloat: Dark Buffers in the Internet” (2011).
- Dean & Barroso, “The Tail at Scale” (2013).
PART 5 — MAKE IT STICK (Teaching Tutorial)
The references are the map; this is the driving lesson. Networking clicks once you count in round trips instead of bytes, and once you can see head-of-line blocking. Two pictures do most of the work.
14.1 The one idea: count round trips, not megabytes
Latency = propagation (speed of light, fixed) + queuing + per-RTT protocol cost
A request needing 4 round trips on an 80 ms link ≈ 320 ms — NO pipe size fixes that.
Wins come from: fewer round trips, no head-of-line blocking, shorter queues.
Bandwidth moves big payloads faster; it does nothing for a chatty request that waits N round trips. The latency lever is reducing round trips (reuse connections, 0-RTT, CDNs), not buying a fatter pipe.
14.2 Head-of-line blocking — the picture that explains HTTP/3
HTTP/2 over ONE TCP stream:
[r1][r2][r3][r4] one byte order
✗ lost packet
→ ALL of r2,r3,r4 STALL behind it
(transport-level HoL blocking)
HTTP/3 over QUIC (independent streams):
r1 ──●──── (lost pkt stalls only r1)
r2 ──────── flows
r3 ──────── flows
r4 ──────── flows
TCP is one ordered byte stream, so one lost packet stalls everything behind it — even unrelated requests multiplexed on top (HTTP/2). QUIC runs many independent streams over UDP, so a loss stalls only its own stream. That, plus 0–1 RTT setup and connection migration, is the whole case for HTTP/3.
14.3 Congestion vs flow control (two different brakes)
FLOW control = "don't overrun the RECEIVER" (receiver's advertised window)
CONGESTION control = "don't overrun the NETWORK" (slow start + AIMD sawtooth)
cwnd ╱╲ ╱╲ ╱╲ ← additive increase, then halve on loss (AIMD)
╱ ╲╱ ╲╱ ╲
╱ slow start ramps a NEW connection up from small → reuse connections!
Two independent brakes can each throttle you. New connections start slow (slow start), which is why connection reuse is a huge win for short requests.
14.4 The 40 ms mystery (Nagle + delayed ACK)
Sender (Nagle): "I'll wait for an ACK before sending this small chunk."
Receiver (delayed ACK): "I'll wait ~40 ms to piggyback this ACK."
→ standoff → ~40 ms stall on a fast LAN. Fix: TCP_NODELAY for interactive RPC.
14.5 Analogies that stick
- RTT = a question across a canyon — shouting faster (bandwidth) doesn’t shorten the echo time; asking fewer times does.
- TCP HoL = a single-file checkout line — one slow shopper blocks everyone. QUIC = parallel lines.
- Slow start = merging onto a highway — you accelerate gradually, not floor it.
- Consistent hashing = assigning seats by a ring so adding a row only moves a few people; power-of-two-choices = glance at two checkout lines, pick the shorter.
14.6 Misconceptions → corrections
| You might think… | Actually… |
|---|---|
| “More bandwidth = lower latency.” | No — latency is round trips + propagation; reduce round trips. |
| “HTTP/2 killed head-of-line blocking.” | Only request-level; TCP still has transport HoL. QUIC fixes it. |
| “Flow control and congestion control are the same.” | Receiver-limit vs network-limit — different brakes. |
| “Retries are free safety.” | Unbounded retries cause retry storms; budget + backoff + jitter. |
| “Consistent hashing balances evenly by itself.” | Needs virtual nodes. |
14.7 Explain it back (Feynman)
- Why doesn’t bandwidth fix a chatty request? [14.1]
- Draw HTTP/2 vs QUIC HoL blocking. [14.2]
- Flow vs congestion control — who’s protecting whom? [14.3]
- Explain the 40 ms stall and its fix. [14.4]
- L4 vs L7 — when each? [Part 3 §8]
14.8 Flashcards (cover the right column)
| Prompt | Answer |
|---|---|
| Latency lever | Fewer round trips (not bandwidth) |
| HTTP/2 weakness | Transport HoL (one TCP stream) |
| QUIC fix | Independent per-stream loss recovery |
| Flow control | Don’t overrun the receiver |
| Congestion control | Don’t overrun the network (AIMD) |
| New-connection cost | Slow start → reuse connections |
| 40 ms stall | Nagle + delayed ACK → TCP_NODELAY |
| Stateful affinity LB | Consistent hashing (+ virtual nodes) |
| Stateless LB | Power-of-two-choices |
14.9 The 60-second recall
“Network latency is round trips plus propagation plus queuing — bandwidth only speeds big payloads, so the lever is fewer round trips (connection reuse, 0-RTT, CDNs near users). TCP is a single ordered byte stream, so one lost packet causes head-of-line blocking that stalls everything behind it, including all HTTP/2 streams; QUIC fixes this with independent per-stream recovery over UDP, plus 0–1 RTT setup and connection migration. Flow control protects the receiver; congestion control (slow start, AIMD) protects the network — and slow start penalizes new connections, so reuse them. Disable Nagle (TCP_NODELAY) for interactive RPC to avoid the 40 ms delayed-ACK stall. Balance stateful traffic with consistent hashing (plus virtual nodes), stateless with power-of-two-choices, and always bound retries with budgets, backoff, and jitter.”
Frequently asked questions
Why doesn’t more bandwidth fix latency?
Latency is dominated by propagation (speed of light), queuing, and per-round-trip protocol cost; bandwidth only affects the transmission time of large payloads. A request needing several round trips across a high-latency link is slow regardless of pipe size. Wins come from fewer round trips, avoiding head-of-line blocking, and shorter queues.
What is head-of-line blocking and how does QUIC fix it?
Head-of-line blocking is when one stalled item blocks everything behind it. HTTP/2 multiplexes streams over one TCP byte stream, so a single lost packet stalls all streams. QUIC runs over UDP with independent per-stream loss recovery, so a lost packet only stalls its own stream.
What’s the difference between flow control and congestion control?
Flow control is receiver-driven: the advertised window stops a fast sender from overrunning a slow receiver. Congestion control is network-driven: slow start and additive-increase/multiplicative-decrease stop senders from overrunning the shared network. Either can limit throughput.
L4 vs L7 load balancing — when do you use each?
L4 balances by connection and IP:port — cheap, fast, and protocol-agnostic, good for raw throughput. L7 parses requests to route by path, host, header, or cookie, terminate TLS, retry, and do canaries — more CPU but far more capable. Use L7 for content-based decisions, L4 for cheap pass-through.
What causes about 40 ms of latency on small TCP requests?
The interaction of Nagle’s algorithm and delayed ACK: Nagle holds a small send waiting for an acknowledgment while the peer delays the ACK to piggyback, creating a standoff. Disable Nagle with TCP_NODELAY for interactive RPC, and reuse connections to avoid handshake and slow-start cost.
