Networking & the Data Plane: TCP, QUIC & Load Balancing Reference

System Design Handbook, Part 7

A self-contained reference for how bytes actually move and why your p99 is what it is. Most engineers treat the network as a magic pipe; the ones who can reason about round trips, congestion control, head-of-line blocking, and load-balancing math design systems that stay fast under load instead of mysteriously stalling. This chapter is that reasoning, end to end.

How to use this: Part 1 is the reference card. Part 2 maps the territory. Part 3 is the full depth with pros/cons per mechanism. Part 4 is exhaustive interview prep with counter-question ladders. Part 5 is the teaching tutorial that makes it stick.

Key takeaways

  • Network performance is governed by round trips and congestion control, not bandwidth alone — reduce round trips and keep queues short.
  • HTTP/2 over a single TCP connection still suffers transport head-of-line blocking; QUIC/HTTP3 fixes it with independent per-stream loss recovery.
  • Flow control is receiver-driven; congestion control is network-driven — both can throttle you.
  • Bound retries with budgets, backoff, and jitter, or they amplify outages into retry storms.

PART 1 — CHEATSHEET (Reference Card)

Every concept in this document, condensed.

The one idea

Network performance is governed by round trips (RTT) and congestion control, not bandwidth alone. Latency = (propagation + transmission + queuing + processing); you cannot beat the speed of light, so the wins come from fewer round trips, avoiding head-of-line blocking, and keeping queues short.

Core vocabulary (one line each)

  • OSI / TCP-IP layers — abstraction stack: link → internet (IP) → transport (TCP/UDP) → application.
  • RTT — round-trip time; the fundamental latency unit; many protocols cost N×RTT.
  • TCP handshake — 3-way (SYN, SYN-ACK, ACK) = 1 RTT before data.
  • Flow control — receiver-driven (sliding window) — don’t overrun the receiver.
  • Congestion control — network-driven — don’t overrun the network (slow start, AIMD).
  • AIMD — additive-increase/multiplicative-decrease; TCP’s fairness/stability engine.
  • CUBIC / BBR — modern congestion control: loss-based (CUBIC) vs model-based (BBR, rate/RTT).
  • Nagle / delayed ACK — coalesce small sends / delay ACKs; together can add latency.
  • Head-of-line (HoL) blocking — one stalled item blocks others behind it (TCP byte stream, HTTP/2 over TCP).
  • UDP — connectionless, unreliable datagrams; you build reliability yourself.
  • QUIC / HTTP/3 — reliable multiplexed transport over UDP; per-stream (no cross-stream HoL); 0–1 RTT.
  • TLS handshake — TLS 1.3 = 1 RTT (0-RTT resumption); earlier = 2 RTT.
  • L4 vs L7 load balancing — by connection/IP:port vs by request content (HTTP path, headers).
  • Consistent hashing / power-of-two-choices — stable key→node mapping / pick less-loaded of 2 random.
  • Anycast — one IP announced from many locations; routed to nearest.
  • CDN / edge — cache/compute near users to cut RTT and origin load.
  • Bufferbloat — oversized queues inflate latency under load.
  • Backpressure — signal upstream to slow down when overloaded.

Transport comparison

Property             | TCP                 | UDP | QUIC (HTTP/3)
Reliable/ordered     | Yes (single stream) | No  | Yes, per stream
Connection setup     | 1 RTT (+TLS)        | 0   | 0–1 RTT (TLS built in)
HoL blocking         | Yes (stream-wide)   | N/A | No across streams
Congestion control   | Kernel (CUBIC/BBR)  | DIY | User-space, pluggable
Connection migration | No (5-tuple)        | DIY | Yes (connection ID)

L4 vs L7 load balancing

Aspect     | L4 (transport)       | L7 (application)
Decides on | IP:port, connection  | HTTP path/host/headers/cookies
Cost       | Cheap, fast          | More CPU (parses requests)
Features   | NAT, pass-through    | Routing, TLS termination, retries, sticky sessions

Numbers to memorize

  • Same-DC RTT ~0.5 ms; same-region ~1–2 ms; cross-US ~30–40 ms; US↔EU ~70–90 ms; US↔Asia ~120–180 ms.
  • TCP connect = 1 RTT; +TLS 1.3 = +1 RTT (or 0-RTT resume); QUIC first connect ~1 RTT incl. crypto, resume 0-RTT.
  • Typical MSS ~1460 bytes (1500-byte MTU − 20-byte IPv4 header − 20-byte TCP header).
  • Light in fiber takes ≈ 5 µs/km → ~200 km per ms one way.

Quick decision rules

  • Latency-sensitive web with many parallel objects → HTTP/3 (QUIC) to kill HoL blocking.
  • Many short connections → reuse connections (pooling/keepalive) to amortize handshakes.
  • Real-time/lossy-tolerant (games, voice, video) → UDP/QUIC with app-level recovery.
  • Route by URL/host/cookie, terminate TLS, retries → L7 LB; raw throughput/pass-through → L4 LB.
  • Spread keys to stateful nodes stably → consistent hashing; spread load to stateless → power-of-two-choices.

Top gotchas (litmus tests)

  1. Bandwidth doesn’t fix latency — a fat pipe still costs 1 RTT per round trip; reduce round trips.
  2. HTTP/2 multiplexes over one TCP connection → TCP HoL blocking; one lost packet stalls all streams. QUIC fixes this.
  3. Nagle + delayed ACK can add ~40 ms to small request/response — disable Nagle (TCP_NODELAY) for interactive RPC.
  4. TCP slow start means new connections start slow — connection reuse matters a lot for short transfers.
  5. Bufferbloat: big buffers + loss-based CC = high latency under load; BBR/AQM (CoDel) help.
  6. Flow control ≠ congestion control — receiver window vs network capacity; both can throttle you.
  7. L7 retries can amplify outages (retry storms) — use budgets + backoff + jitter.
  8. Consistent hashing needs virtual nodes or load is uneven.
  9. A single TCP connection caps throughput at window/RTT; filling a high-BDP link needs a window of at least the bandwidth-delay product, so use bigger windows or parallel connections.
  10. Connection setup dominates short requests — measure handshakes, not just transfer time.

PART 2 — OUTLINE (full map)

  1. Layering and how a packet moves end-to-end
  2. TCP: handshake, flow control, congestion control, Nagle/delayed ACK
  3. Head-of-line blocking (TCP and HTTP/2)
  4. UDP
  5. QUIC and HTTP/3
  6. TLS handshake cost
  7. RPC cost models and patterns
  8. Load balancing: L4 vs L7 and the algorithms
  9. DNS and anycast
  10. CDNs and edge
  11. Bufferbloat
  12. Application-level backpressure and flow control
  13. Decision guide
  14. Make it stick — the teaching tutorial (round-trip & HoL pictures, mnemonics, flashcards)

PART 3 — DEEP DIVE

1. Layering and how a packet moves

The TCP/IP stack (Cerf & Kahn’s internetworking design, 1974): the link layer moves frames on a local network; the internet (IP) layer routes packets across networks (best-effort, unreliable, unordered); the transport layer (TCP/UDP) provides end-to-end semantics (reliability/ordering or not); the application layer (HTTP, gRPC) gives meaning. A byte you send is wrapped with headers at each layer (encapsulation), routed hop-by-hop by IP (each router forwards toward the destination), and unwrapped on the other side. Key consequence: IP gives no guarantees, so every reliability/ordering property above it is built by TCP or your app, and each adds round trips and state.

Latency components: propagation (distance ÷ speed of light in fiber, ~5 µs/km), transmission (bytes ÷ link rate), queuing (time in router/NIC buffers — the variable, load-dependent part), and processing. Only queuing is really under your control at scale — which is why congestion control and queue management dominate tail latency.
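The four components can be added up in a few lines. This is a minimal sketch, not a measurement tool: the function name, the distance, the link rate, and the 30 ms queue delay are all assumed example values, and the 5 µs/km figure is the one used in this chapter.

```python
# Illustrative one-way latency budget. All inputs are assumed example values.
FIBER_DELAY_US_PER_KM = 5.0  # ~5 µs/km, the figure used in this chapter


def one_way_latency_ms(distance_km, payload_bytes, link_bps,
                       queuing_ms=0.0, processing_ms=0.0):
    """Sum propagation + transmission + queuing + processing, in ms."""
    propagation_ms = distance_km * FIBER_DELAY_US_PER_KM / 1000.0
    transmission_ms = payload_bytes * 8 / link_bps * 1000.0
    return propagation_ms + transmission_ms + queuing_ms + processing_ms


# A 1,500-byte packet over 4,000 km of fiber on a 1 Gbit/s link, empty queues:
idle = one_way_latency_ms(4000, 1500, 1e9)
# The same packet with 30 ms of standing queue delay under load:
loaded = one_way_latency_ms(4000, 1500, 1e9, queuing_ms=30.0)
print(f"idle: {idle:.3f} ms, loaded: {loaded:.3f} ms")
```

In this example transmission is about 0.012 ms, so propagation sets the floor and queuing is the only term that moves with load.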

2. TCP: handshake, flow control, congestion control

Three-way handshake: SYN → SYN-ACK → ACK establishes a connection in 1 RTT before any data flows (then +1 RTT for TLS 1.3, unless 0-RTT resumption is used). Closing uses FIN/ACK exchanges.

Flow control (receiver-driven): the receiver advertises a window (how much it can buffer); the sender never has more than a window of unacknowledged data in flight. This prevents a fast sender from overwhelming a slow receiver. Max throughput ≈ window ÷ RTT, so filling a link requires a window of at least the bandwidth-delay product (BDP = bandwidth × RTT); on high-BDP links you need large windows (window scaling) or parallelism.
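The window/RTT ceiling is easy to check numerically. A small sketch with hypothetical helper names; the 80 ms RTT and 1 Gbit/s link are assumed example values, and 65,535 bytes is the largest window the 16-bit TCP window field can express without window scaling.

```python
def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_s


def bdp_bytes(link_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


# Unscaled 65,535-byte window on an 80 ms path:
ceiling = max_throughput_bps(65535, 0.080)
# Window needed to fill a 1 Gbit/s link on the same path:
needed = bdp_bytes(1e9, 0.080)
print(f"ceiling: {ceiling / 1e6:.2f} Mbit/s, window needed: {needed / 1e6:.1f} MB")
```

Under these assumptions the unscaled window tops out near 6.5 Mbit/s, while filling the link takes roughly 10 MB in flight, which is why window scaling or parallel connections matter on long fat paths.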

Congestion control (network-driven) keeps the network from collapsing (Jacobson, 1988):

  • Slow start: start with a small congestion window, double each RTT (exponential) until a threshold or loss — so new connections ramp up, they don’t start fast.
  • Congestion avoidance / AIMD: additive increase (grow window by ~1 MSS/RTT) until loss, then multiplicative decrease (halve) — the sawtooth that gives stability and fairness.
  • Fast retransmit/recovery: triple-duplicate ACKs trigger retransmit without waiting for a timeout (TCP Reno).
  • CUBIC (Ha, Rhee, Xu, 2008): loss-based but uses a cubic growth function of time since last loss — better utilization on high-BDP networks; the Linux default.
  • BBR (Cardwell et al., 2016): model-based — estimates bottleneck bandwidth and RTT and paces to that, rather than treating loss as the only congestion signal — much better in the presence of shallow buffers/random loss and against bufferbloat.
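The slow-start ramp and the AIMD sawtooth described above can be sketched with a toy simulation. It is a deliberately simplified Reno-style model, not a real TCP stack: the window is counted in whole MSS, starts at 1 MSS (an assumption; real stacks typically start larger), and loss is injected at RTT indices chosen by the caller.

```python
def simulate_cwnd(rtts, ssthresh, loss_at):
    """Return the congestion window (in MSS) at the start of each RTT."""
    cwnd = 1
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            # Multiplicative decrease: halve the window on loss.
            ssthresh = max(cwnd // 2, 1)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double every RTT until the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: additive increase of 1 MSS per RTT.
            cwnd += 1
    return trace


trace = simulate_cwnd(rtts=12, ssthresh=16, loss_at={7})
print(trace)  # [1, 2, 4, 8, 16, 17, 18, 19, 9, 10, 11, 12]
```

The first four RTTs are where a new connection pays for slow start; a reused connection would begin from an already-grown window.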

Nagle’s algorithm + delayed ACK: Nagle (RFC 896) coalesces small writes to avoid tiny packets; delayed ACK holds ACKs briefly to piggyback. Together they can cause a ~40 ms stall for small request/response patterns — so interactive RPC sets TCP_NODELAY (disable Nagle).
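Setting the option is a one-liner on most platforms. A minimal sketch using Python's standard `socket` module; the helper name `make_low_latency_socket` is hypothetical, and the socket is created but not connected.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Send small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print("TCP_NODELAY enabled:", nodelay_enabled)
```

With Nagle off, the application is responsible for not emitting many tiny writes; assembling a request into one buffer before sending keeps packet counts reasonable.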

Pros of TCP: reliable, ordered, congestion-friendly, ubiquitous, offloaded in kernel/NIC. Cons: per-connection HoL blocking, handshake + slow-start cost for short flows, ossified (hard to evolve), no connection migration across IP changes.

3. Head-of-line blocking (TCP and HTTP/2)

HoL blocking: when items must be delivered in order, one stalled item blocks everything behind it. TCP delivers a single ordered byte stream, so a lost segment stalls delivery of all later bytes until it’s retransmitted — even if those later bytes already arrived. HTTP/2 multiplexes many logical streams over one TCP connection; this fixed HTTP/1.1’s request-level HoL, but a single TCP packet loss now stalls all HTTP/2 streams (transport-level HoL). This is the core motivation for QUIC.
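The difference between one ordering domain and many can be shown with a toy model. This sketch assumes each packet carries data for exactly one stream and that in-order delivery means a packet is handed to the application only once everything sent before it in the same ordering domain has arrived; the arrival times are made up.

```python
def delivery_times(packets, per_stream):
    """packets: list of (stream_id, arrival_ms) in send order.

    Returns when each packet's data can be delivered to the application.
    per_stream=False models one ordered byte stream (TCP-like);
    per_stream=True models independent ordering per stream (QUIC-like).
    """
    latest = {}  # ordering domain -> latest arrival seen so far
    out = []
    for stream_id, arrival_ms in packets:
        domain = stream_id if per_stream else "connection"
        latest[domain] = max(latest.get(domain, 0), arrival_ms)
        out.append(latest[domain])
    return out


# The first packet (stream r1) is lost; its retransmission lands at 110 ms.
packets = [("r1", 110), ("r2", 10), ("r3", 11), ("r4", 12)]
tcp_like = delivery_times(packets, per_stream=False)
quic_like = delivery_times(packets, per_stream=True)
print(tcp_like)   # [110, 110, 110, 110]
print(quic_like)  # [110, 10, 11, 12]
```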

4. UDP

UDP is a thin layer over IP: connectionless, unreliable, unordered datagrams, no congestion control. You get low overhead and full control; you must build whatever reliability/ordering/flow control you need. Used by DNS, real-time media (with app-level recovery/FEC), and as the substrate for QUIC.
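To make "connectionless" concrete, here is a minimal loopback sketch with Python's standard `socket` module. The helper name is hypothetical, and it assumes loopback UDP is available; the timeout is there because nothing in UDP itself would ever report a lost datagram.

```python
import socket


def send_one_datagram(payload):
    """Send one UDP datagram to a local receiver and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.settimeout(1.0)         # UDP gives no delivery signal, so bound the wait
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No handshake: the first packet on the wire already carries data.
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()


print(send_one_datagram(b"ping"))
```

Anything beyond this (retransmission, ordering, pacing) is the application's job, which is the trade described above.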

Pros: minimal latency/overhead, no HoL (each datagram independent), multicast, full control. Cons: you reimplement reliability/congestion control (and risk doing it badly — being a poor network citizen); no built-in security/ordering.

5. QUIC and HTTP/3

QUIC (RFC 9000, Iyengar & Thomson, 2021) is a reliable, multiplexed, secure transport built on top of UDP in user space. It fixes TCP’s structural problems:

  • No cross-stream HoL blocking: independent streams have independent loss recovery, so a lost packet stalls only the stream(s) whose data it carried.
  • Fast setup: crypto (TLS 1.3) is integrated, so a connection + encryption is 1 RTT, and 0-RTT on resumption.
  • Connection migration: connections are identified by a connection ID, not the IP:port 5-tuple, so a client can change networks (Wi-Fi→cellular) without dropping the connection.
  • User-space + pluggable congestion control: evolves without kernel/OS upgrades (avoiding TCP’s ossification).

HTTP/3 is HTTP over QUIC.

Pros: eliminates transport HoL, faster setup, migration, evolvable. Cons: UDP is sometimes throttled/blocked by middleboxes; more CPU (user-space crypto/processing) than kernel TCP; newer, less ubiquitous tooling.

6. TLS handshake cost

TLS provides confidentiality/integrity/authentication but adds round trips. TLS 1.3 (RFC 8446, Rescorla, 2018) cut the handshake to 1 RTT (down from 2 in TLS 1.2) and supports 0-RTT resumption (send app data with the first message using a pre-shared key) — at the cost that 0-RTT data is replayable, so it must be used only for idempotent requests. With QUIC, TLS is folded into the transport handshake. Practical guidance: reuse connections (pooling, keepalive, session resumption) so you pay the handshake once, not per request.
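The handshake costs quoted in this chapter can be turned into a time-to-first-response estimate. A rough sketch: the table and function names are hypothetical, the round-trip counts are the ones stated in the text, and server processing and transfer time are ignored.

```python
# Setup round trips before the first request can be sent (figures from the text).
SETUP_RTTS = {
    "TCP + TLS 1.2": 1 + 2,
    "TCP + TLS 1.3": 1 + 1,
    "TCP + TLS 1.3 (0-RTT resumption)": 1 + 0,
    "QUIC (first connection)": 1,
    "QUIC (0-RTT resumption)": 0,
    "Reused connection": 0,
}


def time_to_first_response_ms(setup, rtt_ms):
    """Setup round trips plus one round trip for the request/response itself."""
    return (SETUP_RTTS[setup] + 1) * rtt_ms


for name in SETUP_RTTS:
    print(f"{name:34s} {time_to_first_response_ms(name, 80):4d} ms at 80 ms RTT")
```

On an assumed 80 ms path the spread runs from 320 ms down to 80 ms, which is the quantitative case for connection reuse and resumption.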

7. RPC cost models and patterns

A remote call costs at minimum: connection setup (amortizable), 1 RTT for the request/response, serialization, and server processing. Patterns that matter:

  • Connection reuse / pooling / keepalive: amortize handshake + slow-start across many calls — often the single biggest latency win.
  • Multiplexing (HTTP/2, gRPC): many concurrent calls over one connection (mind transport HoL → HTTP/3 for many parallel streams).
  • Batching / pipelining: combine calls to amortize per-call overhead; pipeline to hide RTT.
  • Streaming: gRPC streams for continuous data instead of repeated unary calls.
  • Timeouts + retries with budgets/backoff/jitter: essential, but uncontrolled retries cause retry storms that amplify outages — bound them.
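Backoff with jitter, from the last pattern above, is small enough to sketch. This version uses the variant commonly called "full jitter" (a uniformly random delay up to an exponentially growing ceiling); the function name and the base and cap values are assumptions for illustration.

```python
import random


def backoff_delay_s(attempt, base_s=0.1, cap_s=5.0, rng=random):
    """Delay before retry number `attempt` (0-based), with full jitter."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    # Randomizing the whole interval spreads clients out so that their
    # retries do not arrive in synchronized waves.
    return rng.uniform(0.0, ceiling)


rng = random.Random(42)  # seeded only to make the demo repeatable
for attempt in range(6):
    print(attempt, round(backoff_delay_s(attempt, rng=rng), 3))
```

Backoff only spaces retries out; it does not limit how many are sent in total, which is what a retry budget is for.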

Pros: clean abstraction, multiplexed/streamed efficiency. Cons: the network is not free or reliable (Deutsch’s fallacies of distributed computing); every RPC adds latency, partial-failure modes, and tail-latency exposure.

8. Load balancing: L4 vs L7 and the algorithms

  • L4 (transport): balances by connection/IP:port, often via NAT or direct server return; cheap, high throughput, protocol-agnostic; can’t make content decisions.
  • L7 (application): parses requests (HTTP path, host, headers, cookies) to route, terminate TLS, retry, rate-limit, do sticky sessions, canary. More CPU; far more capable.

Algorithms:

  • Round-robin / weighted RR: simple; ignores actual load.
  • Least-connections / least-load: send to the least busy; better under heterogeneous request costs.
  • Consistent hashing: map keys (or clients) to nodes on a ring so that adding/removing a node moves only ~1/N of keys — essential for stateful/cache affinity. Use virtual nodes to even out load.
  • Power-of-two-choices (P2C): pick two servers at random, send to the less loaded — provably near-optimal load distribution with O(1) state, avoiding the herd behavior of “always pick the least loaded.”

Pros/cons: L4 = speed, L7 = intelligence; RR = simplicity, least-conn = load-awareness (needs state), consistent hashing = affinity (needs vnodes), P2C = great balance with tiny state.
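A consistent-hash ring with virtual nodes fits in a few lines. This is a minimal sketch rather than a production ring: the class name, the choice of SHA-256 truncated to 64 bits, the 100 virtual nodes per server, and the key naming are all assumptions for illustration.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node owns `vnodes` points on the ring to even out its share.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


keys = [f"user:{i}" for i in range(10_000)]
before = HashRing(["a", "b", "c", "d"])
after = HashRing(["a", "b", "c", "d", "e"])
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
print(f"moved {len(moved) / len(keys):.1%} of keys after adding a fifth node")
```

Every key that moves lands on the new node, and the moved fraction should sit near 1/5; with plain modulo hashing, most keys would be remapped instead.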

9. DNS and anycast

DNS resolves names to addresses hierarchically (root → TLD → authoritative), cached with TTLs. It’s also a coarse load-balancing/traffic-steering tool (return different IPs by geo/health). Anycast announces the same IP from many locations; BGP routes each client to the topologically nearest instance — used by DNS resolvers, CDNs, and DDoS absorption. Caveat: anycast routing can shift mid-session (fine for stateless/UDP request-response, trickier for long-lived TCP).

10. CDNs and edge

A CDN caches content (and increasingly runs compute) at points of presence near users, cutting RTT (the dominant latency) and offloading origin. Benefits: lower latency (shorter propagation), origin protection (cache absorbs reads, DDoS scrubbing), and edge compute for personalization/auth close to users. Cache strategy (TTLs, invalidation, cache keys) and the static/dynamic split are the design crux. Cons: cache invalidation is hard; dynamic/personalized content benefits less; consistency between edge and origin must be reasoned about.
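How much a CDN helps depends heavily on hit ratio. A back-of-the-envelope sketch under a simple assumed model: a hit costs one round trip to the edge, and a miss costs that plus one round trip from edge to origin. The function name and the numbers are illustrative.

```python
def expected_latency_ms(hit_ratio, edge_rtt_ms, origin_rtt_ms):
    """Expected fetch latency under the simple hit/miss model described above."""
    hit_cost = edge_rtt_ms
    miss_cost = edge_rtt_ms + origin_rtt_ms
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


for hit_ratio in (0.0, 0.5, 0.9, 0.99):
    latency = expected_latency_ms(hit_ratio, edge_rtt_ms=5, origin_rtt_ms=100)
    print(f"hit ratio {hit_ratio:.2f}: {latency:.1f} ms")
```

The mean hides the tail: every miss still pays the full origin trip, which is why personalized or uncacheable content benefits less.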

11. Bufferbloat

Bufferbloat (Gettys & Nichols, 2011): oversized network buffers, combined with loss-based congestion control (which only backs off on loss), keep queues persistently full — so latency balloons under load even though throughput looks fine. Fixes: Active Queue Management (CoDel, which targets queue delay not occupancy) and model-based CC (BBR) that paces to the bottleneck instead of filling buffers. The lesson: a full buffer is latency you’re carrying on every packet.
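The latency a full buffer adds is just its size divided by the drain rate. A small sketch with assumed example values (a 1 MiB buffer in front of a 10 Mbit/s uplink); the function name is hypothetical.

```python
def queue_delay_ms(buffer_bytes, link_bps):
    """Time a newly arrived packet waits behind a full buffer draining at link rate."""
    return buffer_bytes * 8 / link_bps * 1000.0


full = queue_delay_ms(1 * 1024 * 1024, 10e6)   # large buffer, kept full
short = queue_delay_ms(16 * 1024, 10e6)        # small standing queue
print(f"full 1 MiB buffer: {full:.0f} ms, 16 KiB queue: {short:.1f} ms")
```

Under these assumptions the full buffer adds over 800 ms to every packet, while throughput is identical in both cases because the link is saturated either way.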

12. Application-level backpressure and flow control

Beyond TCP’s flow control, services need application backpressure: when a component is overloaded, it must signal upstream to slow down rather than building unbounded queues (which inflate latency and eventually OOM). Mechanisms: bounded queues, blocking/credit-based flow control (gRPC/HTTP/2 stream windows, Reactive Streams), load shedding (reject early — return 429/503 fast rather than queueing), and concurrency limits (e.g. adaptive limits à la TCP Vegas applied to RPC). The anti-pattern is an unbounded in-memory queue that turns overload into a latency spiral and then a crash.
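Bounded queues with early rejection can be sketched with the standard library. This is a minimal illustration, not a full admission controller: the class name and the 202/503 return convention are assumptions, and `max_pending` must be positive because `queue.Queue` treats a size of 0 as unbounded.

```python
import queue


class BoundedIntake:
    """Accept work only while there is room; otherwise shed load immediately."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        try:
            self._pending.put_nowait(request)  # never blocks, never grows unbounded
        except queue.Full:
            return 503  # reject fast instead of queueing
        return 202      # accepted for processing

    def next_request(self):
        return self._pending.get_nowait()


intake = BoundedIntake(max_pending=2)
statuses = [intake.submit(f"req-{i}") for i in range(3)]
print(statuses)  # [202, 202, 503]
```

The rejection costs the caller one fast round trip; queueing the same request would cost every later request extra latency.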

13. Decision guide

Transport choice:
├─ Many parallel objects, latency-sensitive web ─► HTTP/3 (QUIC) — no cross-stream HoL, 0–1 RTT
├─ Standard request/response, ubiquity needed ───► HTTP/2 over TCP (reuse connections!)
├─ Real-time, loss-tolerant (games/voice/video) ─► UDP/QUIC + app-level recovery
└─ Bulk reliable transfer ───────────────────────► TCP with big windows / parallel streams

Load balancing:
├─ Route by URL/host/cookie, TLS termination, retries ─► L7
├─ Raw throughput / pass-through / non-HTTP ──────────► L4
├─ Affinity to stateful nodes / caches ──────────────► CONSISTENT HASHING (+ virtual nodes)
└─ Spread load to stateless workers ─────────────────► POWER-OF-TWO-CHOICES

Latency hygiene:
   Short interactive RPC ► TCP_NODELAY (disable Nagle) + connection reuse
   High latency under load ► check BUFFERBLOAT → BBR / CoDel; add BACKPRESSURE + load shedding
   Global users ► CDN/edge + anycast to cut RTT

Reach-for / avoid:

  • QUIC/HTTP3 for: many parallel streams, mobile (migration). Avoid when: UDP is blocked or CPU is the bottleneck.
  • L7 LB for: content routing, TLS, retries. Avoid when: you only need cheap pass-through (use L4).
  • Consistent hashing for: cache/stateful affinity. Avoid when: fully stateless (P2C is simpler/better-balanced).
  • Retries for: transient failures. Avoid when: unbounded (retry storms); always budget + backoff + jitter.

PART 4 — INTERVIEW ARSENAL

How to wield this. Senior signals: (1) you reason in round trips and RTT, not bandwidth; (2) you name head-of-line blocking as the reason for HTTP/3 and bufferbloat/backpressure for latency-under-load; (3) you pick load-balancing strategy by stateful vs stateless and bound retries. Each question has a model answer and counter-ladder.

A. Fundamentals

Q1. Why doesn’t more bandwidth fix latency? Answer: Latency is propagation (speed of light) + queuing + per-round-trip protocol cost; bandwidth only affects transmission time of large payloads. A request that needs 3 round trips across an 80 ms link takes ~240 ms regardless of pipe size. Wins come from fewer round trips (connection reuse, 0-RTT), avoiding HoL blocking, and shorter queues — not a fatter pipe. Counter-ladder:

  • “When does bandwidth matter?” → Bulk transfers (throughput ≈ min(window/RTT, link rate)); high-BDP links need big windows/parallelism.
  • “Speed-of-light floor US↔EU?” → ~5 µs/km in fiber; ~5500 km → ~27 ms one way, ~55+ ms RTT minimum.

Q2. Flow control vs congestion control? Answer: Flow control is receiver-driven (the advertised window stops a fast sender from overrunning a slow receiver). Congestion control is network-driven (slow start + AIMD stop senders from overrunning the shared network). Either can throttle you; they solve different problems. Counter-ladder:

  • “What limits throughput on a single TCP connection?” → window ÷ RTT; the window must reach the BDP (bandwidth × RTT) to fill the link, so enlarge the window or parallelize.
  • “Why does a new connection start slow?” → TCP slow start ramps the congestion window from small; reuse connections to avoid paying it repeatedly.

B. HoL & modern transport

Q3. HTTP/2 fixed HoL blocking, but did it? Answer: It fixed request-level HoL (HTTP/1.1’s one-outstanding-request-per-connection model and pipelining stalls) by multiplexing streams over one connection. But all streams share one TCP byte stream, so a single lost packet stalls every stream (transport-level HoL). QUIC/HTTP3 fixes this with independent per-stream loss recovery over UDP. Counter-ladder:

  • “Why can’t TCP just deliver streams independently?” → TCP is a single ordered byte stream by design; the OS can’t know your stream boundaries.
  • “QUIC’s other wins?” → 0–1 RTT setup (integrated TLS 1.3), connection migration via connection ID, pluggable user-space congestion control.
  • “QUIC downside?” → UDP throttling/blocking by middleboxes; higher CPU than kernel TCP.

Q4. A small request/response RPC shows ~40 ms latency on a fast LAN. Why? Answer: Classic Nagle + delayed ACK interaction: Nagle holds the small send waiting for an ACK; the peer’s delayed ACK waits ~40 ms to piggyback — a standoff. Fix: set TCP_NODELAY (disable Nagle) for interactive RPC. Also reuse the connection to avoid handshake/slow-start. Counter-ladder:

  • “Tradeoff of disabling Nagle?” → More small packets / overhead; fine for latency-sensitive interactive traffic.
  • “Other fixed latencies in RPC?” → Connection setup (1 RTT TCP + 1 RTT TLS) — amortize via pooling/keepalive/session resumption.

C. Load balancing & scaling

Q5. L4 vs L7 load balancing — when each, and the cost? Answer: L4 routes by connection/IP:port — cheap, fast, protocol-agnostic, good for raw throughput/pass-through. L7 parses requests to route by path/host/header/cookie, terminate TLS, retry, rate-limit, do canaries — more CPU but far more capable. Use L7 when you need content-based decisions; L4 when you just need to spread connections cheaply. Counter-ladder:

  • “How balance load to stateful cache nodes?” → Consistent hashing (with virtual nodes for evenness) so only ~1/N keys move on membership change.
  • “Best simple algorithm for stateless workers?” → Power-of-two-choices: pick 2 random, send to the less loaded — near-optimal with O(1) state, avoids herding on ‘the’ least-loaded.
  • “Why not always least-connections?” → Needs accurate global load state; can herd new requests onto one ‘idle’ server.
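The gap between purely random placement and power-of-two-choices shows up even in a toy simulation. This sketch makes strong simplifying assumptions: load is a per-server request count that never drains, all requests cost the same, and the server and request counts are arbitrary. Function names are hypothetical.

```python
import random


def pick_random(loads, rng):
    """Baseline: choose one server uniformly at random."""
    return rng.randrange(len(loads))


def pick_p2c(loads, rng):
    """Power-of-two-choices: sample two distinct servers, take the less loaded."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b


def simulate(n_servers, n_requests, chooser, seed=0):
    rng = random.Random(seed)
    loads = [0] * n_servers
    for _ in range(n_requests):
        loads[chooser(loads, rng)] += 1
    return loads


random_loads = simulate(100, 10_000, pick_random)
p2c_loads = simulate(100, 10_000, pick_p2c)
print("max load, random:", max(random_loads), "| max load, P2C:", max(p2c_loads))
```

Each decision inspects only two counters, so the balancer needs no global view of load and cannot send a whole burst to the single server that looked idle a moment ago.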

Q6. Design retries safely. Answer: Retries only for idempotent or idempotency-keyed operations; bound them with a retry budget (e.g. retries ≤ 10% of requests), exponential backoff with jitter, and per-try timeouts. Otherwise retries during an incident amplify load (retry storm) and turn a brownout into an outage. Combine with circuit breakers and load shedding. Counter-ladder:

  • “Why jitter?” → Avoid synchronized retry waves (thundering herd) hitting the recovering service simultaneously.
  • “Retry a non-idempotent write?” → Only with an idempotency key the server dedupes on; else risk double-apply.
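A retry budget can be as simple as two counters. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the counters cover the whole process lifetime (real implementations usually track a sliding time window), and the 10% default mirrors the example figure in the answer above.

```python
class RetryBudget:
    """Allow retries only while they stay within a percentage of requests."""

    def __init__(self, retry_percent=10):
        self.retry_percent = retry_percent
        self.requests = 0
        self.retries = 0

    def record_request(self):
        """Call once per first-attempt request."""
        self.requests += 1

    def try_acquire_retry(self):
        """Return True and count the retry if the budget allows it."""
        # Integer arithmetic avoids floating-point edge cases at the boundary.
        if (self.retries + 1) * 100 <= self.retry_percent * self.requests:
            self.retries += 1
            return True
        return False


budget = RetryBudget(retry_percent=10)
for _ in range(100):
    budget.record_request()
granted = [budget.try_acquire_retry() for _ in range(11)]
print(granted.count(True), "retries granted,", granted.count(False), "denied")
```

During a brownout the budget caps the extra load at a fixed fraction of normal traffic, whatever the failure rate; callers denied a retry should fail fast or serve a degraded response.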

D. Latency under load

Q7. Throughput looks fine but latency spikes under load. Diagnose. Answer: Likely bufferbloat — full buffers (in the network or app queues) add standing delay; loss-based CC keeps them full. Network fix: BBR (paces to bottleneck) and AQM like CoDel (targets queue delay). App fix: bounded queues + backpressure + load shedding instead of unbounded in-memory queues. Measure queue delay, not just occupancy/throughput. Counter-ladder:

  • “Why does an unbounded app queue make it worse?” → It absorbs overload into latency, then OOMs — a spiral; shed load early (429/503) instead.
  • “How does BBR differ from CUBIC?” → BBR models bandwidth+RTT and paces; CUBIC fills the buffer until loss — BBR keeps queues shorter.

E. Worked drill — driving a design end-to-end

Watch round-trip reasoning, HoL avoidance, and backpressure drive the design.

Prompt: “Design the edge/transport layer for a global interactive web app: users worldwide load a dashboard with ~50 small API/asset fetches, on flaky mobile networks; it must feel fast and stay stable under traffic spikes.”

1 — Attack round trips first. “Latency is dominated by RTT × number of round trips, and users are global, so propagation is large. Step one: a CDN with anycast so static assets and cacheable API responses are served from a PoP near the user — turning an 80–150 ms origin RTT into a few ms. Dynamic calls still hit regional backends, ideally via the edge (terminate TLS at the edge, reuse warm origin connections).”

2 — Choose the transport for 50 parallel fetches. “Fifty small parallel objects is the textbook head-of-line-blocking scenario. Over HTTP/2/TCP, one lost packet on a flaky mobile link stalls all 50 streams. So HTTP/3 (QUIC): independent per-stream loss recovery means one lost packet stalls only its stream; plus 0–1 RTT setup and connection migration (Wi-Fi→cellular without reconnect) — exactly right for mobile.”

3 — Amortize setup. “Reuse a single QUIC connection (multiplexed) for all 50 fetches — pay the handshake once. Enable 0-RTT resumption for repeat visits, but only send 0-RTT data for idempotent GETs (0-RTT is replayable). TLS 1.3 throughout.”

4 — Interactive latency hygiene. “For the dynamic RPC path, disable Nagle (TCP_NODELAY) where TCP is still used, keep connection pools warm to backends, and avoid chatty sequential calls — batch or parallelize so we don’t stack RTTs.”

5 — Stability under spikes. “Protect against traffic surges with backpressure and load shedding: bounded queues at each tier, return 429/503 fast rather than queueing unboundedly, and concurrency limits at the edge. Retries are budgeted (≤10% of requests) with exponential backoff + jitter to prevent retry storms during a brownout. Circuit breakers trip on a failing backend so the edge serves cached/degraded responses instead of piling on.”

6 — Load balancing. “At the edge, L7 load balancing to route by path/host, terminate TLS, and do retries/canaries. To stateful caches behind it, consistent hashing with virtual nodes for affinity; to stateless API workers, power-of-two-choices for even load with minimal state.”

7 — Tradeoffs stated. “I optimized for fewer round trips (CDN/anycast + connection reuse + 0-RTT) and no HoL blocking (QUIC) — the two things that actually move global interactive latency — and bought stability with backpressure + bounded retries rather than big buffers (which would cause bufferbloat). Costs: QUIC’s higher CPU and middlebox/UDP-blocking risk (fall back to HTTP/2), and CDN cache-invalidation complexity for dynamic content. If UDP were widely blocked for our users, I’d fall back to HTTP/2 + aggressive connection reuse and accept transport HoL.”

Template: cut round trips (CDN/anycast/reuse/0-RTT) → kill HoL (QUIC) → latency hygiene (Nagle, batching) → stability (backpressure, bounded retries) → pick LB by stateful/stateless → state the tradeoff.

F. Consolidated gotchas & traps (rapid fire)

  • Reduce round trips, not just bandwidth.
  • HTTP/2 over TCP still has transport HoL — QUIC fixes it.
  • Nagle + delayed ACK = ~40 ms stall; TCP_NODELAY for interactive RPC.
  • Slow start penalizes new connections — reuse them.
  • Bufferbloat = latency under load; BBR/CoDel + backpressure.
  • Flow control ≠ congestion control.
  • Unbounded retries → retry storms; budget + backoff + jitter.
  • Consistent hashing needs virtual nodes.
  • Single TCP connection caps at window/RTT; high-BDP links need a window ≥ BDP.
  • 0-RTT data is replayable — idempotent requests only.

G. Pros/cons master tables

Transports

Transport  | Pros                                               | Cons
TCP        | Reliable, ubiquitous, kernel/NIC offload           | Per-connection HoL; handshake+slow-start; ossified; no migration
UDP        | Minimal overhead/latency; full control             | DIY reliability/CC; can be a bad citizen
QUIC/HTTP3 | No cross-stream HoL; 0–1 RTT; migration; evolvable | UDP blocking; higher CPU; newer

Load balancing

Strategy             | Pros                            | Cons
L4                   | Fast, cheap, protocol-agnostic  | No content decisions
L7                   | Routing, TLS, retries, canaries | More CPU
Round-robin          | Trivial                         | Ignores load
Least-connections    | Load-aware                      | Needs state; can herd
Consistent hashing   | Affinity; minimal key movement  | Needs virtual nodes for evenness
Power-of-two-choices | Near-optimal, O(1) state        | Slightly worse than perfect knowledge

Go deeper (primary sources)

  • Cerf & Kahn, “A Protocol for Packet Network Intercommunication” (1974).
  • Jacobson, “Congestion Avoidance and Control” (1988); Allman, Paxson, Blanton, “TCP Congestion Control” (RFC 5681).
  • Ha, Rhee, Xu, “CUBIC: A New TCP-Friendly High-Speed TCP Variant” (2008).
  • Cardwell, Cheng, Gunn, Yeganeh, Jacobson, “BBR: Congestion-Based Congestion Control” (2016).
  • Iyengar & Thomson, “QUIC: A UDP-Based Multiplexed and Secure Transport” (RFC 9000, 2021).
  • Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.3” (RFC 8446, 2018).
  • Nagle, “Congestion Control in IP/TCP Internetworks” (RFC 896, 1984).
  • Stevens, TCP/IP Illustrated, Volume 1.
  • Gettys & Nichols, “Bufferbloat: Dark Buffers in the Internet” (2011).
  • Dean & Barroso, “The Tail at Scale” (2013).

PART 5 — MAKE IT STICK (Teaching Tutorial)

The references are the map; this is the driving lesson. Networking clicks once you count in round trips instead of bytes, and once you can see head-of-line blocking. Two pictures do most of the work.

14.1 The one idea: count round trips, not megabytes

   Latency = propagation (speed of light, fixed) + queuing + per-RTT protocol cost
   A request needing 4 round trips on an 80 ms link ≈ 320 ms — NO pipe size fixes that.
   Wins come from: fewer round trips, no head-of-line blocking, shorter queues.

Bandwidth moves big payloads faster; it does nothing for a chatty request that waits N round trips. The latency lever is reducing round trips (reuse connections, 0-RTT, CDNs), not buying a fatter pipe.

14.2 Head-of-line blocking — the picture that explains HTTP/3

  HTTP/2 over ONE TCP stream:        HTTP/3 over QUIC (independent streams):
   [r1][r2][r3][r4] one byte order     r1 ──●──── (lost pkt stalls only r1)
        ✗ lost packet                   r2 ──────── flows
   → ALL of r2,r3,r4 STALL behind it    r3 ──────── flows
   (transport-level HoL blocking)       r4 ──────── flows

TCP is one ordered byte stream, so one lost packet stalls everything behind it — even unrelated requests multiplexed on top (HTTP/2). QUIC runs many independent streams over UDP, so a loss stalls only its own stream. That, plus 0–1 RTT setup and connection migration, is the whole case for HTTP/3.

14.3 Congestion vs flow control (two different brakes)

  FLOW control  = "don't overrun the RECEIVER" (receiver's advertised window)
  CONGESTION control = "don't overrun the NETWORK" (slow start + AIMD sawtooth)

   cwnd  ╱╲    ╱╲    ╱╲     ← additive increase, then halve on loss (AIMD)
         ╱  ╲╱  ╲╱  ╲
        ╱  slow start ramps a NEW connection up from small → reuse connections!

Two independent brakes can each throttle you. New connections start slow (slow start), which is why connection reuse is a huge win for short requests.

14.4 The 40 ms mystery (Nagle + delayed ACK)

  Sender (Nagle): "I'll wait for an ACK before sending this small chunk."
  Receiver (delayed ACK): "I'll wait ~40 ms to piggyback this ACK."
  → standoff → ~40 ms stall on a fast LAN.   Fix: TCP_NODELAY for interactive RPC.

14.5 Analogies that stick

  • RTT = a question across a canyon — shouting faster (bandwidth) doesn’t shorten the echo time; asking fewer times does.
  • TCP HoL = a single-file checkout line — one slow shopper blocks everyone. QUIC = parallel lines.
  • Slow start = merging onto a highway — you accelerate gradually, not floor it.
  • Consistent hashing = assigning seats by a ring so adding a row only moves a few people; power-of-two-choices = glance at two checkout lines, pick the shorter.

14.6 Misconceptions → corrections

You might think…                                       | Actually…
“More bandwidth = lower latency.”                      | No: latency is round trips + propagation; reduce round trips.
“HTTP/2 killed head-of-line blocking.”                 | Only request-level; TCP still has transport HoL. QUIC fixes it.
“Flow control and congestion control are the same.”    | Receiver-limit vs network-limit: different brakes.
“Retries are free safety.”                             | Unbounded retries cause retry storms; budget + backoff + jitter.
“Consistent hashing balances evenly by itself.”        | Needs virtual nodes.

14.7 Explain it back (Feynman)

  1. Why doesn’t bandwidth fix a chatty request? [14.1]
  2. Draw HTTP/2 vs QUIC HoL blocking. [14.2]
  3. Flow vs congestion control — who’s protecting whom? [14.3]
  4. Explain the 40 ms stall and its fix. [14.4]
  5. L4 vs L7 — when each? [Part 3 §8]

14.8 Flashcards (cover the right column)

Prompt              | Answer
Latency lever       | Fewer round trips (not bandwidth)
HTTP/2 weakness     | Transport HoL (one TCP stream)
QUIC fix            | Independent per-stream loss recovery
Flow control        | Don’t overrun the receiver
Congestion control  | Don’t overrun the network (AIMD)
New-connection cost | Slow start → reuse connections
40 ms stall         | Nagle + delayed ACK → TCP_NODELAY
Stateful affinity LB | Consistent hashing (+ virtual nodes)
Stateless LB        | Power-of-two-choices

14.9 The 60-second recall

“Network latency is round trips plus propagation plus queuing — bandwidth only speeds big payloads, so the lever is fewer round trips (connection reuse, 0-RTT, CDNs near users). TCP is a single ordered byte stream, so one lost packet causes head-of-line blocking that stalls everything behind it, including all HTTP/2 streams; QUIC fixes this with independent per-stream recovery over UDP, plus 0–1 RTT setup and connection migration. Flow control protects the receiver; congestion control (slow start, AIMD) protects the network — and slow start penalizes new connections, so reuse them. Disable Nagle (TCP_NODELAY) for interactive RPC to avoid the 40 ms delayed-ACK stall. Balance stateful traffic with consistent hashing (plus virtual nodes), stateless with power-of-two-choices, and always bound retries with budgets, backoff, and jitter.”

Frequently asked questions

Why doesn’t more bandwidth fix latency?

Latency is dominated by propagation (speed of light), queuing, and per-round-trip protocol cost; bandwidth only affects the transmission time of large payloads. A request needing several round trips across a high-latency link is slow regardless of pipe size. Wins come from fewer round trips, avoiding head-of-line blocking, and shorter queues.

What is head-of-line blocking and how does QUIC fix it?

Head-of-line blocking is when one stalled item blocks everything behind it. HTTP/2 multiplexes streams over one TCP byte stream, so a single lost packet stalls all streams. QUIC runs over UDP with independent per-stream loss recovery, so a lost packet only stalls its own stream.

What’s the difference between flow control and congestion control?

Flow control is receiver-driven: the advertised window stops a fast sender from overrunning a slow receiver. Congestion control is network-driven: slow start and additive-increase/multiplicative-decrease stop senders from overrunning the shared network. Either can limit throughput.

L4 vs L7 load balancing — when do you use each?

L4 balances by connection and IP:port — cheap, fast, and protocol-agnostic, good for raw throughput. L7 parses requests to route by path, host, header, or cookie, terminate TLS, retry, and do canaries — more CPU but far more capable. Use L7 for content-based decisions, L4 for cheap pass-through.

What causes about 40ms of latency on small TCP requests?

The interaction of Nagle’s algorithm and delayed ACK: Nagle holds a small send waiting for an acknowledgment while the peer delays the ACK to piggyback, creating a standoff. Disable Nagle with TCP_NODELAY for interactive RPC, and reuse connections to avoid handshake and slow-start cost.
