Networking & the Data Plane: TCP, QUIC & Load Balancing Reference

By Asif·June 17, 2026·24 min read·System Design

A self-contained reference for how bytes actually move and why your p99 is what it is. Most engineers treat the network as a magic pipe; the ones who can reason about round trips, congestion control, head-of-line blocking, and load-balancing math design systems that stay fast under load instead of mysteriously stalling. This chapter is that reasoning, end to end.

How to use this: Part 1 is the reference card. Part 2 maps the territory. Part 3 is the full depth with pros/cons per mechanism. Part 4 is exhaustive interview prep with counter-question ladders. Part 5 is a teaching tutorial to make it stick.

Key takeaways

Network performance is governed by round trips and congestion control, not bandwidth alone — reduce round trips and keep queues short.
HTTP/2 over a single TCP connection still suffers transport head-of-line blocking; QUIC/HTTP3 fixes it with independent per-stream loss recovery.
Flow control is receiver-driven; congestion control is network-driven — both can throttle you.
Bound retries with budgets, backoff, and jitter, or they amplify outages into retry storms.

PART 1 — CHEATSHEET (Reference Card)

Every concept in this document, condensed.

The one idea

Network performance is governed by round trips (RTT) and congestion control, not bandwidth alone. Latency = (propagation + transmission + queuing + processing); you cannot beat the speed of light, so the wins come from fewer round trips, avoiding head-of-line blocking, and keeping queues short.

Core vocabulary (one line each)

OSI / TCP-IP layers — abstraction stack: link → internet (IP) → transport (TCP/UDP) → application.
RTT — round-trip time; the fundamental latency unit; many protocols cost N×RTT.
TCP handshake — 3-way (SYN, SYN-ACK, ACK) = 1 RTT before data.
Flow control — receiver-driven (sliding window) — don’t overrun the receiver.
Congestion control — network-driven — don’t overrun the network (slow start, AIMD).
AIMD — additive-increase/multiplicative-decrease; TCP’s fairness/stability engine.
CUBIC / BBR — modern congestion control: loss-based (CUBIC) vs model-based (BBR, rate/RTT).
Nagle / delayed ACK — coalesce small sends / delay ACKs; together can add latency.
Head-of-line (HoL) blocking — one stalled item blocks others behind it (TCP byte stream, HTTP/2 over TCP).
UDP — connectionless, unreliable datagrams; you build reliability yourself.
QUIC / HTTP/3 — reliable multiplexed transport over UDP; per-stream (no cross-stream HoL); 0–1 RTT.
TLS handshake — TLS 1.3 = 1 RTT (0-RTT resumption); earlier = 2 RTT.
L4 vs L7 load balancing — by connection/IP:port vs by request content (HTTP path, headers).
Consistent hashing / power-of-two-choices — stable key→node mapping / pick less-loaded of 2 random.
Anycast — one IP announced from many locations; routed to nearest.
CDN / edge — cache/compute near users to cut RTT and origin load.
Bufferbloat — oversized queues inflate latency under load.
Backpressure — signal upstream to slow down when overloaded.

Transport comparison

	TCP	UDP	QUIC (HTTP/3)
Reliable/ordered	Yes (single stream)	No	Yes, per stream
Connection setup	1 RTT (+TLS)	0	0–1 RTT (TLS built in)
HoL blocking	Yes (connection-wide)	N/A	Not across streams
Congestion control	Kernel (CUBIC/BBR)	DIY	User-space, pluggable
Connection migration	No (5-tuple)	DIY	Yes (connection ID)

L4 vs L7 load balancing

	L4 (transport)	L7 (application)
Decides on	IP:port, connection	HTTP path/host/headers/cookies
Cost	Cheap, fast	More CPU (parses requests)
Features	NAT, pass-through	Routing, TLS termination, retries, sticky sessions

Numbers to memorize

Same-DC RTT ~0.5 ms; same-region ~1–2 ms; cross-US (coast to coast) ~60–70 ms; US↔EU ~70–90 ms; US↔Asia ~120–180 ms.
TCP connect = 1 RTT; +TLS 1.3 = +1 RTT (or 0-RTT resume); QUIC first connect ~1 RTT incl. crypto, resume 0-RTT.
Typical MSS ~1460 bytes (1500-byte Ethernet MTU − 20-byte IPv4 header − 20-byte TCP header).
Speed of light in fiber ≈ 5 µs/km → ~200 km per ms one way.

Quick decision rules

Latency-sensitive web with many parallel objects → HTTP/3 (QUIC) to kill HoL blocking.
Many short connections → reuse connections (pooling/keepalive) to amortize handshakes.
Real-time/lossy-tolerant (games, voice, video) → UDP/QUIC with app-level recovery.
Route by URL/host/cookie, terminate TLS, retries → L7 LB; raw throughput/pass-through → L4 LB.
Spread keys to stateful nodes stably → consistent hashing; spread load to stateless → power-of-two-choices.

Top gotchas (litmus tests)

Bandwidth doesn’t fix latency — a fat pipe still costs 1 RTT per round trip; reduce round trips.
HTTP/2 multiplexes over one TCP connection → TCP HoL blocking; one lost packet stalls all streams. QUIC fixes this.
Nagle + delayed ACK can add ~40 ms to small request/response — disable Nagle (TCP_NODELAY) for interactive RPC.
TCP slow start means new connections start slow — connection reuse matters a lot for short transfers.
Bufferbloat: big buffers + loss-based CC = high latency under load; BBR/AQM (CoDel) help.
Flow control ≠ congestion control — receiver window vs network capacity; both can throttle you.
L7 retries can amplify outages (retry storms) — use budgets + backoff + jitter.
Consistent hashing needs virtual nodes or load is uneven.
A single TCP connection caps throughput at window ÷ RTT; high-BDP links (bandwidth × RTT) need a window at least that large or parallel connections.
Connection setup dominates short requests — measure handshakes, not just transfer time.

PART 2 — OUTLINE (full map)

Layering and how a packet moves end-to-end
TCP: handshake, flow control, congestion control, Nagle/delayed ACK
Head-of-line blocking (TCP and HTTP/2)
UDP
QUIC and HTTP/3
TLS handshake cost
RPC cost models and patterns
Load balancing: L4 vs L7 and the algorithms
DNS and anycast
CDNs and edge
Bufferbloat
Application-level backpressure and flow control
Decision guide
Make it stick — the teaching tutorial (round-trip & HoL pictures, mnemonics, flashcards)

PART 3 — DEEP DIVE

1. Layering and how a packet moves

The TCP/IP stack (Cerf & Kahn’s internetworking design, 1974): the link layer moves frames on a local network; the internet (IP) layer routes packets across networks (best-effort, unreliable, unordered); the transport layer (TCP/UDP) provides end-to-end semantics (reliability/ordering or not); the application layer (HTTP, gRPC) gives meaning. A byte you send is wrapped with headers at each layer (encapsulation), routed hop-by-hop by IP (each router forwards toward the destination), and unwrapped on the other side. Key consequence: IP gives no guarantees, so every reliability/ordering property above it is built by TCP or your app, and each adds round trips and state.

Latency components: propagation (distance ÷ speed of light in fiber, ~5 µs/km), transmission (bytes ÷ link rate), queuing (time spent in router/NIC buffers; the variable, load-dependent part), and processing. Propagation is fixed once you choose where to serve from, and transmission shrinks as links get faster; queuing is the component that grows with load, which is why congestion control and queue management dominate tail latency.
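Adding the four terms up directly shows which one dominates in a given scenario. This is a back-of-envelope sketch, not a measurement tool; the function name and parameters are illustrative, and the ~5 µs/km fiber figure is the same rule of thumb used in this section.

```python
# Back-of-envelope one-way latency: propagation + transmission + queuing + processing.
FIBER_US_PER_KM = 5.0  # rule of thumb: light in fiber covers ~200 km per ms


def one_way_latency_ms(distance_km: float, payload_bytes: int, link_bps: float,
                       queued_bytes: int = 0, processing_ms: float = 0.0) -> float:
    propagation_ms = distance_km * FIBER_US_PER_KM / 1000.0
    transmission_ms = payload_bytes * 8 / link_bps * 1000.0
    # Bytes already sitting in the buffer ahead of us drain at the link rate first.
    queuing_ms = queued_bytes * 8 / link_bps * 1000.0
    return propagation_ms + transmission_ms + queuing_ms + processing_ms


# 1 KB over ~5,500 km of fiber on a 1 Gbps link: propagation dominates.
print(one_way_latency_ms(5500, 1000, 1e9))
# Same distance with 1 MB already queued at a 10 Mbps bottleneck: queuing dominates.
print(one_way_latency_ms(5500, 1000, 10e6, queued_bytes=1_000_000))
```

In the first case transmission is microseconds against ~27.5 ms of propagation; in the second, the queued megabyte alone adds 800 ms, which is the bufferbloat case (§11).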

2. TCP: handshake, flow control, congestion control

Three-way handshake: SYN → SYN-ACK → ACK establishes a connection in 1 RTT before any data flows (then +1 RTT for TLS 1.3, or +2 for TLS 1.2, unless resumed). Closing uses FIN/ACK exchanges.

Flow control (receiver-driven): the receiver advertises a window (how much it can buffer); the sender never has more than a window of unacknowledged data in flight. This prevents a fast sender from overwhelming a slow receiver. Because at most one window is delivered per round trip, max throughput ≈ window ÷ RTT; to keep a link full, the window must be at least the bandwidth-delay product (BDP = bandwidth × RTT), so high-BDP links need large windows (window scaling) or parallelism.
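The two relationships, throughput ≤ window ÷ RTT and BDP = bandwidth × RTT (the data that must be in flight to keep a link busy), can be checked with a few lines of arithmetic. This simplified sketch assumes a loss-free path and ignores slow start; the function names are illustrative.

```python
def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound for one connection: at most one window delivered per RTT."""
    return window_bytes * 8 / rtt_s


def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


# 65,535 bytes (the largest window without the window-scale option), 100 ms RTT:
print(max_throughput_bps(65535, 0.100) / 1e6)  # ~5.2 Mbps, however fast the link is
# Filling a 1 Gbps link at 100 ms RTT needs ~12.5 MB in flight:
print(bdp_bytes(1e9, 0.100) / 1e6)
```

The unscaled-window case is why window scaling matters: on a 100 ms path a single connection is capped near 5 Mbps regardless of link speed.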

Congestion control (network-driven) keeps the network from collapsing (Jacobson, 1988):

Slow start: start with a small congestion window, double each RTT (exponential) until a threshold or loss — so new connections ramp up, they don’t start fast.
Congestion avoidance / AIMD: additive increase (grow window by ~1 MSS/RTT) until loss, then multiplicative decrease (halve) — the sawtooth that gives stability and fairness.
Fast retransmit/recovery: triple-duplicate ACKs trigger retransmit without waiting for a timeout (TCP Reno).
CUBIC (Ha, Rhee, Xu, 2008): loss-based but uses a cubic growth function of time since last loss — better utilization on high-BDP networks; the Linux default.
BBR (Cardwell et al., 2016): model-based — estimates bottleneck bandwidth and RTT and paces to that, rather than treating loss as the only congestion signal — much better in the presence of shallow buffers/random loss and against bufferbloat.
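The slow-start ramp and AIMD sawtooth can be reproduced with a toy per-RTT model. This is a simplified Reno-style sketch, not CUBIC or BBR: it counts the window in segments, assumes each listed loss is repaired by fast recovery (halve, then keep growing linearly) rather than a timeout, and the initial window of 10 segments and starting threshold of 64 are assumptions chosen for illustration.

```python
def reno_cwnd_trace(rtts: int, loss_at: set, init_cwnd: int = 10,
                    ssthresh: int = 64) -> list:
    """Congestion window (in segments) at the start of each RTT."""
    cwnd, trace = init_cwnd, []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            # Multiplicative decrease: remember half the window and resume from there.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double every RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 segment per RTT
    return trace


print(reno_cwnd_trace(12, loss_at={8}))
# [10, 20, 40, 64, 65, 66, 67, 68, 69, 34, 35, 36]
```

The first three RTTs are the ramp a brand-new connection pays before it reaches full speed, which is why reusing a warm connection helps short transfers.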

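Nagle’s algorithm + delayed ACK: Nagle (RFC 896) coalesces small writes to avoid tiny packets, holding new small data while earlier data is still unacknowledged; delayed ACK holds ACKs briefly so they can piggyback on a response. The classic trigger is write-write-read, e.g. a request header and body sent as two small writes: the second write waits for an ACK that the peer is deliberately delaying, stalling the exchange for ~40 ms (Linux’s minimum delayed-ACK timer; some stacks wait up to 200 ms). So interactive RPC sets TCP_NODELAY (disable Nagle).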
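With Python's standard socket module, disabling Nagle is a single socket option. A minimal sketch; the helper name make_rpc_socket is hypothetical, and many runtimes already set this flag (Go's net package, for example, disables Nagle on TCP connections by default), so check your runtime's default before adding it.

```python
import socket


def make_rpc_socket() -> socket.socket:
    """TCP socket tuned for small, interactive request/response traffic."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Disable Nagle: small writes go out immediately instead of waiting
    # for earlier data to be acknowledged.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


sock = make_rpc_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```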

Pros of TCP: reliable, ordered, congestion-friendly, ubiquitous, offloaded in kernel/NIC. Cons: per-connection HoL blocking, handshake + slow-start cost for short flows, ossified (hard to evolve), no connection migration across IP changes.

3. Head-of-line blocking (TCP and HTTP/2)

HoL blocking: when items must be delivered in order, one stalled item blocks everything behind it. TCP delivers a single ordered byte stream, so a lost segment stalls delivery of all later bytes until it’s retransmitted — even if those later bytes already arrived. HTTP/2 multiplexes many logical streams over one TCP connection; this fixed HTTP/1.1’s request-level HoL, but a single TCP packet loss now stalls all HTTP/2 streams (transport-level HoL). This is the core motivation for QUIC.
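A toy model makes the difference concrete. Assumptions for this sketch: each packet carries one chunk of one stream, packets are listed in send order with a single sequence number, and the function names are hypothetical. Retransmission timing is ignored, so it shows only what the application can receive before the lost packet is repaired.

```python
def deliverable_single_stream(packets, lost):
    """TCP-like: one ordered byte stream, so delivery stops at the first hole."""
    out = []
    for seq, stream in packets:
        if seq in lost:
            break  # later packets may have arrived, but must wait in the buffer
        out.append((seq, stream))
    return out


def deliverable_per_stream(packets, lost):
    """QUIC-like: each stream is ordered on its own, so a hole blocks only its stream."""
    blocked, out = set(), []
    for seq, stream in packets:
        if seq in lost:
            blocked.add(stream)
        elif stream not in blocked:
            out.append((seq, stream))
    return out


# Three multiplexed requests a, b, c; packet 1 (part of b) is lost.
pkts = [(0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "b"), (5, "c")]
print(deliverable_single_stream(pkts, lost={1}))  # [(0, 'a')]
print(deliverable_per_stream(pkts, lost={1}))     # [(0, 'a'), (2, 'c'), (3, 'a'), (5, 'c')]
```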

4. UDP

UDP is a thin layer over IP: connectionless, unreliable, unordered datagrams, no congestion control. You get low overhead and full control; you must build whatever reliability/ordering/flow control you need. Used by DNS, real-time media (with app-level recovery/FEC), and as the substrate for QUIC.

Pros: minimal latency/overhead, no HoL (each datagram independent), multicast, full control. Cons: you reimplement reliability/congestion control (and risk doing it badly — being a poor network citizen); no built-in security/ordering.
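The contrast with TCP shows up even in a loopback test: there is no handshake, and each recvfrom returns exactly one datagram. A minimal sketch using Python's socket module; the "seq=1" payload stands in for a hypothetical application-level sequence number, since UDP itself provides none.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
receiver.settimeout(1.0)         # no delivery guarantee, so never block forever
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"seq=1 hello", addr)  # no connection setup: the first packet carries data

data, peer = receiver.recvfrom(2048)  # one call returns one whole datagram
print(data)
sender.close()
receiver.close()
```

Anything beyond this (retransmission, ordering, pacing) is the application's job, which is exactly what QUIC implements on top of UDP.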

5. QUIC and HTTP/3

QUIC (RFC 9000, Iyengar & Thomson, 2021) is a reliable, multiplexed, secure transport built on top of UDP in user space. It fixes TCP’s structural problems:

No cross-stream HoL blocking: independent streams have independent loss recovery — a lost packet only stalls its own stream.
Fast setup: crypto (TLS 1.3) is integrated, so a connection + encryption is 1 RTT, and 0-RTT on resumption.
Connection migration: connections are identified by a connection ID, not the IP:port 5-tuple, so a client can change networks (Wi-Fi→cellular) without dropping the connection.
User-space + pluggable congestion control: evolves without kernel/OS upgrades (avoiding TCP’s ossification).
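Connection migration comes down to what the server uses as its lookup key. The sketch below is a hypothetical, heavily simplified server-side table (Conn and on_datagram are illustrative names, not a real QUIC library's API): connections are keyed by connection ID, so a packet from a new address still finds its existing connection, whereas a table keyed by the IP:port 5-tuple would treat it as a new connection.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    conn_id: bytes
    peer_addr: tuple  # (ip, port) the client was last seen at


connections: dict = {}  # connection ID -> Conn


def on_datagram(conn_id: bytes, src_addr: tuple) -> Conn:
    conn = connections.get(conn_id)
    if conn is None:
        conn = connections[conn_id] = Conn(conn_id, src_addr)
    elif conn.peer_addr != src_addr:
        # Same connection arriving over a new network path (e.g. Wi-Fi -> cellular).
        # Real QUIC also validates the new path before relying on it.
        conn.peer_addr = src_addr
    return conn


wifi = on_datagram(b"\x8a\x01", ("192.0.2.7", 50000))
cell = on_datagram(b"\x8a\x01", ("198.51.100.3", 61000))
print(wifi is cell, len(connections))  # True 1
```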

HTTP/3 is HTTP over QUIC.

Pros: eliminates transport HoL, faster setup, migration, evolvable. Cons: UDP is sometimes throttled/blocked by middleboxes; more CPU (user-space crypto/processing) than kernel TCP; newer, less ubiquitous tooling.

6. TLS handshake cost

TLS provides confidentiality/integrity/authentication but adds round trips. TLS 1.3 (RFC 8446, Rescorla, 2018) cut the handshake to 1 RTT (down from 2 in TLS 1.2) and supports 0-RTT resumption (send app data with the first message using a pre-shared key) — at the cost that 0-RTT data is replayable, so it must be used only for idempotent requests. With QUIC, TLS is folded into the transport handshake. Practical guidance: reuse connections (pooling, keepalive, session resumption) so you pay the handshake once, not per request.
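Counting round trips before the first response byte makes the guidance concrete. This simplified model assumes a fixed RTT and ignores TCP Fast Open, slow start, and server processing; the table and function names are illustrative.

```python
# Round trips spent on setup before the request can be sent (simplified model).
SETUP_RTTS = {
    "TCP + TLS 1.2": 1 + 2,
    "TCP + TLS 1.3": 1 + 1,
    "TCP + TLS 1.3 0-RTT resume": 1 + 0,  # early data is replayable: idempotent only
    "QUIC first connect": 1,              # transport and TLS 1.3 handshake combined
    "QUIC 0-RTT resume": 0,
    "reused warm connection": 0,
}


def time_to_first_byte_ms(setup: str, rtt_ms: float) -> float:
    """Setup round trips plus one RTT for the request/response itself."""
    return (SETUP_RTTS[setup] + 1) * rtt_ms


for name in SETUP_RTTS:
    print(f"{name:<28} {time_to_first_byte_ms(name, 80):5.0f} ms")
```

At an 80 ms RTT, a cold TCP + TLS 1.2 request waits ~320 ms for its first byte, versus ~80 ms on a reused connection.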

7. RPC cost models and patterns

A remote call costs at minimum: connection setup (amortizable), 1 RTT for the request/response, serialization, and server processing. Patterns that matter:

Connection reuse / pooling / keepalive: amortize handshake + slow-start across many calls — often the single biggest latency win.
Multiplexing (HTTP/2, gRPC): many concurrent calls over one connection (mind transport HoL → HTTP/3 for many parallel streams).
Batching / pipelining: combine calls to amortize per-call overhead; pipeline to hide RTT.
Streaming: gRPC streams for continuous data instead of repeated unary calls.
Timeouts + retries with budgets/backoff/jitter: essential, but uncontrolled retries cause retry storms that amplify outages — bound them.
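Bounding retries combines three pieces: a cap on attempts, a budget that limits retries to a fraction of total traffic, and randomized exponential backoff. The sketch below assumes the wrapped call is idempotent and signals a transient failure with ConnectionError; RetryBudget and call_with_retries are hypothetical names, and "full jitter" (sleep a uniform random time up to the exponential cap) is one common backoff choice among several.

```python
import random
import time


class RetryBudget:
    """Permit retries only while they stay below `ratio` of all requests seen
    (e.g. 0.1 keeps retries to roughly 10% of requests). A plain counter for
    clarity; real budgets usually count over a sliding time window."""

    def __init__(self, ratio: float = 0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def can_retry(self) -> bool:
        return self.retries < self.ratio * self.requests


def call_with_retries(fn, budget: RetryBudget, max_attempts: int = 3,
                      base_s: float = 0.05, cap_s: float = 1.0, sleep=time.sleep):
    budget.requests += 1
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1 or not budget.can_retry():
                raise  # out of attempts or budget: fail fast instead of piling on
            budget.retries += 1
            # Full jitter: random delay in [0, min(cap, base * 2^attempt)] so that
            # many clients do not retry in synchronized waves.
            sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))
```

Passing a no-op `sleep` makes the logic easy to test; in production the default time.sleep (or an async equivalent) applies.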

Pros: clean abstraction, multiplexed/streamed efficiency. Cons: the network is not free or reliable (Deutsch’s fallacies of distributed computing); every RPC adds latency, partial-failure modes, and tail-latency exposure.

8. Load balancing: L4 vs L7 and the algorithms

L4 (transport): balances by connection/IP:port, often via NAT or direct server return; cheap, high throughput, protocol-agnostic; can’t make content decisions.
L7 (application): parses requests (HTTP path, host, headers, cookies) to route, terminate TLS, retry, rate-limit, do sticky sessions, canary. More CPU; far more capable.
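The difference is what each layer can see. In this hypothetical sketch (the backend names and route table are made up), the L4 picker only has the connection 5-tuple and hashes it so every packet of a flow lands on the same backend, while the L7 picker has parsed the request and routes on its path. Plain modulo hashing is used for brevity; production L4 balancers typically use consistent hashing so that backend changes do not remap existing flows.

```python
import hashlib

L4_BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
L7_ROUTES = {"/api/": ["api-1", "api-2"], "/static/": ["assets-1"]}


def l4_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
            proto: str = "tcp") -> str:
    """L4: only addresses and ports are visible; pin each flow via a stable hash."""
    key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    h = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
    return L4_BACKENDS[h % len(L4_BACKENDS)]


def l7_pick(path: str, request_no: int) -> str:
    """L7: the HTTP request is parsed, so route on its path (longest prefix wins),
    then round-robin within the chosen pool."""
    for prefix in sorted(L7_ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            pool = L7_ROUTES[prefix]
            return pool[request_no % len(pool)]
    raise LookupError(f"no route for {path}")


print(l4_pick("203.0.113.9", 51515, "198.51.100.1", 443))
print(l7_pick("/api/users/42", 0), l7_pick("/api/users/42", 1), l7_pick("/static/app.js", 7))
```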

Algorithms:

Round-robin / weighted RR: simple; ignores actual load.
Least-connections / least-load: send to the least busy; better under heterogeneous request costs.
Consistent hashing: map keys (or clients) to nodes on a ring so that adding/removing a node moves only ~1/N of keys — essential for stateful/cache affinity. Use virtual nodes to even out load.
Power-of-two-choices (P2C): pick two servers at random, send to the less loaded — provably near-optimal load distribution with O(1) state, avoiding the herd behavior of “always pick the least loaded.”
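A hash ring with virtual nodes fits in a few dozen lines. This is a minimal sketch, not a particular library's implementation: HashRing is a hypothetical name, 100 virtual nodes per server is an arbitrary default, and BLAKE2b is used only because Python's built-in hash() is randomized per process.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted (point, node) pairs, one per virtual node
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's point,
        # wrapping around to the start of the ring.
        i = bisect.bisect_left(self._ring, (_point(key), ""))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(len(moved) / len(keys))                           # roughly 1/4
print(all(ring.lookup(k) == "cache-d" for k in moved))  # True: keys move only to the new node
```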

Pros/cons: L4 = speed, L7 = intelligence; RR = simplicity, least-conn = load-awareness (needs state), consistent hashing = affinity (needs vnodes), P2C = great balance with tiny state.
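Power-of-two-choices is short enough to show in full, together with a balls-into-bins comparison against a single random choice. A minimal sketch: p2c_pick and simulate are hypothetical names, loads only ever increase (requests never complete), and the server count, request count, and seed are arbitrary.

```python
import random


def p2c_pick(loads: dict, rng=random) -> str:
    """Sample two distinct servers at random; send to the less loaded one."""
    a, b = rng.sample(list(loads), 2)
    return a if loads[a] <= loads[b] else b


def simulate(choices: int, n_servers: int = 100, n_requests: int = 10_000,
             seed: int = 1) -> int:
    """Return the busiest server's load after assigning every request."""
    rng = random.Random(seed)
    loads = {f"s{i}": 0 for i in range(n_servers)}
    servers = list(loads)
    for _ in range(n_requests):
        s = rng.choice(servers) if choices == 1 else p2c_pick(loads, rng)
        loads[s] += 1
    return max(loads.values())


# Mean load is 100 per server; compare how far the busiest server drifts above it.
print("random:", simulate(choices=1), "p2c:", simulate(choices=2))
```

The only state P2C needs is the current load of the two sampled servers, which is why it balances well without a global view.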

9. DNS and anycast

DNS resolves names to addresses hierarchically (root → TLD → authoritative), cached with TTLs. It’s also a coarse load-balancing/traffic-steering tool (return different IPs by geo/health). Anycast announces the same IP from many locations; BGP routes each client to the topologically nearest instance — used by DNS resolvers, CDNs, and DDoS absorption. Caveat: anycast routing can shift mid-session (fine for stateless/UDP request-response, trickier for long-lived TCP).
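TTL caching is what makes DNS cheap, and also what limits how quickly DNS-based steering takes effect: clients keep using a cached answer until it expires. A minimal resolver-side sketch; TTLCache is a hypothetical name, and the injectable clock exists only so expiry can be tested.

```python
import time


class TTLCache:
    """Reuse each answer until its TTL expires, as a caching resolver does."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (addresses, expires_at)

    def put(self, name: str, addresses: list, ttl_s: float) -> None:
        self._entries[name] = (addresses, self._clock() + ttl_s)

    def get(self, name: str):
        entry = self._entries.get(name)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]
        self._entries.pop(name, None)  # missing or expired: caller must re-resolve
        return None


now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("api.example.com", ["192.0.2.10"], ttl_s=30)
print(cache.get("api.example.com"))  # ['192.0.2.10']
now[0] = 31.0
print(cache.get("api.example.com"))  # None: expired, so a fresh lookup would run
```

Shorter TTLs move traffic faster during failover at the cost of more lookups; longer TTLs do the reverse.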

10. CDNs and edge

A CDN caches content (and increasingly runs compute) at points of presence near users, cutting RTT (the dominant latency) and offloading origin. Benefits: lower latency (shorter propagation), origin protection (cache absorbs reads, DDoS scrubbing), and edge compute for personalization/auth close to users. Cache strategy (TTLs, invalidation, cache keys) and the static/dynamic split are the design crux. Cons: cache invalidation is hard; dynamic/personalized content benefits less; consistency between edge and origin must be reasoned about.

11. Bufferbloat

Bufferbloat (Gettys & Nichols, 2011): oversized network buffers, combined with loss-based congestion control (which only backs off on loss), keep queues persistently full — so latency balloons under load even though throughput looks fine. Fixes: Active Queue Management (CoDel, which targets queue delay not occupancy) and model-based CC (BBR) that paces to the bottleneck instead of filling buffers. The lesson: a full buffer is latency you’re carrying on every packet.
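CoDel's central idea is to judge a queue by how long packets wait in it (sojourn time), not by how many bytes it holds. The sketch below keeps only that core decision and is not the full algorithm: it signals a drop once queue delay has stayed above a 5 ms target for a 100 ms interval (CoDel's usual defaults), while real CoDel then spaces further drops by interval ÷ √count and skips drops when the queue is nearly empty. SimplifiedCoDel is a hypothetical name.

```python
TARGET_S = 0.005    # acceptable standing queue delay
INTERVAL_S = 0.100  # how long delay must stay above target before acting


class SimplifiedCoDel:
    def __init__(self):
        self.first_above = None  # dequeue time when delay first exceeded target

    def should_drop(self, enqueued_at: float, dequeued_at: float) -> bool:
        sojourn = dequeued_at - enqueued_at
        if sojourn < TARGET_S:
            self.first_above = None  # queue drained to a good level: reset
            return False
        if self.first_above is None:
            self.first_above = dequeued_at
            return False
        # Delay has been persistently high (a standing queue), not a brief burst.
        return dequeued_at - self.first_above >= INTERVAL_S


codel = SimplifiedCoDel()
print(codel.should_drop(0.000, 0.020))  # False: first sample above target
print(codel.should_drop(0.110, 0.130))  # True: above target for >= 100 ms
```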

12. Application-level backpressure and flow control

Beyond TCP’s flow control, services need application backpressure: when a component is overloaded, it must signal upstream to slow down rather than building unbounded queues (which inflate latency and eventually OOM). Mechanisms: bounded queues, blocking/credit-based flow control (gRPC/HTTP/2 stream windows, Reactive Streams), load shedding (reject early — return 429/503 fast rather than queueing), and concurrency limits (e.g. adaptive limits à la TCP Vegas applied to RPC). The anti-pattern is an unbounded in-memory queue that turns overload into a latency spiral and then a crash.
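A bounded queue that rejects instead of waiting is the simplest form of load shedding. A minimal sketch using Python's queue module; Shedder is a hypothetical name, and returning HTTP-style status codes (202 accepted, 503 shed) is an assumption about how the caller reports the outcome.

```python
import queue


class Shedder:
    """Bounded admission queue: accept while there is room, reject immediately
    when full instead of letting waiting time (and memory) grow without limit."""

    def __init__(self, max_queued: int):
        self._q = queue.Queue(maxsize=max_queued)

    def submit(self, job) -> int:
        try:
            self._q.put_nowait(job)
            return 202  # accepted; a worker will pick it up
        except queue.Full:
            return 503  # shed load fast so the client can back off or go elsewhere

    def next_job(self):
        return self._q.get()  # worker side: blocks until work is available


shedder = Shedder(max_queued=2)
print([shedder.submit(i) for i in range(3)])  # [202, 202, 503]
```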

13. Decision guide

Transport choice:
├─ Many parallel objects, latency-sensitive web ─► HTTP/3 (QUIC) — no cross-stream HoL, 0–1 RTT
├─ Standard request/response, ubiquity needed ───► HTTP/2 over TCP (reuse connections!)
├─ Real-time, loss-tolerant (games/voice/video) ─► UDP/QUIC + app-level recovery
└─ Bulk reliable transfer ───────────────────────► TCP with big windows / parallel streams

Load balancing:
├─ Route by URL/host/cookie, TLS termination, retries ─► L7
├─ Raw throughput / pass-through / non-HTTP ──────────► L4
├─ Affinity to stateful nodes / caches ──────────────► CONSISTENT HASHING (+ virtual nodes)
└─ Spread load to stateless workers ─────────────────► POWER-OF-TWO-CHOICES

Latency hygiene:
   Short interactive RPC ► TCP_NODELAY (disable Nagle) + connection reuse
   High latency under load ► check BUFFERBLOAT → BBR / CoDel; add BACKPRESSURE + load shedding
   Global users ► CDN/edge + anycast to cut RTT

Reach-for / avoid:

QUIC/HTTP3 — for: many parallel streams, mobile (migration). Avoid when: UDP is blocked or CPU is the bottleneck.
L7 LB — for: content routing, TLS, retries. Avoid when: you only need cheap pass-through (use L4).
Consistent hashing — for: cache/stateful affinity. Avoid when: fully stateless (P2C is simpler/better-balanced).
Retries — for: transient failures. Avoid when: unbounded (retry storms) — always budget + backoff + jitter.

PART 4 — INTERVIEW ARSENAL

How to wield this. Senior signals: (1) you reason in round trips and RTT, not bandwidth; (2) you name head-of-line blocking as the reason for HTTP/3 and bufferbloat/backpressure for latency-under-load; (3) you pick load-balancing strategy by stateful vs stateless and bound retries. Each question has a model answer and counter-ladder.

A. Fundamentals

Q1. Why doesn’t more bandwidth fix latency? Answer: Latency is propagation (speed of light) + queuing + per-round-trip protocol cost; bandwidth only affects transmission time of large payloads. A request that needs 3 round trips across an 80 ms link takes ~240 ms regardless of pipe size. Wins come from fewer round trips (connection reuse, 0-RTT), avoiding HoL blocking, and shorter queues — not a fatter pipe. Counter-ladder:

“When does bandwidth matter?” → Bulk transfers: throughput ≈ min(window ÷ RTT, link rate), so high-BDP links need big windows or parallel connections.
“Speed-of-light floor US↔EU?” → ~5 µs/km in fiber; ~5500 km → ~27 ms one way, ~55+ ms RTT minimum.

Q2. Flow control vs congestion control? Answer: Flow control is receiver-driven (the advertised window stops a fast sender from overrunning a slow receiver). Congestion control is network-driven (slow start + AIMD stop senders from overrunning the shared network). Either can throttle you; they solve different problems. Counter-ladder:

“What limits throughput on a single TCP connection?” → window ÷ RTT; grow the window to at least the BDP (bandwidth × RTT) or parallelize.
“Why does a new connection start slow?” → TCP slow start ramps the congestion window from small; reuse connections to avoid paying it repeatedly.

B. HoL & modern transport

Q3. HTTP/2 fixed HoL blocking, but did it? Answer: It fixed request-level HoL (an HTTP/1.1 connection serves one request at a time, and pipelined responses must come back in order) by multiplexing streams over one connection. But all streams share one TCP byte stream, so a single lost packet stalls every stream (transport-level HoL). QUIC/HTTP3 fixes this with independent per-stream loss recovery over UDP. Counter-ladder:

“Why can’t TCP just deliver streams independently?” → TCP is a single ordered byte stream by design; the OS can’t know your stream boundaries.
“QUIC’s other wins?” → 0–1 RTT setup (integrated TLS 1.3), connection migration via connection ID, pluggable user-space congestion control.
“QUIC downside?” → UDP throttling/blocking by middleboxes; higher CPU than kernel TCP.

Q4. A small request/response RPC shows ~40 ms latency on a fast LAN. Why? Answer: Classic Nagle + delayed ACK interaction: Nagle holds the small send waiting for an ACK; the peer’s delayed ACK waits ~40 ms to piggyback — a standoff. Fix: set TCP_NODELAY (disable Nagle) for interactive RPC. Also reuse the connection to avoid handshake/slow-start. Counter-ladder:

“Tradeoff of disabling Nagle?” → More small packets / overhead; fine for latency-sensitive interactive traffic.
“Other fixed latencies in RPC?” → Connection setup (1 RTT TCP + 1 RTT TLS) — amortize via pooling/keepalive/session resumption.

C. Load balancing & scaling

Q5. L4 vs L7 load balancing — when each, and the cost? Answer: L4 routes by connection/IP:port — cheap, fast, protocol-agnostic, good for raw throughput/pass-through. L7 parses requests to route by path/host/header/cookie, terminate TLS, retry, rate-limit, do canaries — more CPU but far more capable. Use L7 when you need content-based decisions; L4 when you just need to spread connections cheaply. Counter-ladder:

“How balance load to stateful cache nodes?” → Consistent hashing (with virtual nodes for evenness) so only ~1/N keys move on membership change.
“Best simple algorithm for stateless workers?” → Power-of-two-choices: pick 2 random, send to the less loaded — near-optimal with O(1) state, avoids herding on ‘the’ least-loaded.
“Why not always least-connections?” → Needs accurate global load state; can herd new requests onto one ‘idle’ server.

Q6. Design retries safely. Answer: Retries only for idempotent or idempotency-keyed operations; bound them with a retry budget (e.g. retries ≤ 10% of requests), exponential backoff with jitter, and per-try timeouts. Otherwise retries during an incident amplify load (retry storm) and turn a brownout into an outage. Combine with circuit breakers and load shedding. Counter-ladder:

“Why jitter?” → Avoid synchronized retry waves (thundering herd) hitting the recovering service simultaneously.
“Retry a non-idempotent write?” → Only with an idempotency key the server dedupes on; else risk double-apply.

D. Latency under load

Q7. Throughput looks fine but latency spikes under load. Diagnose. Answer: Likely bufferbloat — full buffers (in the network or app queues) add standing delay; loss-based CC keeps them full. Network fix: BBR (paces to bottleneck) and AQM like CoDel (targets queue delay). App fix: bounded queues + backpressure + load shedding instead of unbounded in-memory queues. Measure queue delay, not just occupancy/throughput. Counter-ladder:

“Why does an unbounded app queue make it worse?” → It absorbs overload into latency, then OOMs — a spiral; shed load early (429/503) instead.
“How does BBR differ from CUBIC?” → BBR models bandwidth+RTT and paces; CUBIC fills the buffer until loss — BBR keeps queues shorter.

E. Worked drill — driving a design end-to-end

Watch round-trip reasoning, HoL avoidance, and backpressure drive the design.

Prompt: “Design the edge/transport layer for a global interactive web app: users worldwide load a dashboard with ~50 small API/asset fetches, on flaky mobile networks; it must feel fast and stay stable under traffic spikes.”

1 — Attack round trips first. “Latency is dominated by RTT × number of round trips, and users are global, so propagation is large. Step one: a CDN with anycast so static assets and cacheable API responses are served from a PoP near the user — turning an 80–150 ms origin RTT into a few ms. Dynamic calls still hit regional backends, ideally via the edge (terminate TLS at the edge, reuse warm origin connections).”

2 — Choose the transport for 50 parallel fetches. “Fifty small parallel objects is the textbook head-of-line-blocking scenario. Over HTTP/2/TCP, one lost packet on a flaky mobile link stalls all 50 streams. So HTTP/3 (QUIC): independent per-stream loss recovery means one lost packet stalls only its stream; plus 0–1 RTT setup and connection migration (Wi-Fi→cellular without reconnect) — exactly right for mobile.”

3 — Amortize setup. “Reuse a single QUIC connection (multiplexed) for all 50 fetches — pay the handshake once. Enable 0-RTT resumption for repeat visits, but only send 0-RTT data for idempotent GETs (0-RTT is replayable). TLS 1.3 throughout.”

4 — Interactive latency hygiene. “For the dynamic RPC path, disable Nagle (TCP_NODELAY) where TCP is still used, keep connection pools warm to backends, and avoid chatty sequential calls — batch or parallelize so we don’t stack RTTs.”

5 — Stability under spikes. “Protect against traffic surges with backpressure and load shedding: bounded queues at each tier, return 429/503 fast rather than queueing unboundedly, and concurrency limits at the edge. Retries are budgeted (≤10% of requests) with exponential backoff + jitter to prevent retry storms during a brownout. Circuit breakers trip on a failing backend so the edge serves cached/degraded responses instead of piling on.”

6 — Load balancing. “At the edge, L7 load balancing to route by path/host, terminate TLS, and do retries/canaries. To stateful caches behind it, consistent hashing with virtual nodes for affinity; to stateless API workers, power-of-two-choices for even load with minimal state.”

7 — Tradeoffs stated. “I optimized for fewer round trips (CDN/anycast + connection reuse + 0-RTT) and no HoL blocking (QUIC) — the two things that actually move global interactive latency — and bought stability with backpressure + bounded retries rather than big buffers (which would cause bufferbloat). Costs: QUIC’s higher CPU and middlebox/UDP-blocking risk (fall back to HTTP/2), and CDN cache-invalidation complexity for dynamic content. If UDP were widely blocked for our users, I’d fall back to HTTP/2 + aggressive connection reuse and accept transport HoL.”

Template: cut round trips (CDN/anycast/reuse/0-RTT) → kill HoL (QUIC) → latency hygiene (Nagle, batching) → stability (backpressure, bounded retries) → pick LB by stateful/stateless → state the tradeoff.

F. Consolidated gotchas & traps (rapid fire)

Reduce round trips, not just bandwidth.
HTTP/2 over TCP still has transport HoL — QUIC fixes it.
Nagle + delayed ACK = ~40 ms stall; TCP_NODELAY for interactive RPC.
Slow start penalizes new connections — reuse them.
Bufferbloat = latency under load; BBR/CoDel + backpressure.
Flow control ≠ congestion control.
Unbounded retries → retry storms; budget + backoff + jitter.
Consistent hashing needs virtual nodes.
Single TCP connection caps at window ÷ RTT; high-BDP links need window ≥ bandwidth × RTT.
0-RTT data is replayable — idempotent requests only.

G. Pros/cons master tables

Transports

Transport	Pros	Cons
TCP	Reliable, ubiquitous, kernel/NIC offload	Per-connection HoL; handshake+slow-start; ossified; no migration
UDP	Minimal overhead/latency; full control	DIY reliability/CC; can be a bad citizen
QUIC/HTTP3	No cross-stream HoL; 0–1 RTT; migration; evolvable	UDP blocking; higher CPU; newer

Load balancing

Strategy	Pros	Cons
L4	Fast, cheap, protocol-agnostic	No content decisions
L7	Routing, TLS, retries, canaries	More CPU
Round-robin	Trivial	Ignores load
Least-connections	Load-aware	Needs state; can herd
Consistent hashing	Affinity; minimal key movement	Needs virtual nodes for evenness
Power-of-two-choices	Near-optimal, O(1) state	Slightly worse than perfect knowledge

Go deeper (primary sources)

Cerf & Kahn, “A Protocol for Packet Network Intercommunication” (1974).
Jacobson, “Congestion Avoidance and Control” (1988); Allman, Paxson, Blanton, “TCP Congestion Control” (RFC 5681).
Ha, Rhee, Xu, “CUBIC: A New TCP-Friendly High-Speed TCP Variant” (2008).
Cardwell, Cheng, Gunn, Yeganeh, Jacobson, “BBR: Congestion-Based Congestion Control” (2016).
Iyengar & Thomson, “QUIC: A UDP-Based Multiplexed and Secure Transport” (RFC 9000, 2021).
Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.3” (RFC 8446, 2018).
Nagle, “Congestion Control in IP/TCP Internetworks” (RFC 896, 1984).
Stevens, TCP/IP Illustrated, Volume 1.
Gettys & Nichols, “Bufferbloat: Dark Buffers in the Internet” (2011).
Dean & Barroso, “The Tail at Scale” (2013).

PART 5 — MAKE IT STICK (Teaching Tutorial)

The references are the map; this is the driving lesson. Networking clicks once you count in round trips instead of bytes, and once you can see head-of-line blocking. Two pictures do most of the work.

14.1 The one idea: count round trips, not megabytes

   Latency = propagation (speed of light, fixed) + queuing + per-RTT protocol cost
   A request needing 4 round trips on an 80 ms link ≈ 320 ms — NO pipe size fixes that.
   Wins come from: fewer round trips, no head-of-line blocking, shorter queues.

Bandwidth moves big payloads faster; it does nothing for a chatty request that waits N round trips. The latency lever is reducing round trips (reuse connections, 0-RTT, CDNs), not buying a fatter pipe.

14.2 Head-of-line blocking — the picture that explains HTTP/3

  HTTP/2 over ONE TCP stream:        HTTP/3 over QUIC (independent streams):
   [r1][r2][r3][r4] one byte order     r1 ──●──── (lost pkt stalls only r1)
        ✗ lost packet                   r2 ──────── flows
   → ALL of r2,r3,r4 STALL behind it    r3 ──────── flows
   (transport-level HoL blocking)       r4 ──────── flows

TCP is one ordered byte stream, so one lost packet stalls everything behind it — even unrelated requests multiplexed on top (HTTP/2). QUIC runs many independent streams over UDP, so a loss stalls only its own stream. That, plus 0–1 RTT setup and connection migration, is the whole case for HTTP/3.

14.3 Congestion vs flow control (two different brakes)

  FLOW control  = "don't overrun the RECEIVER" (receiver's advertised window)
  CONGESTION control = "don't overrun the NETWORK" (slow start + AIMD sawtooth)

   cwnd  ╱╲    ╱╲    ╱╲     ← additive increase, then halve on loss (AIMD)
         ╱  ╲╱  ╲╱  ╲
        ╱  slow start ramps a NEW connection up from small → reuse connections!

Two independent brakes can each throttle you. New connections start slow (slow start), which is why connection reuse is a huge win for short requests.

14.4 The 40 ms mystery (Nagle + delayed ACK)

  Sender (Nagle): "I'll wait for an ACK before sending this small chunk."
  Receiver (delayed ACK): "I'll wait ~40 ms to piggyback this ACK."
  → standoff → ~40 ms stall on a fast LAN.   Fix: TCP_NODELAY for interactive RPC.

14.5 Analogies that stick

RTT = a question across a canyon — shouting faster (bandwidth) doesn’t shorten the echo time; asking fewer times does.
TCP HoL = a single-file checkout line — one slow shopper blocks everyone. QUIC = parallel lines.
Slow start = merging onto a highway — you accelerate gradually, not floor it.
Consistent hashing = assigning seats by a ring so adding a row only moves a few people; power-of-two-choices = glance at two checkout lines, pick the shorter.

14.6 Misconceptions → corrections

You might think…	Actually…
“More bandwidth = lower latency.”	No — latency is round trips + propagation; reduce round trips.
“HTTP/2 killed head-of-line blocking.”	Only request-level; TCP still has transport HoL. QUIC fixes it.
“Flow control and congestion control are the same.”	Receiver-limit vs network-limit — different brakes.
“Retries are free safety.”	Unbounded retries cause retry storms; budget + backoff + jitter.
“Consistent hashing balances evenly by itself.”	Needs virtual nodes.

14.7 Explain it back (Feynman)

Why doesn’t bandwidth fix a chatty request? [14.1]
Draw HTTP/2 vs QUIC HoL blocking. [14.2]
Flow vs congestion control — who’s protecting whom? [14.3]
Explain the 40 ms stall and its fix. [14.4]
L4 vs L7 — when each? [Part 3 §8]

14.8 Flashcards (cover the right column)

Prompt	Answer
Latency lever	Fewer round trips (not bandwidth)
HTTP/2 weakness	Transport HoL (one TCP stream)
QUIC fix	Independent per-stream loss recovery
Flow control	Don’t overrun the receiver
Congestion control	Don’t overrun the network (AIMD)
New-connection cost	Slow start → reuse connections
40 ms stall	Nagle + delayed ACK → TCP_NODELAY
Stateful affinity LB	Consistent hashing (+ virtual nodes)
Stateless LB	Power-of-two-choices

14.9 The 60-second recall

“Network latency is round trips plus propagation plus queuing — bandwidth only speeds big payloads, so the lever is fewer round trips (connection reuse, 0-RTT, CDNs near users). TCP is a single ordered byte stream, so one lost packet causes head-of-line blocking that stalls everything behind it, including all HTTP/2 streams; QUIC fixes this with independent per-stream recovery over UDP, plus 0–1 RTT setup and connection migration. Flow control protects the receiver; congestion control (slow start, AIMD) protects the network — and slow start penalizes new connections, so reuse them. Disable Nagle (TCP_NODELAY) for interactive RPC to avoid the 40 ms delayed-ACK stall. Balance stateful traffic with consistent hashing (plus virtual nodes), stateless with power-of-two-choices, and always bound retries with budgets, backoff, and jitter.”

Frequently asked questions

Why doesn’t more bandwidth fix latency?

Latency is dominated by propagation (speed of light), queuing, and per-round-trip protocol cost; bandwidth only affects the transmission time of large payloads. A request needing several round trips across a high-latency link is slow regardless of pipe size. Wins come from fewer round trips, avoiding head-of-line blocking, and shorter queues.

What is head-of-line blocking and how does QUIC fix it?

Head-of-line blocking is when one stalled item blocks everything behind it. HTTP/2 multiplexes streams over one TCP byte stream, so a single lost packet stalls all streams. QUIC runs over UDP with independent per-stream loss recovery, so a lost packet only stalls its own stream.

What’s the difference between flow control and congestion control?

Flow control is receiver-driven: the advertised window stops a fast sender from overrunning a slow receiver. Congestion control is network-driven: slow start and additive-increase/multiplicative-decrease stop senders from overrunning the shared network. Either can limit throughput.

L4 vs L7 load balancing — when do you use each?

L4 balances by connection and IP:port — cheap, fast, and protocol-agnostic, good for raw throughput. L7 parses requests to route by path, host, header, or cookie, terminate TLS, retry, and do canaries — more CPU but far more capable. Use L7 for content-based decisions, L4 for cheap pass-through.

What causes about 40 ms of latency on small TCP requests?

The interaction of Nagle’s algorithm and delayed ACK: Nagle holds a small send waiting for an acknowledgment while the peer delays the ACK to piggyback, creating a standoff. Disable Nagle with TCP_NODELAY for interactive RPC, and reuse connections to avoid handshake and slow-start cost.