Transactions & Isolation Levels: The Complete System Design Reference

Transactions & Isolation — System Design Handbook Part 4 featured image

A self-contained reference for what “correct concurrent execution” actually means. Transactions promise ACID; isolation levels are the dial that trades correctness for concurrency, and almost every senior engineer can name the levels but can’t say exactly which anomaly each one still permits. This chapter makes that precise, then extends it to distributed transactions (2PC, Sagas, TCC) where the guarantees get expensive.

How to use this: Part 1 is the reference card. Part 2 maps the territory. Part 3 is the full depth with pros/cons per mechanism. Part 4 is exhaustive interview prep with counter-question ladders.

Key takeaways

  • Understand isolation levels by the exact anomaly each one still permits, not by name alone.
  • Snapshot isolation is not serializable — it permits write skew, the key litmus-test anomaly.
  • Two-phase commit is not consensus and blocks if the coordinator fails after prepare; make it fault-tolerant by replicating participants with consensus.
  • Across independent services use sagas or try-confirm-cancel, not distributed 2PC.

PART 1 — CHEATSHEET (Reference Card)

Every concept in this document, condensed.

ACID (Haerder & Reuter, 1983)

  • Atomicity — all of a transaction’s writes apply, or none do (rollback on failure).
  • Consistency — the transaction moves the DB from one valid state to another (app-defined invariants). This is the app’s “C,” not CAP’s.
  • Isolation — concurrent transactions don’t interfere; the dial this chapter is about.
  • Durability — once committed, survives crashes (WAL/fsync/replication).

The anomalies (what isolation prevents)

Anomaly What happens
Dirty write Overwrite another txn’s uncommitted write
Dirty read Read another txn’s uncommitted write
Non-repeatable read Re-reading a row returns a different value (committed by another txn)
Phantom Re-running a range query returns new/missing rows
Lost update Two read-modify-writes; one overwrites the other
Read skew Two reads see an inconsistent mix (one before, one after another commit)
Write skew Two txns read overlapping data, write disjoint rows, jointly violate an invariant

Isolation levels — what each PERMITS

Level Dirty read Non-repeatable Phantom Lost update Write skew
Read Uncommitted ✅ allowed
Read Committed
Repeatable Read / Snapshot Isolation ❌ (SI) / ✅ (lock-RR) ❌ (SI)
Serializable
Strict Serializable ❌ + real-time order

Concurrency control mechanisms

  • 2PL (two-phase locking) — acquire locks (growing phase), release (shrinking phase); strict 2PL holds write locks to commit. Pessimistic; gives serializability; can deadlock.
  • MVCC — keep multiple versions; readers see a snapshot, never block writers. Basis of SI.
  • OCC (optimistic) — run without locks, validate at commit, abort on conflict. Great for low contention.
  • SSI (serializable snapshot isolation) — SI + detect dangerous read-write dependency cycles and abort; true serializability with MVCC concurrency.

Distributed transactions

  • 2PC — prepare → commit across participants. Atomic but blocks if coordinator dies after prepare.
  • 3PC — non-blocking under crash-stop + synchrony; breaks under partitions; rarely used.
  • Saga — sequence of local txns + compensating actions; no global atomicity/isolation, only eventual consistency.
  • TCC (Try-Confirm-Cancel) — reserve resources (Try), then Confirm or Cancel; application-level 2PC.

Quick decision rules

  • Single-node, need correctness, moderate contention → Serializable (SSI if available) or Repeatable Read for most apps.
  • Read-heavy, want non-blocking reads → MVCC / Snapshot Isolation (watch write skew).
  • High contention, pessimistic safety → strict 2PL.
  • Low contention, want throughput → OCC.
  • Cross-service “transaction” → Saga (with compensations) or TCC, not distributed 2PC.

Top gotchas (litmus tests)

  1. Snapshot Isolation is NOT serializable — it permits write skew.
  2. Naming trap: Postgres “Repeatable Read” = Snapshot Isolation; Postgres “Serializable” = SSI (truly serializable). Oracle “Serializable” is actually SI.
  3. ACID’s C ≠ CAP’s C. Different concepts, same letter.
  4. 2PC blocks if the coordinator fails after PREPARE — participants hold locks indefinitely.
  5. Lost update isn’t fixed by Read Committed; needs atomic compare-and-set, explicit locking (SELECT FOR UPDATE), or SI’s first-committer-wins.
  6. Phantoms need range/predicate locks (or SI/serializable) — row locks alone don’t stop them.
  7. Sagas have no isolation — intermediate states are visible; design compensations and idempotency.
  8. 2PL deadlocks; the DB resolves by aborting a victim — your app must retry.
  9. Strict serializability = serializability + real-time order (what Spanner’s “external consistency” gives).
  10. “Serializable” performance is dominated by abort/retry rate under contention — measure it.

PART 2 — OUTLINE (full map)

  1. ACID, precisely (and how distributed systems strain each letter)
  2. The anomaly catalog
  3. The isolation-level ladder + ANSI critique + naming traps
  4. Two-phase locking (2PL, S2PL, SS2PL)
  5. Multi-Version Concurrency Control (MVCC)
  6. Optimistic Concurrency Control (OCC)
  7. Snapshot Isolation and write skew
  8. Serializable Snapshot Isolation (SSI)
  9. Distributed transactions: two-phase commit (2PC) and three-phase commit (3PC)
  10. Sagas
  11. Try-Confirm-Cancel (TCC)
  12. Percolator and deterministic transactions (Calvin)
  13. How real databases map to all this
  14. Decision guide
  15. Make it stick — the teaching tutorial (anomaly pictures, 2PC trace, mnemonics, flashcards)

PART 3 — DEEP DIVE

1. ACID, precisely

Atomicity — “all or nothing.” Implemented via a log (UNDO to roll back partial work; see WAL/ARIES). It is not about concurrency.

Consistency — the transaction preserves application invariants (foreign keys, balances ≥ 0). This is the weakest-defined ACID property and is largely the application’s responsibility; the DB only enforces declared constraints. Crucially, this “C” is unrelated to CAP’s consistency (linearizability).

Isolation — concurrent transactions produce a result equivalent to some serial execution (full isolation = serializability). The dial of weaker levels trades this for concurrency.

Durability — committed data survives crashes, via write-ahead logging + fsync to durable storage and/or replication to other nodes. In distributed systems durability means “on enough replicas,” because a single disk/node can be lost.

In distributed systems each letter strains: atomicity now spans nodes (needs 2PC); durability means replication; isolation across shards needs distributed concurrency control; and “consistency” gets tangled with replication consistency. Naming which letter a design is bending is a strong interview signal.

2. The anomaly catalog

Isolation levels are best understood by which anomalies they still allow. Definitions:

  • Dirty write — T2 overwrites a value T1 wrote but hasn’t committed. Almost always prevented (even Read Committed).
  • Dirty read — T2 reads T1’s uncommitted write; if T1 aborts, T2 saw data that never existed.
  • Non-repeatable read (read skew on a row) — T1 reads a row, T2 commits a change, T1 re-reads and sees a different value.
  • Phantom — T1 runs a range query, T2 inserts/deletes a matching row and commits, T1 re-runs and sees a different set.
  • Lost update — T1 and T2 both read X, both write X based on what they read; one update is lost. (e.g. two counter = counter + 1.)
  • Read skew — T1 reads X (before T2) and Y (after T2’s commit), seeing an inconsistent combination (e.g. money “disappears” mid-transfer).
  • Write skew — T1 and T2 each read an overlapping set, verify an invariant, then each writes a different row; individually fine, jointly the invariant is violated. The signature anomaly of Snapshot Isolation.

3. The isolation-level ladder + ANSI critique + naming traps

The ANSI SQL levels, from weakest to strongest, defined by anomalies prevented:

  • Read Uncommitted — prevents nothing meaningful (allows dirty reads).
  • Read Committed — no dirty reads/writes; still allows non-repeatable reads, phantoms, lost updates. The default in many databases.
  • Repeatable Read — same row reads are stable; classically still allows phantoms. In MVCC databases this is usually implemented as Snapshot Isolation.
  • Serializable — equivalent to some serial order; no anomalies.

The ANSI critique (Berenson et al., 1995): the original ANSI definitions were ambiguous and phrased in terms of locking; the authors showed they failed to capture anomalies like lost update and write skew, and precisely defined Snapshot Isolation as sitting between Repeatable Read and Serializable while permitting write skew — i.e. SI is not serializable.

Naming traps (memorize):

  • PostgreSQL “Repeatable Read” = Snapshot Isolation (permits write skew); PostgreSQL “Serializable” = SSI (truly serializable).
  • Oracle “Serializable” is actually Snapshot Isolation (not truly serializable).
  • So “Serializable” on the label does not guarantee serializability — know your engine.

4. Two-phase locking (2PL, S2PL, SS2PL)

A pessimistic scheme: a transaction acquires locks in a growing phase and releases in a shrinking phase; once it releases any lock it may acquire no more. This produces serializable schedules.

  • Strict 2PL (S2PL): hold all write locks until commit/abort (prevents cascading aborts).
  • Strong strict 2PL (SS2PL): hold all locks (read and write) until commit — the common implementation.
  • Shared (read) vs exclusive (write) locks; predicate/next-key locks are needed to prevent phantoms.

Pros: conceptually simple; gives serializability. Cons: deadlocks (resolved by aborting a victim — your app must retry); lock contention kills throughput under hot keys; readers block writers and vice versa (unlike MVCC). Latency is dominated by lock waits.

5. Multi-Version Concurrency Control (MVCC)

Instead of overwriting, the engine keeps multiple versions of each row, each tagged with the transaction that created it. A reader sees a consistent snapshot as of its start (a transaction ID/timestamp), reading the appropriate version — so readers never block writers and writers never block readers. Old versions are garbage-collected (e.g. Postgres VACUUM) once no transaction can see them.

MVCC is the substrate for Snapshot Isolation and for most modern databases (Postgres, MySQL/InnoDB, Oracle, SQL Server snapshot mode, CockroachDB, Spanner).

Pros: excellent read concurrency; consistent snapshots for free; great for read-heavy workloads. Cons: version storage and GC overhead (bloat if long-running transactions pin old versions); write-write conflicts still need detection; by itself yields SI, not serializability.

6. Optimistic Concurrency Control (OCC)

Run the transaction without locks against a snapshot, buffering writes; at commit, validate that no conflicting transaction committed in the meantime; if validation passes, write; else abort and retry (Kung & Robinson, 1981). Three phases: read, validate, write.

Pros: no lock overhead; ideal for low-contention workloads and read-mostly systems; no deadlocks. Cons: under high contention the abort/retry rate explodes (wasted work), making it slower than pessimistic locking; needs idempotent retry logic.

7. Snapshot Isolation and write skew

Snapshot Isolation (SI): each transaction reads from a consistent MVCC snapshot taken at its start and, at commit, succeeds only if no concurrent committed transaction wrote any of the same items (first-committer-wins). SI prevents dirty reads, non-repeatable reads, lost updates, and (in MVCC implementations) phantoms.

But SI permits write skew — the litmus test. Canonical example: two doctors are on call; a rule requires ≥1 on call. Both run “if (on_call_count ≥ 2) set myself off-call.” Each reads the snapshot (count = 2, fine), each sets a different doctor off, both commit — now zero doctors on call. Neither wrote the same row, so first-committer-wins doesn’t fire, and the invariant breaks. Serializability would forbid this.

Pros: strong, intuitive, great concurrency; the default “good enough” for most apps. Cons: write skew can silently violate invariants that span rows neither transaction writes; you must recognize when your invariant is write-skew-prone and escalate to serializable (or use explicit locking / materializing the conflict).

8. Serializable Snapshot Isolation (SSI)

SSI (Cahill, Röhm, Fekete, 2008) keeps MVCC’s non-blocking concurrency but adds detection of dangerous structures — specifically certain read-write dependency patterns (two consecutive rw-antidependency edges) that can lead to a non-serializable cycle — and aborts one transaction to break them. Result: true serializability with optimistic, MVCC-level concurrency. PostgreSQL’s “Serializable” is SSI.

Pros: serializable correctness without the lock contention of 2PL; readers don’t block. Cons: false-positive aborts (it’s conservative) → retry overhead under contention; tracking dependencies has bookkeeping cost.

9. Distributed transactions: 2PC and 3PC

When a transaction spans multiple nodes/shards, you need atomic commitment — all participants commit or all abort.

Two-Phase Commit (2PC): a coordinator runs:

  1. PREPARE — ask each participant to durably prepare (write to log, lock resources) and vote yes/no.
  2. COMMIT/ABORT — if all voted yes, tell everyone to commit; otherwise abort.

The blocking flaw: if the coordinator crashes after participants prepared but before delivering the decision, prepared participants are stuck — they hold locks and can neither commit (others may have voted no) nor abort (others may have committed). 2PC is not fault-tolerant for coordinator failure on its own.

Three-Phase Commit (3PC): inserts a pre-commit phase so a recovering participant can infer the outcome, making it non-blocking under crash-stop + synchrony — but it fails under network partitions (two sides reach different decisions) and assumes bounded delays. Rarely used in practice.

Modern fix: make the coordinator and participants each consensus-replicated (Raft/Paxos groups). Now “the coordinator” can’t simply vanish; its decision is itself durable and highly available. This is the Spanner pattern — 2PC across shards, consensus within each shard.

Pros of 2PC: genuine cross-node atomicity. Cons: blocking on coordinator failure; holds locks across the whole protocol (latency); doesn’t scale well across many participants or high latency.

10. Sagas

For long-lived or cross-service “transactions” where holding locks across services is untenable, a Saga (Garcia-Molina & Salem, 1987) is a sequence of local transactions, each with a compensating transaction that semantically undoes it. If step k fails, run compensations for steps k-1…1 in reverse.

  • Orchestration: a central coordinator drives the steps and compensations.
  • Choreography: services react to each other’s events with no central brain.

Pros: no long-held locks/coordinator; fits microservices; high availability. Cons: no isolation (intermediate states are visible to others — “dirty reads” at the business level), no automatic atomicity (compensation is best-effort and must be idempotent), and compensations can be semantically tricky (you can’t always “un-send” an email). You’re trading ACID for eventual consistency with explicit recovery.

11. Try-Confirm-Cancel (TCC)

TCC is application-level 2PC: each service exposes Try (reserve/hold resources — e.g. tentatively hold inventory), then the coordinator calls Confirm (commit the reservation) or Cancel (release it). Unlike a Saga’s after-the-fact compensation, TCC reserves up front, so it avoids exposing committed-then-undone state.

Pros: stronger than a Saga (reservations prevent oversell), still avoids distributed DB-level 2PC. Cons: every service must implement three idempotent operations and handle timeouts/hanging reservations; more development cost.

12. Percolator and deterministic transactions (Calvin)

  • Percolator (Peng & Dabek, 2010): Google’s system for large-scale incremental processing; layers snapshot-isolation transactions over Bigtable using a client-driven 2PC with a primary lock per transaction and a global timestamp oracle. Influenced TiDB and others — shows how to get cross-row transactions on a non-transactional KV store.
  • Calvin (Thomson et al., 2012): deterministic transactions — agree on a global order of transactions first (via a sequencing layer/consensus), then execute them deterministically on every replica. Because the order is pre-agreed and execution is deterministic, there are no commit-time aborts and no distributed 2PC for ordering; replicas reach the same result independently. Trades flexibility (transactions’ read/write sets must be known or pre-analyzed) for eliminating coordination at commit.

Pros: both push transactional semantics onto scalable, partitioned stores. Cons: Percolator’s latency (client-driven 2PC, timestamp oracle); Calvin’s need to know access sets and its reliance on a fast sequencing layer.

13. How real databases map to all this

  • PostgreSQL: MVCC; “Read Committed” (default), “Repeatable Read” = SI, “Serializable” = SSI.
  • MySQL/InnoDB: MVCC + next-key locks; default “Repeatable Read” (with gap locks reducing phantoms); “Serializable” uses locking.
  • Oracle: MVCC; “Serializable” is actually Snapshot Isolation.
  • SQL Server: lock-based by default; optional MVCC (“snapshot isolation”).
  • Spanner / CockroachDB: distributed; serializable (Spanner offers strict serializability via TrueTime; Cockroach uses serializable isolation with HLC + uncertainty restarts). Cross-shard atomicity via 2PC over consensus groups.

14. Decision guide

Single node?
├─ Read-heavy, want non-blocking reads ─────► MVCC / SNAPSHOT ISOLATION
│     └─ Invariant spans rows neither txn writes (write-skew risk)? ► escalate to SERIALIZABLE (SSI)
├─ High contention, need safety ────────────► STRICT 2PL (accept deadlock+retry)
├─ Low contention, want throughput ─────────► OCC
└─ Need true serializability w/ MVCC concurrency ► SSI

Crosses nodes/services?
├─ Same DB, multiple shards, need atomic commit ─► 2PC over CONSENSUS groups (Spanner-style)
├─ Cross-service, can reserve up front ──────────► TCC (Try-Confirm-Cancel)
└─ Cross-service, long-lived, can compensate ────► SAGA (+ idempotent compensations)

Reach-for / avoid:

  • Serializable (SSI)for: correctness-critical invariants, write-skew-prone logic. Avoid when: extreme contention makes abort rate dominate (consider partitioning the hot key).
  • Snapshot Isolationfor: most apps, read-heavy. Avoid when: write skew threatens an invariant.
  • 2PLfor: high-contention pessimistic safety. Avoid when: throughput matters and contention is low (use OCC/MVCC).
  • 2PCfor: cross-shard atomicity within one system, made fault-tolerant via consensus. Avoid when: it spans independent services (use Saga/TCC).
  • Sagafor: microservice workflows. Avoid when: you actually need isolation (intermediate states must be hidden).

PART 4 — INTERVIEW ARSENAL

How to wield this. Senior signals: (1) you describe isolation levels by the exact anomaly each permits, not by name; (2) you know SI ≠ serializable and can produce the write-skew example on demand; (3) for distributed transactions you reach for Saga/TCC across services and reserve 2PC for within-system shards (made fault-tolerant by consensus). Each question has a model answer and counter-ladder.

A. Fundamentals

Q1. What does each letter of ACID actually mean, and which is the application’s job? Answer: Atomicity (all-or-nothing, via UNDO log), Consistency (preserve app invariants — largely the app’s responsibility, only declared constraints are enforced by the DB), Isolation (the concurrency dial), Durability (survive crashes via WAL/fsync/replication). “Consistency” is the odd one out and is unrelated to CAP’s consistency. Counter-ladder:

  • “How is atomicity implemented?” → Write-ahead log with UNDO information; on crash, roll back uncommitted work.
  • “Durability in a distributed DB?” → Replication to a quorum, not just one disk’s fsync.
  • “Why is ACID’s C confusing?” → It overlaps the word with CAP’s C (linearizability) but means invariant preservation.

Q2. Explain isolation levels by the anomalies they allow. Answer: Read Uncommitted allows dirty reads; Read Committed stops dirty reads/writes but allows non-repeatable reads, phantoms, lost updates; Repeatable Read (often = Snapshot Isolation) stops those but allows write skew; Serializable allows nothing. Strict Serializable adds real-time ordering across transactions. Counter-ladder:

  • “Difference between Repeatable Read and Snapshot Isolation?” → In MVCC DBs they’re usually the same thing; classic lock-based RR still allows phantoms, SI prevents them but allows write skew.
  • “Postgres ‘Serializable’ vs Oracle ‘Serializable’?” → Postgres = SSI (truly serializable); Oracle = SI (not serializable).

B. The SI / serializable distinction

Q3. Is Snapshot Isolation serializable? Prove your answer. Answer: No. SI permits write skew. Example: invariant “≥1 doctor on call.” Two transactions each read count=2, each take a different doctor off call, each commits (different rows, so first-committer-wins doesn’t trigger); now zero on call. A serial execution would have blocked the second. So SI < Serializable. Counter-ladder:

  • “How do you fix write skew under SI?” → Materialize the conflict (write to a shared row both touch), use SELECT ... FOR UPDATE to lock the read set, or use Serializable/SSI.
  • “Does SI prevent lost updates?” → Yes, via first-committer-wins on the same item; write skew is different because the writes are to different items.

Q4. How does SSI give serializability without 2PL’s locking cost? Answer: It runs on MVCC (non-blocking reads) and tracks read-write dependencies, aborting a transaction when it detects the dangerous pattern (two consecutive rw-antidependency edges) that can form a non-serializable cycle. So you get serializability optimistically, paying only in occasional aborts rather than constant lock waits. Counter-ladder:

  • “Downside of SSI?” → Conservative → false-positive aborts and retries under contention; dependency-tracking overhead.
  • “When would 2PL beat SSI?” → Very high contention where optimistic aborts thrash; pessimistic locking can be more stable.

C. Concurrency control mechanisms

Q5. 2PL vs MVCC vs OCC — pick one for a read-heavy, low-contention service and justify. Answer: MVCC (Snapshot Isolation) — readers never block writers, ideal for read-heavy; low contention means write-write conflicts are rare. OCC also fits low contention (validate-at-commit, few aborts). 2PL would needlessly block readers and add lock overhead. If an invariant is write-skew-prone, layer SSI. Counter-ladder:

  • “Now contention spikes — what breaks?” → OCC abort/retry rate explodes; MVCC bloats from long-running readers; consider 2PL or partitioning the hot key.
  • “How does MVCC garbage-collect?” → Removes versions no active transaction can see (e.g. Postgres VACUUM); long transactions pin versions and cause bloat.

D. Distributed transactions

Q6. Walk through 2PC and its failure mode. Answer: Coordinator sends PREPARE; participants durably prepare and vote; if unanimous yes, coordinator sends COMMIT, else ABORT. Failure: if the coordinator dies after participants prepared but before sending the decision, they’re stuck holding locks — can’t safely commit or abort. So 2PC is not fault-tolerant alone. Counter-ladder:

  • “How do you make 2PC fault-tolerant?” → Replicate the coordinator (and participants) via consensus so the decision is durable and available — the Spanner pattern.
  • “Does 3PC solve it?” → Non-blocking under crash-stop + synchrony, but breaks under partitions; rarely used.
  • “What’s held during 2PC?” → Locks on every participant for the protocol’s duration → latency and reduced concurrency.

Q7. Cross-service order placement (payment + inventory + shipping). Distributed 2PC? Answer: No — don’t hold DB locks across independent services. Use a Saga (each service does a local transaction with a compensating action) or TCC (Try reserves inventory/payment hold, then Confirm/Cancel). TCC if you must prevent oversell (reserve up front); Saga if compensation after the fact is acceptable. Make every step idempotent and handle partial failure with compensations/retries. Counter-ladder:

  • “Saga’s big weakness?” → No isolation — intermediate states are visible (an order can be ‘paid’ but not yet ‘reserved’); design for it.
  • “Compensation that can’t be undone (email sent)?” → Use semantic compensation (send a correction) or order steps so irreversible actions come last.
  • “TCC vs Saga?” → TCC reserves before confirming (prevents oversell); Saga acts then compensates (can expose/oversell transiently).

E. Worked drill — driving a design end-to-end

Watch the isolation requirement get named precisely, then the cheapest mechanism chosen, with the write-skew check explicit.

Prompt: “Design the booking core for a meeting-room reservation system: many users book rooms concurrently, no room may be double-booked, and the system spans an app tier and a SQL database. Later, bookings also trigger a calendar service and a billing service.”

1 — Name the invariant and the anomaly risk. “Invariant: no two confirmed bookings overlap for the same room. Two users checking ‘is this slot free?’ then both inserting a booking is a classic write skew / phantom scenario — each reads ‘free’ on a snapshot, each inserts a different row, both commit, room double-booked. So Read Committed and plain Snapshot Isolation are not safe here.”

2 — Choose the isolation mechanism. “Two correct options: (a) Serializable (SSI) so the conflicting read-write pattern is detected and one booking aborts (then retry); or (b) materialize the conflict — lock the room’s slot row with SELECT ... FOR UPDATE (or a uniqueness constraint on (room, slot)) so the second insert blocks/fails. I’d use a uniqueness constraint on (room_id, time_slot) as the hard guarantee plus Serializable for the read-check path — defense in depth, and the constraint makes double-booking impossible regardless of isolation bugs.”

3 — Concurrency control choice. “MVCC underneath for read concurrency; the hot contention is per-room, so conflicts are localized — OCC-style retries are fine. Under a popular room, abort/retry rate rises; I partition contention by making the unique constraint the arbiter (one winner, others get a clean constraint violation to retry/round-robin).”

4 — Extend to cross-service (calendar + billing). “Booking confirmation now spans services — I will not use distributed 2PC. The DB booking commits first (the source of truth, protected by the constraint). Then a Saga/TCC: TCC-style, I reserve a billing hold and a tentative calendar entry on Try, and Confirm them after the DB commit; if Confirm fails, Cancel and compensate (release hold, remove tentative entry). All steps idempotent, keyed by booking ID.”

5 — Failure behavior. “If billing Confirm hangs, the reservation has a timeout → Cancel path runs, booking is marked unconfirmed and the slot freed (compensation). The unique constraint guarantees that even under retries we never double-book; idempotency keys guarantee we never double-charge.”

6 — Tradeoffs stated. “I chose a hard DB-level uniqueness guarantee + Serializable for the core invariant (correctness over a bit of abort/retry latency on hot rooms), and Saga/TCC across services (availability and loose coupling over distributed-2PC atomicity, accepting visible intermediate states managed by compensations). If the business needed strict atomicity across services, I’d have to introduce a coordinator over consensus — far more cost than the workflow warrants here.”

Template: name the invariant → identify the anomaly → pick the cheapest isolation that prevents it (plus a hard constraint) → handle cross-service with Saga/TCC, not 2PC → state the tradeoff.

F. Consolidated gotchas & traps (rapid fire)

  • SI ≠ serializable (write skew).
  • Naming trap: Postgres RR = SI; Postgres Serializable = SSI; Oracle Serializable = SI.
  • Lost update survives Read Committed — needs atomic CAS / FOR UPDATE / SI.
  • Phantoms need predicate/next-key locks or SI/serializable.
  • 2PC blocks on coordinator failure; fix with consensus.
  • 3PC breaks under partitions — don’t propose it as the answer.
  • Sagas have no isolation; intermediate states leak.
  • 2PL deadlocks → victim abort → app must retry.
  • ACID’s C ≠ CAP’s C.
  • “Serializable” label ≠ serializable depending on engine.

G. Pros/cons master tables

Concurrency control

Mechanism Pros Cons
Strict 2PL Serializable; simple model Deadlocks; readers block writers; contention
MVCC (SI) Non-blocking reads; great for read-heavy Write skew; version bloat/GC; not serializable
OCC No locks; great low-contention Abort storms under contention
SSI Serializable + MVCC concurrency False-positive aborts; tracking overhead

Distributed transactions

Approach Pros Cons
2PC (over consensus) Cross-shard atomicity, fault-tolerant Latency; locks held; coordinator complexity
3PC Non-blocking (crash-stop) Breaks under partitions; rarely used
Saga No locks; microservice-friendly No isolation; compensation complexity
TCC Reserves up front; prevents oversell 3 idempotent ops/service; dev cost

Go deeper (primary sources)

  • Haerder & Reuter, “Principles of Transaction-Oriented Database Recovery” (1983) — the ACID acronym.
  • Gray & Reuter, Transaction Processing: Concepts and Techniques (1992).
  • Berenson, Bernstein, Gray, Melton, O’Neil, O’Neil, “A Critique of ANSI SQL Isolation Levels” (1995) — defines SI and the anomalies precisely.
  • Bernstein, Hadzilacos, Goodman, Concurrency Control and Recovery in Database Systems (1987).
  • Kung & Robinson, “On Optimistic Methods for Concurrency Control” (1981).
  • Cahill, Röhm, Fekete, “Serializable Isolation for Snapshot Databases” (2008) — SSI.
  • Peng & Dabek, “Large-scale Incremental Processing Using Distributed Transactions and Notifications” (2010) — Percolator.
  • Thomson et al., “Calvin: Fast Distributed Transactions for Partitioned Database Systems” (2012).
  • Garcia-Molina & Salem, “Sagas” (1987).

PART 5 — MAKE IT STICK (Teaching Tutorial)

The references are the map; this is the driving lesson. Isolation levels never stick because people memorize names. Instead, see each anomaly as a picture, climb the ladder one rung at a time, and watch 2PC freeze. Trace the two diagrams yourself.

15.1 The big idea in one line

An isolation level is a deal: “I’ll let these anomalies happen in exchange for more concurrency.” Serializable forbids everything (slow, safe); Read Committed forbids little (fast, risky). So don’t learn levels — learn anomalies, then a level is just “which anomalies it still allows.”

15.2 The anomalies as pictures (this is the whole topic)

 LOST UPDATE                 NON-REPEATABLE READ          WRITE SKEW (the boss level)
 T1 read x=5                 T1 read x=5                   inv: ≥1 doctor on call
 T2 read x=5                 T2 write x=8, commit          T1 sees {A,B on} → A off
 T1 write x=6 (5+1)          T1 read x=8  ← changed!       T2 sees {A,B on} → B off
 T2 write x=6 (5+1)                                        both commit (different rows!)
 → one +1 LOST                                            → ZERO on call. invariant BROKEN

Write skew is the one that matters most because Snapshot Isolation stops the other two but still allows this one — two transactions read the same thing, write different rows, and jointly break a rule. Remember the doctors and you remember why Snapshot Isolation is NOT serializable.

15.3 Climb the ladder one rung (each rung removes one anomaly)

 READ UNCOMMITTED  ── allows: dirty read, everything          (basically don't)
        │ + forbid dirty reads
 READ COMMITTED    ── still allows: non-repeatable, lost update, write skew   (common default)
        │ + stable reads within the txn
 SNAPSHOT ISOL.    ── still allows: WRITE SKEW                 (Postgres "Repeatable Read")
        │ + detect dangerous read/write cycles
 SERIALIZABLE (SSI)── allows: NOTHING                          (Postgres "Serializable")

Naming trap to tattoo on your brain: Postgres “Repeatable Read” = Snapshot Isolation (allows write skew); Postgres “Serializable” = SSI (truly safe); Oracle “Serializable” is only Snapshot Isolation. The label lies — know your engine.

15.4 Watch 2PC freeze (why distributed commit is scary)

 Coordinator        Shard A        Shard B
   PREPARE ───────►  ✔ ready (locks held)
   PREPARE ──────────────────────► ✔ ready (locks held)
   ☠ COORDINATOR CRASHES before sending COMMIT/ABORT ☠
                     A: "do I commit or abort??"  ← can't decide alone:
                     B: "do I commit or abort??"     B might have voted no / yes
   → both BLOCK, holding locks, indefinitely.

That’s the 2PC blocking flaw: prepared participants can’t safely commit (someone may have voted no) or abort (someone may have committed). Fix: make the coordinator itself a consensus group (Raft) so its decision survives the crash — that’s the Spanner pattern: 2PC across shards, consensus within each. And across independent services, don’t use 2PC at all — use a Saga (do + compensate) or TCC (reserve, then confirm/cancel).

15.5 Analogies that stick

  • Isolation levels = privacy curtains between transactions. Serializable draws every curtain (no peeking, slow); Read Committed leaves gaps.
  • MVCC = everyone shops with a photo of the shelves taken when they walked in — readers never block writers; they just read their snapshot.
  • *Write skew = two people each grabbing the last of different shared resources* because each only checked their own — together they overdraw.
  • 2PC = a wedding officiant asking “do you?” to both — if the officiant faints between the two “I do”s, nobody knows if they’re married.

15.6 Misconceptions → corrections

You might think… Actually…
“Snapshot Isolation is serializable.” No — it allows write skew (the doctors).
“Postgres Repeatable Read = serializable.” It’s Snapshot Isolation; “Serializable” (SSI) is the safe one.
“Read Committed stops lost updates.” No — needs SELECT FOR UPDATE, atomic CAS, or higher isolation.
“2PC is fault-tolerant.” It blocks on coordinator failure; replicate the coordinator with consensus.
“Use 2PC across microservices.” Use Sagas/TCC — 2PC holds locks across services and blocks.
“ACID’s C = CAP’s C.” Different ideas, same letter.

15.7 Explain it back (Feynman)

  1. Why “learn anomalies, not levels”? [15.1]
  2. Draw write skew; why does Snapshot Isolation allow it? [15.2]
  3. Climb the ladder — which anomaly does each rung remove? [15.3]
  4. Draw the 2PC freeze and give the fix. [15.4]
  5. When do you use a Saga instead of a distributed transaction? [Part 3 §10]

15.8 Flashcards (cover the right column)

Prompt Answer
Isolation level = A deal trading anomalies for concurrency
The boss anomaly Write skew (the doctors)
SI allows Write skew → not serializable
Postgres “Repeatable Read” Snapshot Isolation (allows write skew)
Postgres “Serializable” SSI (truly serializable)
Oracle “Serializable” Only Snapshot Isolation
2PC failure Blocks if coordinator dies after PREPARE
2PC fix Replicate coordinator via consensus (Spanner)
Cross-service “transaction” Saga or TCC, not 2PC
MVCC win Readers never block writers

15.9 The 60-second recall

“A transaction promises ACID; isolation is the dial trading correctness for concurrency. Don’t memorize level names — learn the anomalies (lost update, non-repeatable read, write skew), and a level is just which ones it still permits. The killer is write skew: two transactions read the same data, write different rows, jointly break an invariant — and Snapshot Isolation allows it, which is exactly why SI is not serializable (Postgres ‘Repeatable Read’ = SI; ‘Serializable’ = SSI; Oracle ‘Serializable’ = only SI). MVCC lets readers never block writers. Across nodes, two-phase commit gives atomicity but blocks if the coordinator dies after prepare — fix it by replicating the coordinator with consensus. Across independent services, use Sagas or TCC, never distributed 2PC.”

Frequently asked questions

Is snapshot isolation serializable?

No. Snapshot isolation prevents dirty reads, non-repeatable reads, and lost updates, but it permits write skew, where two transactions read an overlapping set, write disjoint rows, and jointly violate an invariant. Serializability forbids this.

What is write skew?

Write skew occurs when two transactions each read an overlapping dataset, each verifies an invariant, then each writes a different row, so individually they are fine but jointly they break the invariant — for example, two on-call doctors both marking themselves off duty. It is the signature anomaly that snapshot isolation allows.

Why does two-phase commit block?

If the coordinator crashes after participants have prepared but before it delivers the commit or abort decision, prepared participants are stuck holding locks — they cannot safely commit or abort alone. Replicating the coordinator and participants with consensus (the Spanner pattern) makes it fault-tolerant.

What is the difference between a saga and a distributed transaction?

A saga is a sequence of local transactions, each with a compensating action to undo it, providing eventual consistency without distributed locks — but no isolation, so intermediate states are visible. A distributed transaction (2PC) gives atomicity and isolation but holds locks and blocks on coordinator failure.

Does PostgreSQL “Repeatable Read” mean serializable?

No. PostgreSQL’s Repeatable Read is actually snapshot isolation, which permits write skew. PostgreSQL’s Serializable level uses Serializable Snapshot Isolation (SSI), which is truly serializable. Oracle’s Serializable, by contrast, is only snapshot isolation.

Previous