# Week 22 - Distributed Storage Patterns

## 22.1 Conceptual Core
- The core of a distributed storage system is consensus (week 21). The engineering of one is everything around it: durable storage, replication, partitioning, snapshotting, repair, observability.
- Three patterns to know:
  - Replicated state machine (Raft, Paxos; etcd is the canonical example): a single consensus group, and every node holds the full data set. Linearizable, but throughput is limited by the leader.
  - Sharded replicated (CockroachDB ranges, TiKV regions): many consensus groups, one per data shard. Scales horizontally.
  - Eventually consistent (Cassandra, DynamoDB): no consensus on the write path; quorum reads and writes, hinted handoff, and anti-entropy repair instead. A different consistency model entirely.
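For the eventually consistent pattern, the consistency knob is the quorum condition: with N replicas, a write quorum of W and a read quorum of R are guaranteed to overlap whenever R + W > N, so every read contacts at least one replica holding the latest acknowledged write. A minimal sketch (function name is illustrative, not from any particular database):

```go
package main

import "fmt"

// quorumOverlaps reports whether a read quorum of r and a write quorum
// of w out of n replicas must intersect — the condition under which a
// quorum read is guaranteed to observe the latest acknowledged write.
func quorumOverlaps(n, r, w int) bool {
	return r+w > n
}

func main() {
	// Classic Dynamo-style configuration: N=3, W=2, R=2.
	fmt.Println(quorumOverlaps(3, 2, 2)) // true: quorums intersect
	// R=1, W=1 trades consistency for latency: quorums can miss each other.
	fmt.Println(quorumOverlaps(3, 1, 1)) // false
}
```

Note that R + W > N buys read-your-latest-acknowledged-write behavior, not full linearizability; that is why these systems still need anti-entropy repair.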
## 22.2 Mechanical Detail
- WAL discipline: every state-changing op is durably logged before acknowledgment. `fsync` after each batch (or per-op for stricter durability). The WAL is the source of truth for recovery.
- Snapshots: periodic point-in-time captures of the state machine. Truncate the WAL behind them. The snapshot format must be efficient to ship to a recovering follower.
- Membership changes: adding/removing nodes is the hardest correctness boundary. Raft's "joint consensus" handles this. Both `hashicorp/raft` and `etcd-io/raft` provide APIs; do not roll your own.
- Linearizable reads: three options - read from the leader after a heartbeat round (`etcd`'s "linearizable read" with read-index), read from any node with a lease, or read after a no-op append. Each has tradeoffs.
- Storage engine choice: BoltDB (simple, single-writer, great for Raft logs), BadgerDB (LSM-based, higher throughput), Pebble (CockroachDB's RocksDB replacement, the modern choice for high throughput).
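The read-index option can be modeled without a real cluster: the leader captures its current commit index, confirms leadership (a heartbeat round, elided here), and serves the read only once the state machine has applied at least that far. A simplified single-process sketch with invented names:

```go
package main

import (
	"fmt"
	"sync"
)

// node models only the indices relevant to read-index reads.
type node struct {
	mu          sync.Mutex
	commitIndex uint64 // highest log index known committed by a quorum
	applyIndex  uint64 // highest log index applied to the state machine
	data        map[string]string
}

// linearizableGet follows the read-index recipe: capture the commit
// index, confirm leadership, then serve the read only after the state
// machine has applied at least that far.
func (n *node) linearizableGet(key string) string {
	n.mu.Lock()
	readIndex := n.commitIndex // step 1: capture the read index
	n.mu.Unlock()

	// step 2 (elided): heartbeat a quorum to confirm we are still leader.

	// step 3: wait until applyIndex >= readIndex. A real implementation
	// blocks on a notification channel rather than spinning.
	for {
		n.mu.Lock()
		if n.applyIndex >= readIndex {
			v := n.data[key]
			n.mu.Unlock()
			return v
		}
		n.mu.Unlock()
	}
}

func main() {
	n := &node{commitIndex: 7, applyIndex: 7, data: map[string]string{"x": "1"}}
	fmt.Println(n.linearizableGet("x")) // prints "1"
}
```

The point of the recipe: no log append is needed, but the read is still ordered after every write committed before it began.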
## 22.3 Lab: "Harden the KV Store"
Take the week 21 Raft KV and add:
1. Pebble as the storage engine for both the WAL and the state machine.
2. Snapshots every N entries, with InstallSnapshot to recovering followers.
3. Linearizable reads via read-index.
4. Membership changes: add and remove nodes online.
5. Metrics: per-node Raft state, log lag, snapshot duration, apply latency.
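Step 2's trigger logic is easy to isolate: count applied entries, and once the count crosses N, capture the state and compact the log behind the snapshot index. A hedged sketch with invented names (real code would call your Raft library's snapshot and compaction APIs):

```go
package main

import "fmt"

// snapshotter decides when to snapshot and how far to truncate the log.
type snapshotter struct {
	interval      uint64 // snapshot every N applied entries
	lastSnapIndex uint64 // applied index covered by the last snapshot
}

// maybeSnapshot returns (true, truncateUpTo) when appliedIndex has
// advanced at least interval entries past the previous snapshot.
// Entries up to and including truncateUpTo are covered by the snapshot
// and can be compacted from the WAL.
func (s *snapshotter) maybeSnapshot(appliedIndex uint64) (bool, uint64) {
	if appliedIndex-s.lastSnapIndex < s.interval {
		return false, 0
	}
	s.lastSnapIndex = appliedIndex
	return true, appliedIndex
}

func main() {
	s := &snapshotter{interval: 1000}
	for _, idx := range []uint64{500, 1000, 1999, 2000} {
		ok, upTo := s.maybeSnapshot(idx)
		fmt.Println(idx, ok, upTo) // snapshots fire at 1000 and 2000
	}
}
```

Keep a recovery margin in practice: truncating right up to the snapshot index forces any follower that is even one entry behind to take a full snapshot transfer instead of a cheap log catch-up.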
## 22.4 Idiomatic & golangci-lint Drill

`errcheck`, `errorlint`, `wrapcheck`. Distributed-systems code is almost entirely error handling; lint rigor is non-optional.
## 22.5 Production Hardening Slice
- Add a Jepsen-style "nemesis" goroutine to your test harness that randomly partitions, pauses, and restarts nodes. Verify linearizability over 1M operations.