
Week 22 - Distributed Storage Patterns

22.1 Conceptual Core

  • The matching engine of a distributed storage system is consensus (week 21). The engineering of one is everything around it: durable storage, replication, partitioning, snapshotting, repair, observability.
  • Three patterns to know:
      • Replicated state machine (Raft, Paxos): one consensus group, each node holds the full data set. Linearizable; throughput limited by the leader (a minimal FSM sketch follows this list).
      • Sharded replicated (etcd, CockroachDB ranges): many consensus groups, one per data shard. Horizontal scale.
      • Eventually consistent (Cassandra, DynamoDB): no consensus on the write path; quorum reads, hinted handoff, anti-entropy. Different consistency model.
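
A minimal sketch of the first pattern, assuming hashicorp/raft: the FSM interface (Apply / Snapshot / Restore) is the whole contract between consensus and your state machine. The command wire format and the Store / mapSnapshot names are illustrative, not part of the library.

```go
package kv

import (
	"encoding/json"
	"io"
	"sync"

	"github.com/hashicorp/raft"
)

// command is the illustrative wire format for log entries (an assumption of
// this sketch, not something hashicorp/raft prescribes).
type command struct {
	Op    string `json:"op"` // "set" or "delete"
	Key   string `json:"key"`
	Value string `json:"value,omitempty"`
}

// Store is the replicated state machine: every node applies the same log in
// the same order, so every node converges to the same map.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: make(map[string]string)} }

// Get is a local (possibly stale) read; linearizable reads are covered in 22.2.
func (s *Store) Get(key string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.data[key]
}

// Apply is invoked once an entry is committed; it must be deterministic.
func (s *Store) Apply(l *raft.Log) interface{} {
	var c command
	if err := json.Unmarshal(l.Data, &c); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	switch c.Op {
	case "set":
		s.data[c.Key] = c.Value
	case "delete":
		delete(s.data, c.Key)
	}
	return nil
}

// Snapshot hands Raft a point-in-time copy it can persist in the background.
func (s *Store) Snapshot() (raft.FSMSnapshot, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	clone := make(map[string]string, len(s.data))
	for k, v := range s.data {
		clone[k] = v
	}
	return &mapSnapshot{data: clone}, nil
}

// Restore rebuilds the state machine from a snapshot shipped by the leader.
func (s *Store) Restore(rc io.ReadCloser) error {
	defer rc.Close()
	m := map[string]string{}
	if err := json.NewDecoder(rc).Decode(&m); err != nil {
		return err
	}
	s.mu.Lock()
	s.data = m
	s.mu.Unlock()
	return nil
}

type mapSnapshot struct{ data map[string]string }

func (m *mapSnapshot) Persist(sink raft.SnapshotSink) error {
	if err := json.NewEncoder(sink).Encode(m.data); err != nil {
		_ = sink.Cancel()
		return err
	}
	return sink.Close()
}

func (m *mapSnapshot) Release() {}
```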

22.2 Mechanical Detail

  • WAL discipline: every state-changing op is durably logged before acknowledgment. fsync after each batch (or per-op for stricter durability). The WAL is the source of truth for recovery (append sketch after this list).
  • Snapshots: periodic point-in-time captures of the state machine. Truncate the WAL behind them. Snapshot format must be efficient to ship to a recovering follower (tuning sketch below).
  • Membership changes: adding/removing nodes is the hardest correctness boundary. Raft's "joint consensus" handles this. Both hashicorp/raft and etcd-io/raft provide APIs; do not roll your own (sketch below).
  • Linearizable reads: three options. Read from the leader after a heartbeat round (etcd's "linearizable read" via read-index), read from any node holding a lease, or read after a no-op append; each has tradeoffs (no-op append sketch below).
  • Storage engine choice: BoltDB (simple, single-writer, great for Raft logs), BadgerDB (LSM-based, higher throughput), Pebble (CockroachDB's RocksDB replacement, the modern choice for high-throughput workloads; sketch below).
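
A minimal sketch of the ack-after-fsync discipline over a plain os.File; the length-prefixed framing and the names are assumptions. In practice the Raft library's LogStore (or Pebble, below) plays this role, but the ordering rule is the same: Sync returns, then you acknowledge.

```go
package wal

import (
	"encoding/binary"
	"os"
)

// WAL appends length-prefixed records and fsyncs before the caller is
// allowed to acknowledge the write.
type WAL struct {
	f *os.File
}

func Open(path string) (*WAL, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	return &WAL{f: f}, nil
}

// AppendBatch writes a batch of entries, then fsyncs once for the whole
// batch. Only after Sync returns may the operations be acknowledged.
func (w *WAL) AppendBatch(entries [][]byte) error {
	for _, e := range entries {
		var hdr [4]byte
		binary.BigEndian.PutUint32(hdr[:], uint32(len(e)))
		if _, err := w.f.Write(hdr[:]); err != nil {
			return err
		}
		if _, err := w.f.Write(e); err != nil {
			return err
		}
	}
	// Durability point: the batch is on stable storage once this returns.
	return w.f.Sync()
}
```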
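
Snapshot cadence and log truncation, sketched with hashicorp/raft's config knobs; the numbers are illustrative starting points, not recommendations.

```go
package node

import (
	"time"

	"github.com/hashicorp/raft"
)

// snapshotTunedConfig sets the snapshot-related knobs explicitly.
func snapshotTunedConfig() *raft.Config {
	cfg := raft.DefaultConfig()
	cfg.SnapshotInterval = 30 * time.Second // how often to check whether a snapshot is due
	cfg.SnapshotThreshold = 8192            // take one only if this many entries were applied since the last
	cfg.TrailingLogs = 10240                // log kept behind the snapshot so slow followers can catch up from the log
	return cfg
}
```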
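
Online membership changes with hashicorp/raft's AddVoter / RemoveServer; the leader check and timeouts here are illustrative.

```go
package node

import (
	"time"

	"github.com/hashicorp/raft"
)

// addNode joins a new voting member. Only the leader may change membership;
// the library serializes the configuration change through the log.
func addNode(r *raft.Raft, id raft.ServerID, addr raft.ServerAddress) error {
	if r.State() != raft.Leader {
		return raft.ErrNotLeader
	}
	// prevIndex 0 means "no precondition on the current configuration index".
	return r.AddVoter(id, addr, 0, 10*time.Second).Error()
}

// removeNode removes a member; do this before decommissioning the machine.
func removeNode(r *raft.Raft, id raft.ServerID) error {
	if r.State() != raft.Leader {
		return raft.ErrNotLeader
	}
	return r.RemoveServer(id, 0, 10*time.Second).Error()
}
```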
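
Of the three read options, hashicorp/raft exposes the no-op append as Barrier(); a sketch, assuming a local Get over the applied state (the Store from the 22.1 sketch fits). etcd's read-index path avoids the extra log entry but needs its own plumbing.

```go
package node

import (
	"time"

	"github.com/hashicorp/raft"
)

// KV is whatever exposes local reads over the applied state.
type KV interface {
	Get(key string) string
}

// linearizableGet is the "no-op append" option: Barrier appends an entry and
// blocks until everything before it is applied, which both re-confirms
// leadership and guarantees the local state is at least as new as the read.
func linearizableGet(r *raft.Raft, store KV, key string) (string, error) {
	if err := r.Barrier(5 * time.Second).Error(); err != nil {
		return "", err // not leader, or leadership lost mid-flight
	}
	return store.Get(key), nil
}
```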
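
A sketch of Pebble as the storage engine (github.com/cockroachdb/pebble): pebble.Sync on writes gives the same fsync-before-ack discipline as the hand-rolled WAL above. Option tuning is omitted.

```go
package storage

import (
	"github.com/cockroachdb/pebble"
)

// openKV opens (or creates) a Pebble store with default options.
func openKV(dir string) (*pebble.DB, error) {
	return pebble.Open(dir, &pebble.Options{})
}

// putDurable writes with pebble.Sync, so Pebble's own WAL is fsynced before
// the call returns.
func putDurable(db *pebble.DB, key, value []byte) error {
	return db.Set(key, value, pebble.Sync)
}

// get reads a key; the returned closer must be closed, so copy the value out
// before releasing it.
func get(db *pebble.DB, key []byte) ([]byte, error) {
	v, closer, err := db.Get(key)
	if err != nil {
		return nil, err // pebble.ErrNotFound if the key is absent
	}
	out := append([]byte(nil), v...)
	return out, closer.Close()
}
```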

22.3 Lab-"Harden the KV Store"

Take the week 21 Raft KV and add:

  1. Pebble as the storage engine for both the WAL and the state machine.
  2. Snapshots every N entries, with InstallSnapshot to recovering followers.
  3. Linearizable reads via read-index.
  4. Membership changes: add and remove nodes online.
  5. Metrics: per-node Raft state, log lag, snapshot duration, apply latency (instrumentation sketch below).
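
A sketch of step 5 with prometheus/client_golang; metric names, labels, and buckets are assumptions to adapt.

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	raftState = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "kv_raft_state",
		Help: "Raft state per node: 0=follower, 1=candidate, 2=leader.",
	}, []string{"node"})

	logLag = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "kv_raft_log_lag_entries",
		Help: "Leader commit index minus this node's applied index.",
	}, []string{"node"})

	applyLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "kv_fsm_apply_seconds",
		Help:    "Latency of applying a committed entry to the state machine.",
		Buckets: prometheus.DefBuckets,
	})

	snapshotDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "kv_snapshot_seconds",
		Help:    "Time to take and persist a snapshot.",
		Buckets: prometheus.ExponentialBuckets(0.01, 2, 12),
	})
)

func init() {
	prometheus.MustRegister(raftState, logLag, applyLatency, snapshotDuration)
}

// ObserveApply wraps an FSM apply call to record its latency.
func ObserveApply(apply func()) {
	start := time.Now()
	apply()
	applyLatency.Observe(time.Since(start).Seconds())
}
```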

22.4 Idiomatic & golangci-lint Drill

  • Enable errcheck, errorlint, and wrapcheck. Distributed-systems code is almost entirely error handling; lint rigor is not optional.
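
A minimal .golangci.yml for the drill; extend as needed.

```yaml
# .golangci.yml -- minimal config for this week's drill
linters:
  enable:
    - errcheck   # every ignored error is a dropped failure signal
    - errorlint  # errors.Is/As instead of == and type assertions
    - wrapcheck  # errors crossing package boundaries must be wrapped
```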

22.5 Production Hardening Slice

  • Add a Jepsen-style "nemesis" goroutine to your test harness that randomly partitions, pauses, and restarts nodes. Verify linearizability over 1M operations.
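
A sketch of the nemesis loop. The Cluster interface is hypothetical, a stand-in for whatever your test harness actually exposes; for the linearizability check itself, porcupine (github.com/anishathalye/porcupine) is the usual Go choice.

```go
package chaos

import (
	"context"
	"math/rand"
	"time"
)

// Cluster is a hypothetical harness interface; adapt to your own test rig.
type Cluster interface {
	Nodes() []string
	Partition(a, b string) // drop traffic between two nodes
	Heal()                 // remove all partitions
	Pause(node string)     // SIGSTOP-style pause
	Resume(node string)
	Restart(node string)
}

// Nemesis randomly injects one fault, lets it soak, then repairs it. Run it
// concurrently with the client workload, then check the recorded history.
func Nemesis(ctx context.Context, c Cluster, rng *rand.Rand) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		nodes := c.Nodes()
		victim := nodes[rng.Intn(len(nodes))]
		switch rng.Intn(3) {
		case 0:
			other := nodes[rng.Intn(len(nodes))]
			c.Partition(victim, other)
			time.Sleep(3 * time.Second)
			c.Heal()
		case 1:
			c.Pause(victim)
			time.Sleep(2 * time.Second)
			c.Resume(victim)
		case 2:
			c.Restart(victim)
		}
		time.Sleep(time.Duration(rng.Intn(5)) * time.Second)
	}
}
```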
