Week 21 - Consensus Algorithms: Raft (and a Glance at Paxos)
21.1 Conceptual Core
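- Consensus is the problem of getting N nodes to agree on a single sequence of values despite message loss, delay, and reordering, and despite node crashes (but not Byzantine failures). Raft stays safe under all of these and makes progress whenever a majority of nodes can communicate.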
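- Raft is the modern teaching consensus algorithm. It is leader-based and log-replication-centric, and it decomposes into three sub-problems: leader election, log replication, and safety. Read the Ongaro and Ousterhout paper, "In Search of an Understandable Consensus Algorithm".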
- Paxos is the older, denser counterpart. Read the Paxos Made Simple paper for fluency, but use Raft in implementation.
- Two of the safety properties Raft guarantees:
    - Log matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through that index.
    - Leader completeness: if an entry is committed in a given term, it is present in the log of every leader of a higher term.
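Log matching is maintained by the consistency check a follower runs on each `AppendEntries` call. The following is a minimal sketch of that check as the Raft paper describes it, not code from any library; `Entry`, `appendEntries`, and the slice-based log are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Entry is one log record. An entry's Raft index is its slice position plus one.
type Entry struct {
	Term    uint64
	Command string
}

// appendEntries is a hypothetical follower-side handler. prevIndex and prevTerm
// identify the entry just before the new ones; prevIndex 0 means "start of log".
// It returns the updated log and whether the consistency check passed.
func appendEntries(log []Entry, prevIndex int, prevTerm uint64, entries []Entry) ([]Entry, bool) {
	// Reject if the follower has no entry at prevIndex, or its term differs.
	if prevIndex > len(log) {
		return log, false
	}
	if prevIndex > 0 && log[prevIndex-1].Term != prevTerm {
		return log, false
	}
	for i, e := range entries {
		pos := prevIndex + i // slice position of this entry
		if pos < len(log) {
			if log[pos].Term == e.Term {
				continue // same index and term: already stored
			}
			log = log[:pos] // conflict: drop this entry and everything after it
		}
		log = append(log, e)
	}
	return log, true
}

func main() {
	follower := []Entry{{1, "x=1"}, {1, "x=2"}, {2, "x=3"}}

	// The new leader (term 3) holds a different entry at index 3.
	follower, ok := appendEntries(follower, 2, 1, []Entry{{3, "x=9"}})
	fmt.Println(ok, follower)

	// A call whose previous term does not match is rejected.
	_, ok = appendEntries(follower, 3, 2, nil)
	fmt.Println(ok)
}
```

Because a follower accepts new entries only when the preceding entry matches, agreement on one (index, term) pair implies agreement on the whole prefix, by induction over the log.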
21.2 Mechanical Detail
- Roles: Follower → Candidate (on election timeout) → Leader (on majority vote).
- Terms: monotonically increasing election epochs. Every RPC carries the sender's term. A node rejects RPCs with a stale term, and a node that sees a higher term adopts it and reverts to Follower.
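- Log entries: `(term, index, command)`. The leader appends client commands to its log and replicates them via `AppendEntries`.
- Commit: an entry is committed once the leader has replicated it to a majority of nodes. A leader counts replicas only for entries from its own term; earlier-term entries become committed indirectly when a later entry commits. The leader advances `commitIndex`, and each node then applies committed entries to its state machine in log order.
- Snapshots: long logs are compacted by snapshotting the state machine and discarding the log prefix the snapshot covers; followers that have fallen behind that prefix catch up via `InstallSnapshot`. Without snapshots, restart time and storage grow without bound.
- Production Raft libraries in Go: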
    - `hashicorp/raft`: used by Consul, Nomad, and Vault. Stable, mature, opinionated.
    - `etcd-io/raft`: used by etcd (and so by Kubernetes, which stores its state in etcd) and by CockroachDB. More flexible, more low-level.
- Read both. Pick `etcd-io/raft` for new builds: it has been hardened by years of CockroachDB and etcd production load.
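The commit rule from the list above can be sketched as a small function. This is an illustration of the rule in the Raft paper, under the assumption that the leader tracks a `matchIndex` per node (its own last index included); `advanceCommit`, `matchIndex`, and `logTerms` are hypothetical names, not the API of either library.

```go
package main

import (
	"fmt"
	"sort"
)

// advanceCommit returns the leader's new commitIndex.
// matchIndex holds, for every node including the leader, the highest log index
// known to be stored on that node. logTerms[i] is the term of the entry at
// index i+1. currentTerm is the leader's term.
func advanceCommit(matchIndex []int, logTerms []uint64, currentTerm uint64, commitIndex int) int {
	if len(matchIndex) == 0 {
		return commitIndex
	}
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))

	// With the indexes in descending order, position len/2 is the highest
	// index that a majority of nodes have reached.
	n := sorted[len(sorted)/2]

	// Only an entry from the leader's own term may be committed by counting
	// replicas; earlier entries are committed indirectly along with it.
	if n > commitIndex && n <= len(logTerms) && logTerms[n-1] == currentTerm {
		return n
	}
	return commitIndex
}

func main() {
	logTerms := []uint64{1, 1, 2, 2, 2}

	// Leader in term 2: index 3 is on two of three nodes and is from term 2.
	fmt.Println(advanceCommit([]int{5, 3, 2}, logTerms, 2, 1))

	// Leader in term 3: index 2 is on a majority but is from term 1, so the
	// commit index does not move until a term-3 entry is replicated.
	fmt.Println(advanceCommit([]int{5, 2, 2}, logTerms, 3, 1))
}
```

Checking only the highest majority index is enough here, because terms never decrease along a log: if that entry is from an older term, so is every entry before it.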
21.3 Lab: "Read Raft in Anger"
- Read `node.go` and `raft.go` in `etcd-io/raft` end-to-end. Annotate the state machine transitions.
- Build a minimal in-memory KV store on top: a single goroutine consumes from `node.Ready()`, persists the hard state and new log entries to a WAL, sends messages to peers, applies committed entries to a `map[string]string`, and acknowledges with `node.Advance()`. Persist before sending: a message may only promise what is already on disk.
- Run a 3-node cluster locally. Kill the leader; observe an election. Restart it; observe log catch-up.
- Add a snapshot mechanism that triggers every 10K entries.
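The apply-and-snapshot half of the lab can be prototyped without any Raft library. The sketch below is standard-library Go only and does not use the `etcd-io/raft` API; the `KV` type, the `SET key value` command format, and the JSON snapshot encoding are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Entry is a committed log entry; Command has the form "SET key value".
type Entry struct {
	Index   uint64
	Command string
}

// KV is a toy state machine that snapshots itself every snapshotEvery entries.
type KV struct {
	data          map[string]string
	applied       uint64  // index of the last applied entry
	tail          []Entry // entries applied since the last snapshot
	snapshotEvery int
	snapshot      []byte // JSON-encoded copy of data as of snapshotIndex
	snapshotIndex uint64
}

func NewKV(snapshotEvery int) *KV {
	return &KV{data: map[string]string{}, snapshotEvery: snapshotEvery}
}

// Apply applies one committed entry in index order and compacts when due.
func (kv *KV) Apply(e Entry) error {
	if e.Index != kv.applied+1 {
		return fmt.Errorf("out-of-order entry: got %d, want %d", e.Index, kv.applied+1)
	}
	parts := strings.SplitN(e.Command, " ", 3)
	if len(parts) != 3 || parts[0] != "SET" {
		return fmt.Errorf("bad command %q", e.Command)
	}
	kv.data[parts[1]] = parts[2]
	kv.applied = e.Index
	kv.tail = append(kv.tail, e)
	if len(kv.tail) >= kv.snapshotEvery {
		return kv.compact()
	}
	return nil
}

// compact serializes the map and discards the log prefix the snapshot covers.
func (kv *KV) compact() error {
	b, err := json.Marshal(kv.data)
	if err != nil {
		return err
	}
	kv.snapshot, kv.snapshotIndex, kv.tail = b, kv.applied, nil
	return nil
}

// Restore rebuilds a store from a snapshot plus the entries that follow it.
func Restore(snapshot []byte, snapshotIndex uint64, tail []Entry, snapshotEvery int) (*KV, error) {
	kv := NewKV(snapshotEvery)
	if len(snapshot) > 0 {
		if err := json.Unmarshal(snapshot, &kv.data); err != nil {
			return nil, err
		}
	}
	kv.applied, kv.snapshot, kv.snapshotIndex = snapshotIndex, snapshot, snapshotIndex
	for _, e := range tail {
		if err := kv.Apply(e); err != nil {
			return nil, err
		}
	}
	return kv, nil
}

func main() {
	kv := NewKV(3) // the lab would use 10_000
	for i := uint64(1); i <= 5; i++ {
		if err := kv.Apply(Entry{i, fmt.Sprintf("SET k%d v%d", i, i)}); err != nil {
			panic(err)
		}
	}
	fmt.Println("snapshot at", kv.snapshotIndex, "tail length", len(kv.tail))

	restored, err := Restore(kv.snapshot, kv.snapshotIndex, kv.tail, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println("restored keys:", len(restored.data))
}
```

In the real lab the snapshot and the retained tail would live in the WAL and the Raft storage rather than in struct fields, but the restart path has the same shape: load the snapshot, then replay only the entries after it.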
21.4 Idiomatic & golangci-lint Drill
- The Raft codebases are dense; do not lint-refactor them. Instead, study their style: small functions, explicit state transitions, testable seams.
21.5 Production Hardening Slice
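- Add `jepsen-io/jepsen`-style fault injection: random partitions, random clock skew, random crashes. Run for 30 minutes while recording every client operation with its invocation and response times. Verify that the recorded history of your `etcd-io/raft`-derived KV store is linearizable.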
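To make the linearizability check concrete, here is a brute-force sketch for a single register. It assumes a complete history (every operation has a response) and explores orderings exhaustively, so it only suits short histories; a 30-minute run would need a dedicated checker such as Porcupine. `Op` and `linearizable` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// Op is one completed client operation on a single register.
type Op struct {
	Start, End int64  // invocation and response timestamps
	Write      bool   // true for a write, false for a read
	Value      string // value written, or value the read returned
}

// linearizable reports whether some total order of the operations respects
// real-time order and makes every read return the latest written value.
func linearizable(history []Op, initial string) bool {
	done := make([]bool, len(history))
	var search func(state string, remaining int) bool
	search = func(state string, remaining int) bool {
		if remaining == 0 {
			return true
		}
		// Find the earliest response among pending operations. An operation
		// that started after that response cannot be linearized next.
		minEnd := int64(math.MaxInt64)
		for i, op := range history {
			if !done[i] && op.End < minEnd {
				minEnd = op.End
			}
		}
		for i, op := range history {
			if done[i] || op.Start > minEnd {
				continue
			}
			next := state
			if op.Write {
				next = op.Value
			} else if op.Value != state {
				continue // this read could not have observed the current state
			}
			done[i] = true
			if search(next, remaining-1) {
				return true
			}
			done[i] = false // backtrack and try another candidate
		}
		return false
	}
	return search(initial, len(history))
}

func main() {
	// The read overlaps the write, so it may take effect before it.
	concurrent := []Op{
		{Start: 0, End: 30, Write: true, Value: "a"},
		{Start: 10, End: 20, Value: ""},
	}
	// The read starts after the write finished but still sees the old value.
	stale := []Op{
		{Start: 0, End: 10, Write: true, Value: "a"},
		{Start: 20, End: 30, Value: ""},
	}
	fmt.Println(linearizable(concurrent, ""), linearizable(stale, ""))
}
```

A stale read like the second history is the failure to look for after a partition: a deposed leader that still answers reads from its local map produces exactly this pattern.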