
Week 21 - Consensus Algorithms: Raft (and a Glance at Paxos)

21.1 Conceptual Core

  • Consensus is the problem of getting N nodes to agree on a sequence of values despite arbitrary message loss, reordering, and node failure (but not Byzantine failure).
  • Raft is the modern teaching consensus: leader-based, log-replication-centric, decomposed into three sub-problems: leader election, log replication, and safety. Read the Ongaro and Ousterhout paper.
  • Paxos is the older, denser counterpart. Read the Paxos Made Simple paper for fluency, but use Raft in implementation.
  • Two key properties Raft guarantees:
  • Log matching: if two logs contain an entry with the same index and term, the logs are identical in all entries up through that index.
  • Leader completeness: once an entry is committed, it is present in the log of every leader of every subsequent term.
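The Log Matching property above can be stated as executable code. A minimal sketch (the `Entry` type and `matchUpTo` helper are illustrative, not from any Raft library):

```go
package main

import "fmt"

// Entry is a Raft log entry: the term in which it was created plus a
// command. Index is implicit: position in the slice, 1-based as in the paper.
type Entry struct {
	Term    uint64
	Command string
}

// matchUpTo checks the Log Matching property at index i (1-based): if both
// logs hold an entry at i with the same term, every entry up to and
// including i must be identical.
func matchUpTo(a, b []Entry, i int) bool {
	if i < 1 || i > len(a) || i > len(b) || a[i-1].Term != b[i-1].Term {
		return true // property only constrains logs that agree on (index, term)
	}
	for j := 0; j < i; j++ {
		if a[j] != b[j] {
			return false
		}
	}
	return true
}

func main() {
	leader := []Entry{{1, "x=1"}, {1, "x=2"}, {2, "x=3"}}
	follower := []Entry{{1, "x=1"}, {1, "x=2"}}
	// Same term at index 2, so prefixes must be (and are) equal.
	fmt.Println(matchUpTo(leader, follower, 2)) // true
}
```

Raft maintains this property as an invariant via the AppendEntries consistency check, so a well-behaved cluster never needs to verify it at runtime; the check is useful in tests.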

21.2 Mechanical Detail

  • Roles: Follower → Candidate (on election timeout) → Leader (on majority vote).
  • Terms: monotonically increasing election epochs. Every RPC carries a term. Stale terms are rejected.
  • Log entries: (term, index, command). The leader appends client commands and replicates via AppendEntries.
  • Commit: an entry from the leader's current term is committed once the leader has replicated it on a majority (a leader never commits entries from earlier terms by counting replicas). The leader then advances commitIndex, and committed entries can be applied to the state machine.
  • Snapshots: long logs are compacted via InstallSnapshot. Without snapshots, restart time and storage grow unbounded.
  • Production Raft libraries in Go:
  • `hashicorp/raft` - used by Consul, Nomad, and Vault. Stable, mature, opinionated.
  • `etcd-io/raft` - used by etcd (and therefore Kubernetes) and CockroachDB. More flexible, lower-level.
  • Read both. Pick `etcd-io/raft` for new builds: it has been hardened by years of etcd and CockroachDB production load.
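The commit rule above is worth seeing as code. A minimal sketch of how a leader advances commitIndex (the function name and shapes are illustrative; real libraries track this inside their progress/quorum machinery):

```go
package main

import (
	"fmt"
	"sort"
)

// advanceCommit returns the commitIndex a leader may advance to: the
// highest index replicated on a majority, provided the entry at that index
// was created in the leader's current term (the Raft §5.4.2 restriction).
// matchIndex has one slot per server, including the leader itself.
func advanceCommit(matchIndex []uint64, logTerms map[uint64]uint64, currentTerm, commitIndex uint64) uint64 {
	m := append([]uint64(nil), matchIndex...)
	sort.Slice(m, func(i, j int) bool { return m[i] < m[j] })
	// The lower-median entry is replicated on a majority of the cluster.
	candidate := m[(len(m)-1)/2]
	if candidate > commitIndex && logTerms[candidate] == currentTerm {
		return candidate
	}
	return commitIndex
}

func main() {
	// 5 servers; index 4 is on 3 of them and was written in the current term.
	matchIndex := []uint64{5, 5, 4, 2, 1}
	logTerms := map[uint64]uint64{1: 1, 2: 1, 4: 2, 5: 2}
	fmt.Println(advanceCommit(matchIndex, logTerms, 2, 1)) // 4
}
```

Note the term check: without it, a leader could commit an old-term entry that a later leader is still allowed to overwrite, which is exactly the Figure 8 scenario in the Raft paper.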

21.3 Lab - "Read Raft in Anger"

  1. Read etcd-io/raft/node.go and raft.go end-to-end. Annotate the state machine transitions.
  2. Build a minimal in-memory KV store on top: a single goroutine consumes from node.Ready(), applies entries to a map[string]string, persists log entries to a WAL, sends messages to peers, and acknowledges.
  3. Run a 3-node cluster locally. Kill the leader; observe an election. Restart; observe log catchup.
  4. Add a snapshot mechanism every 10K entries.
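The driver loop in step 2 has a characteristic persist-then-apply shape. A minimal sketch, using simplified stand-ins for etcd-io/raft's types (the real `Ready` lives in go.etcd.io/raft/v3 and also carries HardState, Messages, and Snapshot fields):

```go
package main

import (
	"fmt"
	"strings"
)

// Entry and Ready are simplified stand-ins for etcd-io/raft's types,
// defined here only to show the shape of the single-goroutine consumer.
type Entry struct {
	Index uint64
	Data  string // toy KV command, encoded "key=value"
}

type Ready struct {
	Entries          []Entry // new log entries: persist to the WAL first
	CommittedEntries []Entry // entries safe to apply to the state machine
}

type KV struct {
	wal     []Entry           // stand-in for a real write-ahead log
	state   map[string]string // the replicated map
	applied uint64            // highest applied index
}

func NewKV() *KV { return &KV{state: make(map[string]string)} }

// handleReady is the body of the consumer goroutine: persist new entries
// before applying committed ones. With the real library you would also
// send rd.Messages to peers and call node.Advance() when done.
func (kv *KV) handleReady(rd Ready) {
	kv.wal = append(kv.wal, rd.Entries...)
	for _, e := range rd.CommittedEntries {
		if k, v, ok := strings.Cut(e.Data, "="); ok {
			kv.state[k] = v
		}
		kv.applied = e.Index
	}
}

func main() {
	kv := NewKV()
	kv.handleReady(Ready{
		Entries:          []Entry{{1, "x=1"}},
		CommittedEntries: []Entry{{1, "x=1"}},
	})
	fmt.Println(kv.state["x"], kv.applied) // 1 1
}
```

The ordering matters: an entry must be durable in the WAL before the node acknowledges it, or a crash between apply and persist loses committed state. The `applied` counter is also where step 4's snapshot trigger (every 10K entries) would hang off.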

21.4 Idiomatic & golangci-lint Drill

  • The Raft codebases are dense; do not lint-refactor them. Instead, study their style: small functions, explicit state transitions, testable seams.

21.5 Production Hardening Slice

  • Add jepsen-io/jepsen-style fault injection: random partitions, random clock skew, random crashes. Run for 30 minutes, then verify linearizability against your etcd-io/raft-derived KV's operation history.
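A small sketch of the fault-injection scheduler, assuming a homegrown harness rather than Jepsen's actual API (the `Fault` names are illustrative). Seeding makes a failing run replayable, which matters more than fault variety:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Fault is one injected failure; these names are illustrative, not Jepsen's.
type Fault int

const (
	Partition Fault = iota // drop traffic between two random halves
	ClockSkew              // jump one node's clock forward or back
	Crash                  // kill one node, restart it later
)

func (f Fault) String() string {
	return [...]string{"partition", "clock-skew", "crash"}[f]
}

// schedule produces a deterministic (seeded) sequence of n faults, so any
// run that surfaces a linearizability violation can be replayed exactly.
func schedule(seed int64, n int) []Fault {
	r := rand.New(rand.NewSource(seed))
	out := make([]Fault, n)
	for i := range out {
		out[i] = Fault(r.Intn(3))
	}
	return out
}

func main() {
	for _, f := range schedule(42, 5) {
		fmt.Println("inject:", f)
	}
}
```

For the linearizability check itself, record every client operation's invocation and response timestamps and feed the history to a checker such as Porcupine rather than writing your own.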
