
Workshop - Bootstrap a Kubernetes control plane by hand

Difficulty: Capstone | Time: 2 hours
Needs: Linux host with root; etcd, kube-apiserver, kube-scheduler, kubelet, kubectl binaries

Before you start:

  • All seven prior Kubernetes workshops
  • Comfortable starting and stopping long-running processes by hand
  • Read about etcd, the apiserver, and the Kubernetes object model

Launch in Killercoda: free browser-based environment, no install required to follow along.

Companion to Kubernetes -> Month 06 -> Weeks 21-22: Bootstrap, Control Plane, and Worker Nodes ("Kubernetes the Hard Way"). Every other workshop built things on top of Kubernetes. This one goes underneath: you start etcd and the kube-apiserver as plain processes by hand, talk to your hand-built control plane with kubectl, then add the scheduler and a kubelet - and watch a Pod actually run. By the end, the single most important truth about Kubernetes is something you've proven with your own hands: the control plane is just a few Go binaries and a database.

~120 minutes - the capstone of the workshop series. Needs: a Linux host, root, and the Kubernetes binaries (etcd, kube-apiserver, kube-scheduler, kubelet, kubectl - download from the official releases). This deliberately does not use kind/kubeadm - the whole point is no automation.

What you'll build, and the idea it makes concrete

You'll assemble a working single-node Kubernetes control plane from individual binaries, skipping TLS hardening only where that is safe for a lab: etcd for storage, the apiserver as the front door, then the scheduler and kubelet so Pods actually run. No kubeadm, no kind, no installer - you start each process and wire them together yourself.

The idea this makes concrete:

Kubernetes is not a monolithic system - it's a handful of independent programs coordinating through one shared database (etcd) and one API (the apiserver). "The control plane" is: etcd (the only stateful component - the entire cluster state lives here as key/values), kube-apiserver (the only thing that talks to etcd; everything else talks to it), kube-scheduler and kube-controller-manager (control loops that watch the apiserver and act - the loops you built in every prior workshop), and on each node a kubelet (runs containers) and kube-proxy (networking). Take away the installers and a cluster is these processes plus etcd. There is no magic - there is process supervision and a database.

Every prior workshop built a participant in this system (a controller, a scheduler, an operator). This workshop builds the system itself, revealing that your custom scheduler and the real one are peers - both just clients of the apiserver.

Step 0: the architecture you're about to assemble by hand

                          +------------------+
   kubectl --------------> |                  |
   your custom controller->|  kube-apiserver  | <----- the ONLY component that
   kube-scheduler -------->|  (the front door)|        reads/writes etcd
   kube-controller-mgr --->|                  |
   kubelet (each node) --->+--------+---------+
                                    | (the only etcd client)
                                    v
                              +-----------+
                              |   etcd    |   <-- ALL cluster state lives here
                              | (the DB)  |       (every object, as key/value)
                              +-----------+

The non-obvious truths this layout encodes, which you'll verify:

  • etcd is the only stateful thing. Every Pod, Deployment, Secret, your Website CR from the operator workshop - all of it is keys and values in etcd. Lose etcd, lose the cluster. Back up etcd, back up everything.
  • Only the apiserver touches etcd. The scheduler, controllers, kubelet - none of them know etcd exists. They all go through the apiserver's REST API. This is why the apiserver is the central component and why everything is "just an API client."
  • Everything else is a stateless client running a watch-and-act loop. Restart the scheduler and nothing is lost - it re-reads state from the apiserver. This is why control-plane components are easy to make HA (run several, lease-elect a leader - the controller workshop's leader election).

Step 1: start etcd - the cluster's brain

etcd is a distributed key/value store. For a single-node learning cluster, one instance:

$ etcd \
  --data-dir=/tmp/etcd-data \
  --listen-client-urls=http://127.0.0.1:2379 \
  --advertise-client-urls=http://127.0.0.1:2379 &

It's just a process listening on :2379. Prove it's a plain key/value store - put and get a key directly:

$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 put /hello world
OK
$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 get /hello
/hello
world

That's the database the entire cluster will live in. Right now it holds your test key; in a minute it'll hold Kubernetes objects. (In production etcd runs as a 3 or 5 node Raft cluster for HA - Week 1's consensus material - but it's the same program.)
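Because every component in this workshop is started in the background with &, the next command can race a process that is not listening yet. A minimal sketch of a retry helper, assuming a POSIX shell; wait_for is a hypothetical helper name, not something etcd or Kubernetes ships:

```shell
#!/bin/sh
# wait_for TRIES CMD...: run CMD about once per second until it succeeds,
# giving up after TRIES attempts. Hypothetical helper for this workshop.
wait_for() {
  tries="$1"; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0          # the command succeeded, so the component is up
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1              # still failing after TRIES attempts
}

# Example (needs a running etcd, so it is left commented out):
# wait_for 30 etcdctl --endpoints=127.0.0.1:2379 endpoint health
```

The same helper can gate later steps, for example waiting until kubectl can list namespaces before creating objects.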

Step 2: start the apiserver - the front door

The apiserver is the REST API in front of etcd. Point it at your etcd:

# Workshop shortcuts (real clusters use RBAC + TLS auth):
#   --authorization-mode=AlwaysAllow  skips authorization entirely
#   --token-auth-file                 supplies a trivial static token for kubectl
$ kube-apiserver \
  --etcd-servers=http://127.0.0.1:2379 \
  --service-cluster-ip-range=10.0.0.0/24 \
  --bind-address=127.0.0.1 \
  --secure-port=6443 \
  --authorization-mode=AlwaysAllow \
  --token-auth-file=/tmp/tokens.csv \
  ... (cert flags) &

(Real bootstrap requires a PKI - CA, certs for each component, TLS everywhere. "Kubernetes the Hard Way" spends a whole section on certificates; for this workshop you can use --authorization-mode=AlwaysAllow and minimal certs to focus on the architecture. The security hardening is Week 23.)
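The file passed to --token-auth-file is a CSV with at least three columns: token, user name, user UID. A minimal sketch that generates one; the user name workshop-admin, the UID 1000, and the TOKEN_FILE variable are placeholders chosen for illustration, not values the workshop prescribes:

```shell
#!/bin/sh
# Generate a random 32-hex-character bearer token and write a one-line
# static token file in the layout kube-apiserver reads:
#   token,user,uid
TOKEN_FILE="${TOKEN_FILE:-/tmp/tokens.csv}"
TOKEN="$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"

umask 077   # the file holds a credential; keep it private to this user
printf '%s,%s,%s\n' "$TOKEN" "workshop-admin" "1000" > "$TOKEN_FILE"

echo "token written to $TOKEN_FILE"
```

The value of TOKEN is what you later pass to kubectl as --token. The apiserver reads this file when it starts, so restart it after changing the file.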

Now point kubectl at your apiserver and talk to your hand-built control plane:

$ kubectl --server=https://127.0.0.1:6443 --token=<your-token> --insecure-skip-tls-verify get nodes
No resources found.        # the apiserver answers! (no nodes yet - we haven't added a kubelet)
$ kubectl ... get namespaces
NAME              STATUS   AGE
default           Active   10s
kube-system       Active   10s        <- the apiserver created the built-in namespaces in YOUR etcd

You're talking to a Kubernetes API you assembled from two processes. And here's the etcd connection made literal - look at what the apiserver wrote into etcd:

$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 get /registry --prefix --keys-only | head
/registry/namespaces/default
/registry/namespaces/kube-system
/registry/apiregistration.k8s.io/apiservices/v1.
...

Every Kubernetes object lives under /registry/... in etcd. When you kubectl get namespaces, the apiserver reads these keys. When you kubectl create anything, it writes here. You can see the entire cluster as key/value pairs - the abstraction is gone, it's a database.

Step 3: create an object, watch it land in etcd

Make the etcd-is-the-state truth undeniable. Create a ConfigMap via the API, then read it straight from etcd:

$ kubectl ... create configmap demo --from-literal=k=v
configmap/demo created
$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 get /registry/configmaps/default/demo
/registry/configmaps/default/demo
{"kind":"ConfigMap","apiVersion":"v1","metadata":{"name":"demo",...},"data":{"k":"v"}}

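The object you created through the API is sitting in etcd as a value at a predictable key. (By default the apiserver stores built-in objects protobuf-encoded, so the raw value may look partly binary; the readable JSON shown here is what you get when the apiserver is started with --storage-media-type=application/json.) kubectl -> apiserver -> etcd. That's the entire write path of Kubernetes, and you just watched a single object travel it. This is also why etcd backup = cluster backup: etcdctl snapshot save captures every object; restore it and the whole cluster comes back.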
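The "predictable key" can be written down as a tiny function. This is a sketch of the layout observed in this workshop (/registry/namespaces/default, /registry/configmaps/default/demo); registry_key is a hypothetical helper, and resources from other API groups may use a different prefix, so check with --keys-only before relying on it:

```shell
#!/bin/sh
# registry_key RESOURCE [NAMESPACE] NAME
# Build the etcd key for an object, following the layout seen above:
#   namespaced:      /registry/<resource>/<namespace>/<name>
#   cluster-scoped:  /registry/<resource>/<name>
registry_key() {
  if [ "$#" -eq 3 ]; then
    printf '/registry/%s/%s/%s\n' "$1" "$2" "$3"
  elif [ "$#" -eq 2 ]; then
    printf '/registry/%s/%s\n' "$1" "$2"
  else
    echo "usage: registry_key RESOURCE [NAMESPACE] NAME" >&2
    return 1
  fi
}

registry_key configmaps default demo   # prints /registry/configmaps/default/demo
registry_key namespaces kube-system    # prints /registry/namespaces/kube-system
```

Feeding the result to etcdctl get is then a one-liner: etcdctl --endpoints=127.0.0.1:2379 get "$(registry_key configmaps default demo)".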

Step 4: add the scheduler - now Pods can be placed

Right now if you created a Pod it would sit Pending forever (exactly the scheduler workshop's lesson - no scheduler, no placement). Start the real scheduler as another client of your apiserver:

# scheduler.kubeconfig points at https://127.0.0.1:6443
$ kube-scheduler \
  --kubeconfig=/tmp/scheduler.kubeconfig \
  --bind-address=127.0.0.1 &

It's just another process pointed at the apiserver - the same way your custom scheduler connected in that workshop. The real scheduler and your mini one are peers: both watch the apiserver for unscheduled Pods and write bindings back. There's nothing privileged about the "official" one; it's a client like any other.
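The kubeconfig is what makes the scheduler "just a client": it holds nothing but a server address and a credential. A minimal sketch of generating one, assuming the lab setup from Step 2 (static token auth, TLS verification skipped); write_kubeconfig, the cluster name workshop, and the user name scheduler are hypothetical names picked for this example:

```shell
#!/bin/sh
# write_kubeconfig PATH USER TOKEN
# Write a kubeconfig that points a client at the hand-built apiserver.
write_kubeconfig() {
  cat > "$1" <<EOF
apiVersion: v1
kind: Config
clusters:
- name: workshop
  cluster:
    server: https://127.0.0.1:6443
    insecure-skip-tls-verify: true   # lab shortcut, same as the kubectl flag used earlier
users:
- name: $2
  user:
    token: $3
contexts:
- name: workshop
  context:
    cluster: workshop
    user: $2
current-context: workshop
EOF
}

# "<your-token>" stands for a token listed in /tmp/tokens.csv; substitute your own.
write_kubeconfig /tmp/scheduler.kubeconfig scheduler "<your-token>"
```

The same function can produce /tmp/kubelet.kubeconfig for Step 5, which underlines the point: every component connects the same way.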

Step 5: add a kubelet - now Pods actually run

The control plane decides; the kubelet executes (the scheduler workshop's decoupling). Start a kubelet to turn this host into a worker node:

# kubelet.kubeconfig: how the kubelet reaches and registers with your apiserver
# kubelet-config.yaml: cgroup driver, container runtime endpoint, etc.
$ kubelet \
  --kubeconfig=/tmp/kubelet.kubeconfig \
  --config=/tmp/kubelet-config.yaml \
  --container-runtime-endpoint=unix:///run/containerd/containerd.sock &

The kubelet registers itself as a Node with the apiserver and starts watching for Pods assigned to it. Confirm your cluster now has a node:

$ kubectl ... get nodes
NAME       STATUS   ROLES    AGE   VERSION
myhost     Ready    <none>   20s   v1.29.x      <- your kubelet registered itself
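For reference, a minimal sketch of what /tmp/kubelet-config.yaml might contain, limited to the two settings named above. The systemd cgroup driver is an assumption; it has to match whatever your container runtime is configured to use, and a real node will usually need more fields than this:

```shell
#!/bin/sh
# Write a bare-bones KubeletConfiguration for the single-node lab.
# cgroupDriver: systemd is an assumption; use cgroupfs if your runtime does.
cat > /tmp/kubelet-config.yaml <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
containerRuntimeEndpoint: unix:///run/containerd/containerd.sock
EOF

echo "wrote /tmp/kubelet-config.yaml"
```

If the node never reaches Ready, the kubelet's own log output is the first place to look; a cgroup driver mismatch with the runtime is a common cause.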

Step 6: the payoff - run a Pod on the cluster you built

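Everything's assembled: etcd (state) + apiserver (API) + scheduler (placement) + kubelet (execution). One catch of running without kube-controller-manager: nothing creates the default ServiceAccount, so the apiserver may reject the Pod with an error like 'serviceaccount "default" not found'. For this lab, starting the apiserver with --disable-admission-plugins=ServiceAccount sidesteps that. Create a Pod and watch it travel the entire pipeline you built by hand: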

$ kubectl ... run hello --image=nginx
$ kubectl ... get pod hello -o wide -w
NAME    READY   STATUS              NODE
hello   0/1     Pending             <none>      <- created, stored in etcd, not yet scheduled
hello   0/1     ContainerCreating   myhost      <- scheduler bound it; kubelet is starting it
hello   1/1     Running             myhost      <- kubelet started the container

Trace what just happened across the components you started:

  1. kubectl run -> the apiserver validated the Pod and wrote it to etcd (spec.nodeName="").
  2. The scheduler (watching the apiserver) saw an unscheduled Pod, picked the node, and wrote the binding back through the apiserver to etcd.
  3. The kubelet (watching the apiserver) saw a Pod assigned to it, called the container runtime, started nginx, and reported status back.
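Each stage of that trace leaves a field you can query. The helpers below are a sketch for inspecting them; KUBECTL_ARGS is a hypothetical variable standing in for the connection flags abbreviated as "..." in the commands above, and the function names are made up for this example:

```shell
#!/bin/sh
# The helpers are only defined here, not run, so the file is safe to source.

# Stage 1: Pods that exist in the API but that no scheduler has bound yet
# (spec.nodeName is empty).
unscheduled_pods() {
  kubectl $KUBECTL_ARGS get pods --all-namespaces --field-selector spec.nodeName=
}

# Stage 2: the node a Pod was bound to; prints an empty line while unbound.
pod_node() {
  kubectl $KUBECTL_ARGS get pod "$1" -o jsonpath='{.spec.nodeName}'
  echo
}

# Stage 3: the lifecycle phase the kubelet reported back (Pending, Running, ...).
pod_phase() {
  kubectl $KUBECTL_ARGS get pod "$1" -o jsonpath='{.status.phase}'
  echo
}
```

Running unscheduled_pods with the scheduler stopped, then again after starting it, shows the same query a scheduler effectively keeps asking.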

A container is running on a Kubernetes cluster that you assembled from individual processes - no kubeadm, no kind, no cloud. Every step went through the apiserver; all state lives in etcd; the scheduler and kubelet are just clients running watch-and-act loops. You've seen the whole machine with the cover off.

Step 7: prove the resilience model - kill and restart components

The "stateless clients + one stateful store" design has a dramatic consequence you can demonstrate. Kill the scheduler and the apiserver - leave etcd alone:

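$ kill %kube-scheduler %kube-apiserver   # control plane "down" (job specs match the start of each command)
$ kubectl ... get pods             # fails - the API is gone
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?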

The API is down, but the running Pod keeps running (the kubelet is independent and the container doesn't care the control plane is down). Now restart the apiserver and scheduler:

$ kube-apiserver ... &             # restart, pointed at the SAME etcd
$ kube-scheduler ... &             # restart, pointed at the SAME apiserver
$ kubectl ... get pods
NAME    READY   STATUS    AGE
hello   1/1     Running   5m         <- still there! state was never lost

Nothing was lost, because all state was in etcd the whole time. The apiserver and scheduler are stateless: the restarted apiserver re-reads from etcd, and the restarted scheduler re-reads from the apiserver. This is why control-plane components are easy to make HA (run several; they all share the same etcd-backed state) and why the only thing you must back up is etcd. Lose a control-plane process: restart it. Lose etcd without a backup: lose the cluster. You just proved the first half; the second follows from Step 3, where every object turned out to be a key under /registry/.
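Job specs like %kube-apiserver only work in the shell that started the process. If you start components from a script, a PID-file helper makes the kill-and-restart experiment repeatable. A minimal sketch; start_component, stop_component, is_running, and the PID_DIR location are all hypothetical names for this example:

```shell
#!/bin/sh
# Tiny "process supervision" helpers: one PID file and one log per component.
PID_DIR="${PID_DIR:-/tmp/k8s-workshop-pids}"
mkdir -p "$PID_DIR"

# start_component NAME CMD...: run CMD in the background, remember its PID.
start_component() {
  name="$1"; shift
  "$@" >"$PID_DIR/$name.log" 2>&1 &
  echo "$!" >"$PID_DIR/$name.pid"
}

# is_running NAME: succeed if the recorded PID is still alive.
is_running() {
  [ -f "$PID_DIR/$1.pid" ] && kill -0 "$(cat "$PID_DIR/$1.pid")" 2>/dev/null
}

# stop_component NAME: send SIGTERM, reap the process, forget the PID.
stop_component() {
  name="$1"
  pid="$(cat "$PID_DIR/$name.pid" 2>/dev/null)" || return 1
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true   # only reaps children of this shell
  rm -f "$PID_DIR/$name.pid"
}

# Example (needs the real binaries, so it is left commented out):
# start_component apiserver kube-apiserver --etcd-servers=http://127.0.0.1:2379 ...
# stop_component apiserver
```

Stopping and starting the apiserver and scheduler this way, while leaving etcd untouched, reproduces the experiment above from any shell that sources the file.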

Now extend it (toward the real "Hard Way")

  1. Add the PKI. Do it properly: a CA, TLS certs for every component, --authorization-mode=Node,RBAC. This is half of "Kubernetes the Hard Way" and where Week 23's security lives. Painful and illuminating.
  2. Add kube-controller-manager. Start it and watch the built-in controllers (Deployment, ReplicaSet, the ones from the controller workshop) come alive - now kubectl create deployment actually creates Pods.
  3. Add kube-proxy + a CNI. Wire up Service networking (kube-proxy) and pod networking (your CNI from that workshop) so Pods get IPs and Services route - a fully functional node.
  4. Multi-node etcd. Run a 3-member etcd cluster and kill one member; watch Raft keep the cluster available (Week 1 consensus, live).
  5. Then appreciate kubeadm/kind. Run kubeadm init or kind create cluster and recognize every step it automates - it's doing exactly what you just did by hand, plus the PKI and CNI.

What you might wonder

"Is this really how production clusters are built?" The components are identical - production runs exactly these binaries (etcd, apiserver, scheduler, controller-manager, kubelet, kube-proxy). What differs is automation and hardening: kubeadm/kops/managed-services (EKS/GKE/AKS) bootstrap them with proper PKI, HA topologies, and lifecycle management. Managed control planes run these same processes for you (you just don't see them). Doing it by hand once means you know what those tools and services are actually managing.

"Why is etcd so central / why all the fuss about backing it up?" Because etcd holds 100% of cluster state - every object, as you saw under /registry/. Everything else is stateless and reconstructible. So etcd is your single point of truth and your single point of catastrophic failure: a corrupted etcd with no backup is an unrecoverable cluster. This is why etcd runs as an HA Raft cluster (survive node loss) and why etcdctl snapshot save on a schedule is non-negotiable. The most important cluster backup is the etcd snapshot.
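A scheduled snapshot can be as small as the sketch below, which wraps the etcdctl snapshot save command mentioned above. The function names, the file naming scheme, and the endpoint (this workshop's single local etcd) are assumptions for illustration; a production setup would also need TLS flags and off-host storage:

```shell
#!/bin/sh
# snapshot_path DIR: print a timestamped file name inside DIR.
snapshot_path() {
  printf '%s/etcd-%s.db\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# backup_etcd DIR: save a snapshot of the local etcd into DIR and print
# the path of the file on success.
backup_etcd() {
  mkdir -p "$1" || return 1
  out="$(snapshot_path "$1")"
  ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 snapshot save "$out" >&2 \
    && echo "$out"
}

# Example (needs a running etcd, so it is left commented out):
# backup_etcd /var/backups/etcd
```

Calling backup_etcd from cron or a systemd timer gives you the "on a schedule" part; restoring from a snapshot is a separate procedure worth rehearsing before you need it.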

"The apiserver is the only thing talking to etcd - why that design?" Centralizing etcd access in the apiserver gives one place for validation, authorization, admission (the webhook workshop!), versioning, and the watch/cache machinery every controller depends on. If components talked to etcd directly, you'd have no consistent policy enforcement and no shared watch infrastructure. The apiserver as the sole etcd client is why admission control and RBAC can exist - they're enforced at that single chokepoint.

"Everything's a client of the apiserver - even the scheduler?" Yes, and this is the unifying insight of the whole workshop series. The scheduler, controller-manager, kubelet, your custom controller, your custom scheduler, kubectl, Argo CD - all of them are just API clients running watch-and-act loops against the apiserver. There's no privileged inner ring. This is why you could build a scheduler and a controller as ordinary programs: they're the same kind of thing as the "real" components. The control plane is a set of peers coordinating through one API and one database.

"Should I ever bootstrap by hand in real life?" Almost never - use kubeadm, a managed service, or a tool. But doing it once is the single best way to understand Kubernetes: it dissolves the "magic cluster" mental model into "processes + etcd + an API," which is what makes you able to debug a broken control plane (apiserver won't start? check its etcd connection and certs; Pods stuck Pending? is the scheduler running? Pods won't start? is the kubelet healthy?). The hand-build is for understanding, not production.
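That debugging checklist can be turned into a quick probe of each process's health endpoint. A sketch under several assumptions: the ports are the component defaults as far as I know (scheduler 10259, kubelet healthz 10248), TOKEN is a hypothetical variable holding a token from your token file, and check_endpoint / check_control_plane are made-up helper names:

```shell
#!/bin/sh
# check_endpoint NAME URL [extra curl args...]
# Print "NAME: ok" if the URL answers with a success status, else "NAME: DOWN".
check_endpoint() {
  name="$1"; url="$2"; shift 2
  # -k: skip TLS verification (lab only), -s: quiet, -f: fail on HTTP errors
  if curl -ksf --max-time 2 -o /dev/null "$@" "$url"; then
    echo "$name: ok"
  else
    echo "$name: DOWN"
  fi
}

# Probe the three processes this workshop starts besides etcd.
check_control_plane() {
  check_endpoint apiserver https://127.0.0.1:6443/readyz \
    -H "Authorization: Bearer ${TOKEN:-}"
  check_endpoint scheduler https://127.0.0.1:10259/healthz
  check_endpoint kubelet   http://127.0.0.1:10248/healthz
}

# Example (needs the running cluster, so it is left commented out):
# check_control_plane
```

A DOWN line tells you which process to look at first; the component's own log then tells you why.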

What this gave you

  • You assembled a working Kubernetes control plane from individual processes: etcd + apiserver + scheduler + kubelet, no installer.
  • You saw cluster state is etcd - read Kubernetes objects directly as key/values under /registry/.
  • You watched a Pod travel the full pipeline you built: kubectl -> apiserver -> etcd -> scheduler -> kubelet -> running container.
  • You proved the resilience model: killed and restarted stateless components with zero data loss, because all state is in etcd.
  • You understand why etcd is the one thing to back up, why the apiserver is the sole etcd client (enabling RBAC/admission), and why everything is "just an API client."
  • You can now debug a broken control plane, because you know it's processes + a database, not magic.

The workshop series, complete

You've now built, by hand, the core of every layer of Kubernetes:

  • a controller (the reconcile loop - the atom of the platform),
  • an operator with a CRD (extending the API),
  • admission webhooks (intercepting the API),
  • a scheduler (deciding placement),
  • pod networking / a CNI (connectivity),
  • a GitOps sync loop (git as truth),
  • an autoscaler (feedback control),
  • and now the control plane itself (processes + etcd).

The thread through all of them: Kubernetes is a set of control loops coordinating through one API backed by one database. Every component - built-in or custom, control-plane or yours - is a client watching the apiserver and acting to reconcile desired state with actual. You don't just use Kubernetes now; you understand how it's built, because you've built each piece.

Back to the Hard-Way Capstone month, or revisit the controller workshop where the series began.

Submit your build

When you finish this workshop, share what you built so others can see and learn from your work. Include:

  • Public repo with the scripts you used to start each component
  • Output of `kubectl get nodes && kubectl get pods` against your hand-built control plane
  • etcdctl output showing a Pod object you created sitting at `/registry/pods/...`
  • Note on what you killed and restarted to prove the resilience model
