
Workshop - Build a Kubernetes controller from scratch

Difficulty: Deep | Time: 75 min
Needs: Linux or macOS, Go 1.21+, Docker, kind or k3d

Before you start:

  • Comfortable reading and writing Go
  • Have run kubectl against any cluster at least once
  • Read the reconcile-loop section of the Kubernetes path (Month 1)

Launch in Killercoda: a free browser-based environment, no install required to follow along.

Companion to Kubernetes -> Month 03 -> Week 9: client-go Internals and a Bare Controller. The chapter explains the informer + workqueue pattern. This workshop has you build a real, running controller in Go and then watch it reconcile - including the moment that makes Kubernetes finally click: you delete something it manages, and it heals it back. By the end you'll have a working prototype and the one mental model the entire platform is built on.

~90 minutes. Needs: a local cluster (kind or k3d - free, Kubernetes-in-Docker), Go 1.21+, and kubectl. No cloud, no cost.

What you'll build, and the idea it makes concrete

You'll build a controller that watches ConfigMaps labeled workshop.io/managed=true and ensures each one has a companion ConfigMap named <name>-synced holding a copy of its data. Small on purpose - because the point isn't the feature, it's watching the reconcile loop work.

Here's the one idea, and why building beats reading it:

Kubernetes is declarative and level-triggered. You declare desired state; controllers continuously drive actual state toward it. A controller doesn't react to "an event happened" - it asks "what should exist? what does exist? make them match," over and over, forever. That's why you can delete a Pod and the Deployment brings it back, why kubectl apply is idempotent, why the system self-heals.

You can read that paragraph ten times and not feel it. You'll feel it in Step 8 when you delete a companion ConfigMap and watch your own code recreate it within a second. Building a controller is the "build a container by hand" of Kubernetes - it dispels the magic by making you the magician.

Step 0: spin up a cluster

$ kind create cluster --name workshop
$ kubectl cluster-info --context kind-workshop
$ kubectl get nodes
NAME                     STATUS   ROLES           AGE   VERSION
workshop-control-plane   Ready    control-plane   30s   v1.29.x

A full Kubernetes cluster, running in Docker on your laptop, in ~20 seconds. (k3d is equally fine: k3d cluster create workshop.) Everything in this workshop runs against it.

Step 1: the mental model - level vs edge triggered

Before code, fix the distinction that everything hinges on:

  • Edge-triggered (the wrong model for K8s): "when X changes, do Y." Fragile - if you miss the event (controller was down, a message dropped), you never act, and state drifts forever.
  • Level-triggered (how K8s works): "whenever you run, make actual match desired - regardless of what changed or whether you saw the change." Robust - a missed event is harmless because the next reconcile fixes it anyway.

A Kubernetes controller is a level-triggered reconcile loop:

loop forever:
    desired = what the spec says should exist
    actual  = what's really in the cluster
    if actual != desired:
        make actual match desired

Events (a ConfigMap was created/changed/deleted) are just hints to reconcile sooner - not instructions about what to do. The reconcile function always recomputes from scratch. Hold this; the code is a direct expression of it.
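To make the difference tangible before any Kubernetes code, here is a minimal stdlib-only sketch. The names event, applyEdge and reconcileLevel are hypothetical and exist only for this illustration; plain maps stand in for cluster state.

```go
package main

import "fmt"

// event is a hypothetical change notification: set key to value, or delete key.
type event struct {
	key, value string
	deleted    bool
}

// applyEdge is the edge-triggered model: state changes only in response to
// the events it is handed, so a dropped event is never recovered.
func applyEdge(actual map[string]string, events []event) {
	for _, e := range events {
		if e.deleted {
			delete(actual, e.key)
		} else {
			actual[e.key] = e.value
		}
	}
}

// reconcileLevel is the level-triggered model: it ignores history and makes
// actual equal desired, whatever happened in between.
func reconcileLevel(desired, actual map[string]string) {
	for k, want := range desired {
		if got, ok := actual[k]; !ok || got != want {
			actual[k] = want
		}
	}
	for k := range actual {
		if _, ok := desired[k]; !ok {
			delete(actual, k) // deleting during range is safe in Go
		}
	}
}

func main() {
	desired := map[string]string{"a": "1", "b": "2"}

	// The event for "b" was dropped, so the edge model only ever sees "a".
	edge := map[string]string{}
	applyEdge(edge, []event{{key: "a", value: "1"}})

	level := map[string]string{}
	reconcileLevel(desired, level)

	fmt.Println("edge-triggered entries:", len(edge))
	fmt.Println("level-triggered entries:", len(level))
}
```

The edge model ends up with one entry and stays wrong until another event arrives; the level model converges on both entries the first time it runs, and running it again changes nothing.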

Step 2: scaffold the project

$ mkdir mirror-controller && cd mirror-controller
$ go mod init workshop/mirror-controller
$ go get k8s.io/client-go@v0.29.3 k8s.io/api@v0.29.3 k8s.io/apimachinery@v0.29.3 k8s.io/klog/v2

Three libraries do the work: client-go (the Kubernetes Go client - informers, workqueues, clients), api (the typed objects like ConfigMap), apimachinery (the machinery: object metadata, errors, label selectors).

Step 3: connect to the cluster

First, just prove you can talk to the API. Create main.go:

package main

import (
    "context"
    "fmt"
    "os"
    "path/filepath"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Out-of-cluster: read your kubeconfig (the same file kubectl uses).
    kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // Sanity check: list ConfigMaps in the default namespace.
    cms, err := clientset.CoreV1().ConfigMaps("default").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    fmt.Printf("connected. %d configmaps in default\n", len(cms.Items))
}

$ go run .
connected. 1 configmaps in default

That clientset is your typed door to every Kubernetes API. clientset.CoreV1().ConfigMaps(ns).Get/List/Create/Update/Delete(...) - the verbs you'd guess. You're now a Kubernetes client. But polling with List in a loop would hammer the API. Controllers use informers instead.
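One caveat on the hardcoded path: kubectl also honors the KUBECONFIG environment variable, so a cluster created with a custom kubeconfig would be missed. A small sketch of a fallback, assuming a hypothetical helper named resolveKubeconfig (it is not part of client-go) and Unix-style path lists:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveKubeconfig is a hypothetical helper (not part of client-go).
// env is the value of $KUBECONFIG, home is the user's home directory.
func resolveKubeconfig(env, home string) string {
	if paths := filepath.SplitList(env); len(paths) > 0 && paths[0] != "" {
		// KUBECONFIG may list several files; this sketch only takes the first.
		return paths[0]
	}
	return filepath.Join(home, ".kube", "config")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot find home directory:", err)
		os.Exit(1)
	}
	fmt.Println(resolveKubeconfig(os.Getenv("KUBECONFIG"), home))
}
```

The returned path can be passed to clientcmd.BuildConfigFromFlags in place of the hardcoded one. This sketch does not merge multiple files the way kubectl does; client-go's clientcmd package has its own loading rules for that.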

Step 4: the informer - watch, don't poll

An informer maintains a local, always-current cache of objects by watching the API, and calls your handlers when things change. It's the efficient "watch" that makes controllers cheap. Replace main with the informer wiring:

package main

import (
    "os"
    "path/filepath"
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/klog/v2"
)

func main() {
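    kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        klog.Fatalf("load kubeconfig: %v", err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        klog.Fatalf("build clientset: %v", err)
    }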

    // A factory builds informers that share one watch connection + cache.
    // resync period 30s: re-deliver every cached object every 30s (belt-and-suspenders).
    factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
    cmInformer := factory.Core().V1().ConfigMaps().Informer()

    // For now, just log what the informer sees, to watch it work.
    cmInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    func(obj interface{}) { klog.Infof("ADD    %s", key(obj)) },
        UpdateFunc: func(old, new interface{}) { klog.Infof("UPDATE %s", key(new)) },
        DeleteFunc: func(obj interface{}) { klog.Infof("DELETE %s", key(obj)) },
    })

    stop := make(chan struct{})
    defer close(stop)
    factory.Start(stop)                              // start watching
    cache.WaitForCacheSync(stop, cmInformer.HasSynced) // wait for the initial list
    klog.Info("cache synced, watching...")
    <-stop // block forever
}

func key(obj interface{}) string {
    k, _ := cache.MetaNamespaceKeyFunc(obj)  // "namespace/name"
    return k
}

Run it, then in another terminal create a ConfigMap and watch the informer notice:

$ go run .
cache synced, watching...
ADD    kube-system/kube-root-ca.crt
ADD    default/some-existing-cm
...

# other terminal:
$ kubectl create configmap demo --from-literal=hello=world
# back in the controller log, instantly:
ADD    default/demo
# and again on each 30s resync:
UPDATE default/demo
$ kubectl delete configmap demo
DELETE default/demo

You're watching the informer's event stream in real time. But notice the problem: these handlers are called one at a time on the informer's delivery goroutine, so if reconcile ran here and was slow you'd hold up every later notification, and if it failed there would be nowhere to retry from. That's what the workqueue fixes - and it's where the level-triggered design lives.
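If the informer still feels opaque, a toy model may help. The sketch below is a single-goroutine illustration, not how client-go is implemented; toyInformer, observe, get and newLoggingInformer are hypothetical names, and a string stands in for the object.

```go
package main

import "fmt"

// toyInformer is a hypothetical model of an informer: a local cache kept
// current by watch notifications, plus handlers that are told what changed.
type toyInformer struct {
	cache    map[string]string // "namespace/name" -> object payload
	onAdd    func(key string)
	onUpdate func(key string)
	onDelete func(key string)
}

// newLoggingInformer wires handlers that append "VERB key" lines to log.
func newLoggingInformer(log *[]string) *toyInformer {
	record := func(verb string) func(string) {
		return func(key string) { *log = append(*log, verb+" "+key) }
	}
	return &toyInformer{
		cache:    map[string]string{},
		onAdd:    record("ADD"),
		onUpdate: record("UPDATE"),
		onDelete: record("DELETE"),
	}
}

// observe applies one notification to the cache, then calls one handler.
func (t *toyInformer) observe(key, payload string, deleted bool) {
	_, known := t.cache[key]
	switch {
	case deleted:
		if known {
			delete(t.cache, key)
			t.onDelete(key)
		}
	case known:
		t.cache[key] = payload
		t.onUpdate(key)
	default:
		t.cache[key] = payload
		t.onAdd(key)
	}
}

// get reads from the local cache only; no API round trip is involved.
func (t *toyInformer) get(key string) (string, bool) {
	payload, ok := t.cache[key]
	return payload, ok
}

func main() {
	var log []string
	inf := newLoggingInformer(&log)
	inf.observe("default/demo", "hello=world", false)
	inf.observe("default/demo", "hello=there", false)
	payload, _ := inf.get("default/demo")
	fmt.Println("cached:", payload)
	inf.observe("default/demo", "", true)
	for _, line := range log {
		fmt.Println(line)
	}
}
```

The point of the model: the cache is updated before the handler runs, so anything a handler (or later a worker) reads through get already reflects the change.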

Step 5: the workqueue - the heart of the pattern

The handlers shouldn't do the work. They should just drop a key (namespace/name) onto a queue. Separate workers pull keys and reconcile. This decouples "something changed" from "do the work," gives you retries with backoff, and dedupes (the same key queued twice = processed once). This is the controller pattern.
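The deduplication claim is easy to check with a toy queue. This is a simplified stdlib sketch, assuming a hypothetical keyQueue type; client-go's workqueue additionally tracks keys that are currently being processed and adds rate limiting, which this sketch leaves out.

```go
package main

import "fmt"

// keyQueue is a hypothetical FIFO of keys that refuses duplicates while a
// key is still waiting to be processed.
type keyQueue struct {
	order   []string
	pending map[string]bool
}

func newKeyQueue() *keyQueue {
	return &keyQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting.
func (q *keyQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get pops the oldest key; ok is false when the queue is empty.
func (q *keyQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len reports how many distinct keys are waiting.
func (q *keyQueue) Len() int { return len(q.order) }

func main() {
	q := newKeyQueue()
	// Three notifications for the same object, one for another.
	q.Add("default/foo")
	q.Add("default/foo")
	q.Add("default/bar")
	q.Add("default/foo")
	fmt.Println("keys waiting:", q.Len())
	for key, ok := q.Get(); ok; key, ok = q.Get() {
		fmt.Println("reconcile", key)
	}
}
```

Four notifications collapse into two reconciles. Because a worker recomputes everything from the key, nothing is lost by dropping the duplicates.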

The key insight that makes it level-triggered: the queue holds keys, not events. A worker pulling key default/foo doesn't know or care whether foo was added, updated, or its companion was deleted - it just reconciles foo from scratch. Here's the full controller:

package main

import (
    "context"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "os"
    "path/filepath"
    "strings"
    "time"

    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    listers "k8s.io/client-go/listers/core/v1"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
    "k8s.io/client-go/util/workqueue"
    "k8s.io/klog/v2"
)

const (
    managedLabel = "workshop.io/managed"
    syncedSuffix = "-synced"
    checksumAnn  = "workshop.io/source-checksum"
)

type Controller struct {
    client   kubernetes.Interface
    lister   listers.ConfigMapLister   // reads from the cache, never the API (fast)
    synced   cache.InformerSynced
    queue    workqueue.RateLimitingQueue
}

func main() {
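    kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        klog.Fatalf("load kubeconfig: %v", err)
    }
    client, err := kubernetes.NewForConfig(config)
    if err != nil {
        klog.Fatalf("build clientset: %v", err)
    }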

    factory := informers.NewSharedInformerFactory(client, 30*time.Second)
    cmInformer := factory.Core().V1().ConfigMaps()

    c := &Controller{
        client: client,
        lister: cmInformer.Lister(),
        synced: cmInformer.Informer().HasSynced,
        queue:  workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter()),
    }

    // Handlers enqueue a SOURCE key. The trick: a change to a companion
    // (named foo-synced) enqueues its source (foo), so deleting a companion
    // triggers reconcile of the source -> self-healing.
    cmInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    c.enqueue,
        UpdateFunc: func(old, new interface{}) { c.enqueue(new) },
        DeleteFunc: c.enqueue,
    })

    stop := make(chan struct{})
    defer close(stop)
    factory.Start(stop)
    c.Run(2, stop) // 2 workers
}

// enqueue maps any observed ConfigMap to the SOURCE key to reconcile.
func (c *Controller) enqueue(obj interface{}) {
    cm, ok := obj.(*corev1.ConfigMap)
    if !ok { // tombstone on delete
        if t, isT := obj.(cache.DeletedFinalStateUnknown); isT {
            cm, _ = t.Obj.(*corev1.ConfigMap)
        }
        if cm == nil { return }
    }
    name := cm.Name
    if strings.HasSuffix(name, syncedSuffix) {
        name = strings.TrimSuffix(name, syncedSuffix) // companion -> source
    } else if cm.Labels[managedLabel] != "true" {
        return // not ours, and not a companion: ignore
    }
    c.queue.Add(cm.Namespace + "/" + name)
}

func (c *Controller) Run(workers int, stop <-chan struct{}) {
    defer c.queue.ShutDown()
    if !cache.WaitForCacheSync(stop, c.synced) {
        return
    }
    klog.Info("cache synced, controller running")
    for i := 0; i < workers; i++ {
        go func() {
            for c.processNext() {
            }
        }()
    }
    <-stop
}

func (c *Controller) processNext() bool {
    key, quit := c.queue.Get()
    if quit {
        return false
    }
    defer c.queue.Done(key)

    if err := c.reconcile(key.(string)); err != nil {
        klog.Errorf("reconcile %s failed, requeueing: %v", key, err)
        c.queue.AddRateLimited(key) // retry with exponential backoff
        return true
    }
    c.queue.Forget(key) // success: reset the backoff
    return true
}
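The mapping inside enqueue is the part most worth testing in isolation. Below is a sketch that pulls it out as a pure function; sourceKey is a hypothetical name, and the function takes plain strings instead of a ConfigMap so it runs without client-go.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	managedLabel = "workshop.io/managed"
	syncedSuffix = "-synced"
)

// sourceKey is a hypothetical pure-function version of the enqueue mapping:
// it returns the "namespace/name" key of the source to reconcile, or false
// when the observed object is neither a companion nor a managed source.
func sourceKey(namespace, name string, labels map[string]string) (string, bool) {
	if strings.HasSuffix(name, syncedSuffix) {
		// Companion -> source. Any name ending in the suffix is treated as a
		// companion; reconcile later finds no source and does nothing.
		return namespace + "/" + strings.TrimSuffix(name, syncedSuffix), true
	}
	if labels[managedLabel] == "true" {
		return namespace + "/" + name, true
	}
	return "", false
}

func main() {
	managed := map[string]string{managedLabel: "true"}
	cases := []struct {
		name   string
		labels map[string]string
	}{
		{"colors", managed},
		{"colors-synced", nil},
		{"unrelated", nil},
	}
	for _, c := range cases {
		key, ok := sourceKey("default", c.name, c.labels)
		fmt.Printf("%-14s -> %q (enqueue: %v)\n", c.name, key, ok)
	}
}
```

Both colors and colors-synced map to default/colors, which is the whole self-healing trick: whichever of the pair changes, the same source key is reconciled.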

Step 6: the reconcile function - desired vs actual, made literal

Here's where the mental model becomes code. reconcile is handed a source key. It computes desired state and makes the cluster match - every time, from scratch:

func (c *Controller) reconcile(key string) error {
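    ns, name, err := cache.SplitMetaNamespaceKey(key)
    if err != nil {
        klog.Errorf("dropping malformed key %q: %v", key, err)
        return nil // retrying cannot fix a bad key
    }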

    // 1. What's the SOURCE? (read from cache via the lister)
    source, err := c.lister.ConfigMaps(ns).Get(name)
    if apierrors.IsNotFound(err) {
        // Source is gone. The companion has an owner reference to it,
        // so Kubernetes garbage-collects the companion automatically.
        klog.Infof("source %s gone; companion will be GC'd", key)
        return nil
    }
    if err != nil {
        return err
    }
    if source.Labels[managedLabel] != "true" {
        return nil // not managed
    }

    // 2. What SHOULD the companion look like? (desired state)
    sum := checksum(source.Data)
    companionName := name + syncedSuffix
    desired := &corev1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{
            Name:      companionName,
            Namespace: ns,
            Annotations: map[string]string{checksumAnn: sum},
            // Owner reference: ties the companion's lifecycle to the source.
            // Delete the source -> Kubernetes deletes this automatically.
            OwnerReferences: []metav1.OwnerReference{{
                APIVersion: "v1", Kind: "ConfigMap",
                Name: source.Name, UID: source.UID,
                Controller: boolPtr(true),
            }},
        },
        Data: source.Data,
    }

    // 3. What EXISTS? Make actual match desired.
    existing, err := c.lister.ConfigMaps(ns).Get(companionName)
    if apierrors.IsNotFound(err) {
        if _, err := c.client.CoreV1().ConfigMaps(ns).Create(context.TODO(), desired, metav1.CreateOptions{}); err != nil {
            return err // log success only after the API call succeeded
        }
        klog.Infof("created companion %s/%s", ns, companionName)
        return nil
    }
    if err != nil {
        return err
    }
    // It exists - has the source drifted since we last synced?
    if existing.Annotations[checksumAnn] != sum {
        updated := existing.DeepCopy() // never mutate objects owned by the cache
        updated.Data = source.Data
        if updated.Annotations == nil {
            updated.Annotations = map[string]string{} // writing to a nil map panics
        }
        updated.Annotations[checksumAnn] = sum
        if _, err := c.client.CoreV1().ConfigMaps(ns).Update(context.TODO(), updated, metav1.UpdateOptions{}); err != nil {
            return err
        }
        klog.Infof("updated companion %s/%s (source changed)", ns, companionName)
        return nil
    }
    // Already correct. Reconcile is a no-op. (This is the common case - and
    // it's why running reconcile a thousand times is harmless: idempotent.)
    return nil
}

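// checksum returns a short digest of data that does not depend on iteration
// order. Go randomizes map iteration, so hashing entries in range order would
// give identical data different checksums and trigger spurious updates.
func checksum(data map[string]string) string {
    var sum [sha256.Size]byte
    for k, v := range data {
        // Length-prefix the key so ("a=b","c") and ("a","b=c") hash differently.
        entry := sha256.Sum256([]byte(fmt.Sprintf("%d:%s=%s", len(k), k, v)))
        for i := range sum {
            sum[i] ^= entry[i] // XOR is commutative: entry order is irrelevant
        }
    }
    return hex.EncodeToString(sum[:])[:12]
}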

func boolPtr(b bool) *bool { return &b }

Read reconcile against the mental model: get desired (the companion that should exist), get actual (what does exist), make them match - create if missing, update if drifted, do nothing if correct. It never asks "what event brought me here?" It just drives toward desired state. That is a Kubernetes controller. Everything else - Deployments, the scheduler, cert-manager, Argo - is this loop at larger scale.
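Stripped of the Kubernetes types, the decision at the heart of reconcile is a three-way switch. The sketch below is a toy model under that assumption; action, decide and apply are hypothetical names, and a *string stands in for "the checksum recorded on the companion, if one exists".

```go
package main

import "fmt"

type action string

const (
	actionCreate action = "create"
	actionUpdate action = "update"
	actionNone   action = "none"
)

// decide compares the source's checksum with the one recorded on the
// companion. A nil companionSum models "no companion in the cache".
func decide(sourceSum string, companionSum *string) action {
	switch {
	case companionSum == nil:
		return actionCreate
	case *companionSum != sourceSum:
		return actionUpdate
	default:
		return actionNone
	}
}

// apply performs the action on the toy state and returns the companion's
// checksum afterwards.
func apply(a action, sourceSum string, companionSum *string) *string {
	if a == actionNone {
		return companionSum
	}
	s := sourceSum
	return &s
}

func main() {
	var companion *string
	// The same source twice, then a changed source twice.
	for _, sourceSum := range []string{"aaa", "aaa", "bbb", "bbb"} {
		a := decide(sourceSum, companion)
		companion = apply(a, sourceSum, companion)
		fmt.Println(sourceSum, "->", a)
	}
}
```

Each repeated input produces "none" on the second pass. That is what idempotent means in practice: the first run does the work, every later run with the same inputs is a no-op.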

Step 7: run it and watch reconciliation - the payoff

$ go run .
cache synced, controller running

Now, in another terminal, watch the cluster while you poke it. Open a watch so you see reconciliation live:

$ kubectl get configmaps -w           # -w = watch; leave this running

Create a managed ConfigMap:

$ kubectl create configmap colors --from-literal=sky=blue
$ kubectl label configmap colors workshop.io/managed=true

In the watch terminal, a colors-synced appears within a second - your controller saw the label, reconciled, and created the companion:

NAME            DATA   AGE
colors          1      3s
colors-synced   1      1s        <- your controller created this
$ kubectl get configmap colors-synced -o jsonpath='{.data}{"\n"}'
{"sky":"blue"}                    # it copied the data

Now change the source and watch the copy follow:

$ kubectl patch configmap colors --type merge -p '{"data":{"sky":"orange"}}'
# controller log: updated companion default/colors-synced (source changed)
$ kubectl get configmap colors-synced -o jsonpath='{.data}{"\n"}'
{"sky":"orange"}                  # the copy synced

Step 8: the moment Kubernetes clicks - watch it self-heal

This is the whole workshop in one action. Delete the companion and watch your controller bring it back:

$ kubectl delete configmap colors-synced
configmap "colors-synced" deleted
# controller log, instantly:
#   created companion default/colors-synced
$ kubectl get configmap colors-synced
NAME            DATA   AGE
colors-synced   1      1s         <- it's BACK, 1 second old

You deleted it. It came back. You didn't tell the controller "recreate it" - you deleted the companion, the informer noticed, enqueued the source, reconcile ran, saw the companion was missing, and recreated it. This is exactly why deleting a Pod that belongs to a Deployment doesn't stick (the ReplicaSet controller creates a replacement), why kubectl apply is safe to re-run, why Kubernetes is self-healing. The desired state is the source of truth; the controller relentlessly enforces it. You just built that.

And the owner-reference payoff - delete the source, and the companion is garbage-collected automatically:

$ kubectl delete configmap colors
configmap "colors" deleted
$ kubectl get configmap colors-synced
Error from server (NotFound): configmaps "colors-synced" not found   # auto-cleaned

You didn't write deletion logic for that path - the owner reference told Kubernetes "this companion belongs to that source," and the garbage collector did the rest. Owner references are how every controller manages cleanup.
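One detail worth knowing: an owner reference is matched by UID, not by name, which is why the reconcile code copies source.UID. A toy single-pass model of that rule follows; object and collect are hypothetical names, and the real garbage collector works on a dependency graph rather than a flat slice.

```go
package main

import "fmt"

// object is a hypothetical stand-in for a Kubernetes object's identity.
type object struct {
	name     string
	uid      string
	ownerUID string // empty means "no owner"
}

// collect returns the objects that survive one pass: those without an owner,
// and those whose owner UID still exists.
func collect(objs []object) []object {
	live := make(map[string]bool, len(objs))
	for _, o := range objs {
		live[o.uid] = true
	}
	var kept []object
	for _, o := range objs {
		if o.ownerUID == "" || live[o.ownerUID] {
			kept = append(kept, o)
		}
	}
	return kept
}

func main() {
	// "colors" was deleted and recreated: same name, new UID.
	objs := []object{
		{name: "colors", uid: "uid-2"},
		{name: "colors-synced", uid: "uid-9", ownerUID: "uid-1"}, // owned by the old "colors"
	}
	for _, o := range collect(objs) {
		fmt.Println("kept:", o.name)
	}
}
```

The stale companion is collected even though a ConfigMap named colors exists again, because its owner's UID is gone. In this workshop's controller the next reconcile would then create a fresh companion pointing at the new UID.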

Step 9: break it - see the failure mode and retry

Watch the workqueue's retry/backoff handle errors. Temporarily make reconcile fail: add return fmt.Errorf("boom") at the top of reconcile, re-run, create a managed ConfigMap. The log shows the backoff:

reconcile default/colors failed, requeueing: boom
reconcile default/colors failed, requeueing: boom    # ~5ms later
reconcile default/colors failed, requeueing: boom    # ~10ms later (exponential)
reconcile default/colors failed, requeueing: boom    # ~20ms later...

AddRateLimited requeues with per-key exponential backoff: with DefaultControllerRateLimiter the delay starts at 5ms and doubles on every failure up to a 1000s cap, so the first retries arrive in a quick burst and then spread out to seconds and minutes. A transiently-failing reconcile retries patiently instead of hot-looping. Remove the boom line and restart the controller: the initial sync enqueues the key again, reconcile succeeds, and Forget clears the key's failure count. This resilience - keep trying, back off, never give up - is why controllers survive flaky APIs and transient errors. It's free from the workqueue.
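The arithmetic behind the backoff is small enough to sketch. In client-go v0.29, DefaultControllerRateLimiter combines a per-item exponential limiter (5ms base, 1000s cap) with an overall token bucket; the stdlib model below covers only the per-item part, and itemBackoff and newDefaultBackoff are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// itemBackoff is a hypothetical model of per-key exponential backoff.
type itemBackoff struct {
	base, max time.Duration
	failures  map[string]int
}

// newDefaultBackoff uses the 5ms base and 1000s cap described above.
func newDefaultBackoff() *itemBackoff {
	return &itemBackoff{
		base:     5 * time.Millisecond,
		max:      1000 * time.Second,
		failures: map[string]int{},
	}
}

// When records one more failure for key and returns how long to wait
// before retrying it: base * 2^(previous failures), capped at max.
func (b *itemBackoff) When(key string) time.Duration {
	n := b.failures[key]
	b.failures[key] = n + 1
	d := b.base
	for i := 0; i < n; i++ {
		d *= 2
		if d >= b.max {
			return b.max
		}
	}
	return d
}

// Forget clears the failure count, so the next failure starts at base again.
func (b *itemBackoff) Forget(key string) { delete(b.failures, key) }

func main() {
	b := newDefaultBackoff()
	for i := 0; i < 4; i++ {
		fmt.Println(b.When("default/colors"))
	}
	b.Forget("default/colors")
	fmt.Println("after Forget:", b.When("default/colors"))
}
```

Failures are counted per key, so one permanently broken object backs off on its own without slowing down reconciles for healthy ones.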

Step 10: the hardening slice - minimum RBAC

Running locally you used your admin kubeconfig. To run in the cluster, the controller needs a ServiceAccount with the least privilege that works - exactly what it does and no more:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: mirror-controller
rules:
- apiGroups: [""]
  resources: ["configmaps"]
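  verbs: ["list", "watch", "create", "update"]  # exactly our verbs

The discipline: grant the verbs the controller actually calls and nothing else. The informer needs list and watch to observe; reconcile needs create and update to act. The controller never calls get against the API (the lister reads the cache) and never calls delete (the garbage collector removes companions), so neither is granted. A ClusterRole only takes effect once a ClusterRoleBinding ties it to the controller's ServiceAccount. Over-broad RBAC on a controller is a real security hole - a compromised controller can do whatever its ServiceAccount allows. Minimum privilege is the rule.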

Now extend it (the real Week 9 lab)

You've built the core. Extend it into the full lab to cement the pattern:

  1. Cross-namespace mirror. Instead of a companion in the same namespace, mirror managed ConfigMaps into every namespace matching a prefix. You'll watch Namespaces too (a second informer) and enqueue affected sources when a new namespace appears. (Owner references don't cross namespaces - you'll handle cleanup yourself, which teaches why.)
  2. Leader election. Run two replicas; add tools/leaderelection so only one acts (a Lease is the lock). Kill the leader, watch the standby take over - HA controllers.
  3. Metrics + health. Expose queue depth and reconcile duration on /metrics, add /healthz and /readyz. This is what makes a controller operable.
  4. Graduate to controller-runtime. Rebuild this in controller-runtime/kubebuilder (Week 10) and see how much of the boilerplate it hides - now that you know what it's hiding, because you built it by hand.

What you might wonder

"Why client-go directly instead of controller-runtime/kubebuilder?" Because building it raw shows you the machinery - informer, lister, workqueue, reconcile - that every framework hides. controller-runtime is what you'll use in production (it's less code), but if you start there, the reconcile loop is magic. Build it once by hand (this workshop), then let the framework do it (the operator workshop). You can only appreciate what kubebuilder saves you after you've written the boilerplate yourself.

"Why a workqueue? Why not just do the work in the event handler?" Three reasons, all of which you saw: decoupling (handlers stay fast, work happens on workers), retries with backoff (AddRateLimited - Step 9), and deduplication (the same key enqueued repeatedly is processed once). Doing work inline blocks the watch stream and gives you none of this. The workqueue is the controller pattern.

"Is reconcile really called over and over even when nothing changed?" Yes - the 30s resync re-delivers every object, so reconcile runs periodically even with no changes (plus on every event). That's the point: it's level-triggered, so a no-op reconcile (Step 6's "already correct" path) must be cheap and harmless. Idempotency is non-negotiable - reconcile must produce the same result no matter how many times it runs. A controller that isn't idempotent corrupts state on the second run.

"Why read from the lister (cache) instead of the API?" Calling the API in the hot path would hammer the apiserver - a controller watching thousands of objects reconciling constantly would melt it. The informer keeps a local cache; the lister reads from it (microseconds, no network). You only call the API to change things (create/update/delete), never to read in reconcile. This read-from-cache / write-to-API split is a core scaling principle.

"How does this scale to real controllers like Deployments?" Identically. The Deployment controller watches Deployments and ReplicaSets, and reconciles the ReplicaSets that should exist against the ones that do. The ReplicaSet controller watches ReplicaSets and Pods, and creates or deletes Pods until the actual count matches the desired replicas. The scheduler watches unscheduled Pods and binds them to nodes. Every one is the loop you just built - watch, diff desired vs actual, act, requeue. Once you've built one controller, you understand the entire control plane's shape.

What this gave you

  • You built a real, running Kubernetes controller in Go from raw client-go - informer, lister, workqueue, reconcile.
  • You watched it reconcile: create a source, the companion appears; change the source, the copy follows.
  • You watched it self-heal - deleted the companion, your code brought it back - the moment that makes "declarative, level-triggered, self-healing" concrete instead of a slogan.
  • You used owner references for automatic garbage collection.
  • You saw the workqueue's retry/backoff handle failures, and the read-from-cache / write-to-API scaling split.
  • You know minimum-privilege RBAC for a controller.
  • You understand that every controller in Kubernetes - Deployments, the scheduler, every operator - is this exact loop.

This is the foundation for the rest of the Kubernetes workshops. Next, you'll extend the API itself: build an operator with a Custom Resource Definition, where you define a new kind of Kubernetes object and the controller that gives it meaning.

Back to the Controllers & Operators month for the conceptual frame.

Submit your build

When you finish this workshop, share what you built so others can see and learn from your work. Include:

  • Public repo with your controller code (link it in your submission)
  • Terminal log of the self-heal test - delete the companion ConfigMap, controller recreates it
  • Short note (3 to 5 sentences) on the one thing that clicked for you
