Workshop - Build an admission webhook

Difficulty: Deep | Time: 90 min
Needs: Linux or macOS, Go 1.21+, Docker, kind or k3d, openssl

Before you start:

Launch in Killercoda: a free browser-based environment - no install required to follow along.

Companion to Kubernetes -> Month 05 -> Week 20: Admission Control (Webhooks, OPA Gatekeeper, Kyverno). The chapter explains that admission controllers can reject or rewrite objects before they're stored. This workshop has you build both kinds - a validating webhook that blocks bad objects and a mutating webhook that rewrites them - and watch the cluster enforce your rules at kubectl apply time. By the end you'll understand the API request path that every policy tool (Gatekeeper, Kyverno, the Pod Security admission) plugs into.

~90 minutes. Needs: kind/k3d, Go 1.21+, kubectl. Prerequisite: the controller-from-scratch workshop for the basic client/cluster mechanics.

What you'll build, and the idea it makes concrete

You'll build a webhook server that does two things to every Pod as it's created:

  • Validating: reject any Pod that has no resource limits (a real production guardrail).
  • Mutating: auto-inject a default label and a default securityContext onto Pods that lack them.

The idea this makes concrete:

Every write to the Kubernetes API passes through an admission chain before it's persisted to etcd. Admission webhooks are your hook into that chain: the apiserver calls your HTTPS endpoint with the object, and you reply "allow," "deny (with a reason)," or "allow, but here's a patch." This is synchronous and in-band - your webhook runs between kubectl apply and the object existing. It's how Istio sidecar injection, Gatekeeper policies, and Kyverno all work; Pod Security admission sits in the same chain, as a controller built into the apiserver rather than a webhook.

The controller workshop showed you the reconcile path - reacting after objects exist. This shows you the admission path - intervening before they exist. Two different points in an object's life, two different powers.

Step 0: where admission sits in the request path

Fix the mental model first. When you kubectl apply a Pod, the apiserver runs it through a pipeline before storing it:

kubectl apply
   |
   v
apiserver:  authentication -> authorization (RBAC) -> MUTATING admission -> schema validation -> VALIDATING admission -> etcd
                                                       ^^^^^^^^^^^^^^^^^                          ^^^^^^^^^^^^^^^^^^^
                                                       your mutating webhook                      your validating webhook
                                                       (can PATCH the object)                     (can only ALLOW/DENY)

Two key facts that explain everything:

  • Mutating runs before validating. So a mutating webhook can add the resource limits, and then the validating webhook (or schema) sees the mutated object. Order matters.
  • It's before etcd. A rejected object is never stored - it doesn't exist, there's nothing to clean up. This is fundamentally different from a controller, which acts on objects that already exist. Admission is prevention; controllers are correction.

You're going to build endpoints that the apiserver calls at those two arrows.
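To make the ordering concrete, here is a toy model of the pipeline in plain Go. It is only a sketch: the pod type, mutateDefaults, validateLimits, and admit are hypothetical stand-ins, not apiserver code, and the real chain has more stages than this.

```go
package main

import (
	"errors"
	"fmt"
)

// pod is a deliberately tiny stand-in for corev1.Pod (hypothetical type).
type pod struct {
	Name     string
	CPULimit string // empty means "no limit set"
}

// mutateDefaults plays the mutating stage: it fills in a default limit.
func mutateDefaults(p pod) pod {
	if p.CPULimit == "" {
		p.CPULimit = "100m"
	}
	return p
}

// validateLimits plays the validating stage: it can only allow or deny.
func validateLimits(p pod) error {
	if p.CPULimit == "" {
		return errors.New("pod " + p.Name + " must set a cpu limit")
	}
	return nil
}

// admit runs the stages in the apiserver's order: mutate, then validate.
// The returned pod is what would be written to etcd.
func admit(p pod, mutators []func(pod) pod) (pod, error) {
	for _, m := range mutators {
		p = m(p)
	}
	if err := validateLimits(p); err != nil {
		return pod{}, err // rejected: nothing is stored
	}
	return p, nil
}

func main() {
	bare := pod{Name: "web"}

	// With the mutator registered, validation sees the patched object.
	stored, err := admit(bare, []func(pod) pod{mutateDefaults})
	fmt.Println(stored, err)

	// Without it, the same pod is denied and never stored.
	_, err = admit(bare, nil)
	fmt.Println(err)
}
```

The same bare Pod is admitted or denied depending only on whether a mutator ran first, which is the design constraint to keep in mind for the rest of the workshop.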

Step 1: cluster + project

$ kind create cluster --name webhook-workshop
$ mkdir admission-webhook && cd admission-webhook
$ go mod init workshop/admission-webhook
$ go get k8s.io/api@v0.29.3 k8s.io/apimachinery@v0.29.3

A webhook is just an HTTPS server that speaks the AdmissionReview protocol - you don't even need client-go for the core. The apiserver POSTs an AdmissionReview (containing the object), and you return an AdmissionReview (containing the verdict).
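If you want to see the envelope before reaching for the k8s.io/api types, the sketch below models it with the standard library only. The struct names are hypothetical and trimmed to a handful of fields; the JSON field names follow admission.k8s.io/v1, and the sample request is hand-written rather than captured from a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type admissionRequest struct {
	UID       string          `json:"uid"`
	Operation string          `json:"operation"`
	Object    json.RawMessage `json:"object"`
}

type status struct {
	Message string `json:"message,omitempty"`
}

type admissionResponse struct {
	UID       string  `json:"uid"`
	Allowed   bool    `json:"allowed"`
	Status    *status `json:"status,omitempty"`
	Patch     []byte  `json:"patch,omitempty"` // []byte is base64-encoded in JSON
	PatchType string  `json:"patchType,omitempty"`
}

type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

// reply builds the review to send back; the response UID must echo the request's.
func reply(in admissionReview, resp admissionResponse) admissionReview {
	resp.UID = in.Request.UID
	return admissionReview{APIVersion: in.APIVersion, Kind: in.Kind, Response: &resp}
}

func main() {
	incoming := []byte(`{"apiVersion":"admission.k8s.io/v1","kind":"AdmissionReview",
	  "request":{"uid":"abc-123","operation":"CREATE","object":{"kind":"Pod"}}}`)
	var in admissionReview
	if err := json.Unmarshal(incoming, &in); err != nil {
		panic(err)
	}
	// The three verdicts: allow, deny with a reason, allow with a patch.
	verdicts := []admissionResponse{
		{Allowed: true},
		{Allowed: false, Status: &status{Message: `container "web" must set limits`}},
		{Allowed: true, PatchType: "JSONPatch",
			Patch: []byte(`[{"op":"add","path":"/metadata/labels","value":{"a":"b"}}]`)},
	}
	for _, v := range verdicts {
		out, err := json.Marshal(reply(in, v))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Running it prints one response per verdict, so you can see that the patch travels as a base64 string and that every reply carries the request's uid.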

Step 2: the validating webhook - reject Pods without limits

The heart of a validating webhook: read the incoming object, decide allow/deny, respond. Create main.go:

package main

import (
    "encoding/json"
    "fmt"
    "io" // used by readReview below
    "net/http"

    admissionv1 "k8s.io/api/admission/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// validate rejects Pods whose containers lack resource limits.
func validate(w http.ResponseWriter, r *http.Request) {
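    review := readReview(r)
    req := review.Request
    if req == nil { // the body was not a usable AdmissionReview
        http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
        return
    }

    var pod corev1.Pod
    if err := json.Unmarshal(req.Object.Raw, &pod); err != nil { // the object being created
        http.Error(w, "cannot decode Pod: "+err.Error(), http.StatusBadRequest)
        return
    }

    // The decision logic: every container must declare CPU + memory limits.
    // (Init containers are not checked here.)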
    allowed, reason := true, ""
    for _, c := range pod.Spec.Containers {
        if c.Resources.Limits.Cpu().IsZero() || c.Resources.Limits.Memory().IsZero() {
            allowed = false
            reason = fmt.Sprintf("container %q must set cpu and memory limits", c.Name)
            break
        }
    }

    // Build the response: allow, or deny with a human-readable reason.
    resp := &admissionv1.AdmissionResponse{UID: req.UID, Allowed: allowed}
    if !allowed {
        resp.Result = &metav1.Status{Message: reason}  // shown to the user at apply time
    }
    writeReview(w, review, resp)
}

The contract is simple: Allowed: true lets it through; Allowed: false with a Result.Message blocks it and shows the user why. That message is what appears in the kubectl apply error - so make it actionable.

The plumbing (readReview/writeReview) handles the AdmissionReview envelope:

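// readReview decodes the request body. On a read or decode failure the
// returned review has a nil Request, so check it before use.
func readReview(r *http.Request) *admissionv1.AdmissionReview {
    var review admissionv1.AdmissionReview
    body, err := io.ReadAll(r.Body)
    if err != nil {
        return &review
    }
    _ = json.Unmarshal(body, &review) // error deliberately dropped; see comment above
    return &review
}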

func writeReview(w http.ResponseWriter, review *admissionv1.AdmissionReview, resp *admissionv1.AdmissionResponse) {
    review.Response = resp
    out, _ := json.Marshal(review)
    w.Header().Set("Content-Type", "application/json")
    w.Write(out)
}
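You can exercise this request/response loop without a cluster by posting a hand-built AdmissionReview to a handler with net/http/httptest. The sketch below is self-contained, so it uses trimmed, hypothetical stand-in types and its own validateHandler instead of the k8s.io/api types above; the post helper and the "test-uid" value are likewise made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type status struct {
	Message string `json:"message"`
}

type container struct {
	Name      string `json:"name"`
	Resources struct {
		Limits map[string]string `json:"limits"`
	} `json:"resources"`
}

type request struct {
	UID    string `json:"uid"`
	Object struct {
		Spec struct {
			Containers []container `json:"containers"`
		} `json:"spec"`
	} `json:"object"`
}

type response struct {
	UID     string  `json:"uid"`
	Allowed bool    `json:"allowed"`
	Status  *status `json:"status,omitempty"`
}

type review struct {
	APIVersion string    `json:"apiVersion"`
	Kind       string    `json:"kind"`
	Request    *request  `json:"request,omitempty"`
	Response   *response `json:"response,omitempty"`
}

// validateHandler applies the same rule as the workshop's validate function.
func validateHandler(w http.ResponseWriter, r *http.Request) {
	var in review
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil || in.Request == nil {
		http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
		return
	}
	resp := &response{UID: in.Request.UID, Allowed: true}
	for _, c := range in.Request.Object.Spec.Containers {
		if c.Resources.Limits["cpu"] == "" || c.Resources.Limits["memory"] == "" {
			resp.Allowed = false
			resp.Status = &status{Message: fmt.Sprintf("container %q must set cpu and memory limits", c.Name)}
			break
		}
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(review{APIVersion: in.APIVersion, Kind: in.Kind, Response: resp})
}

// post wraps a Pod manifest in an AdmissionReview, calls the handler
// in-process, and returns the decoded verdict.
func post(podJSON string) response {
	body := `{"apiVersion":"admission.k8s.io/v1","kind":"AdmissionReview",` +
		`"request":{"uid":"test-uid","object":` + podJSON + `}}`
	rec := httptest.NewRecorder()
	validateHandler(rec, httptest.NewRequest(http.MethodPost, "/validate", strings.NewReader(body)))
	var out review
	if err := json.Unmarshal(rec.Body.Bytes(), &out); err != nil || out.Response == nil {
		panic("handler did not return an AdmissionReview response")
	}
	return *out.Response
}

func main() {
	bad := post(`{"spec":{"containers":[{"name":"web"}]}}`)
	good := post(`{"spec":{"containers":[{"name":"web","resources":{"limits":{"cpu":"100m","memory":"128Mi"}}}]}}`)
	fmt.Println(bad.Allowed, good.Allowed)
}
```

The same pattern works against the real validate and mutate functions in a _test.go file, which is a much faster feedback loop than redeploying to kind.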

Step 3: the mutating webhook - inject defaults via a JSON patch

A mutating webhook returns the same allow/deny, plus an optional JSON Patch (RFC 6902) that the apiserver applies to the object. Add to main.go:

// mutate injects a default label and securityContext onto Pods that lack them.
func mutate(w http.ResponseWriter, r *http.Request) {
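    review := readReview(r)
    req := review.Request
    if req == nil { // the body was not a usable AdmissionReview
        http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
        return
    }

    var pod corev1.Pod
    if err := json.Unmarshal(req.Object.Raw, &pod); err != nil {
        http.Error(w, "cannot decode Pod: "+err.Error(), http.StatusBadRequest)
        return
    }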

    // Build a JSON Patch: a list of operations the apiserver will apply.
    var patches []map[string]interface{}

    // Add a label if missing.
    if pod.Labels == nil {
        patches = append(patches, map[string]interface{}{
            "op": "add", "path": "/metadata/labels",
            "value": map[string]string{"workshop.io/injected": "true"},
        })
    } else if _, ok := pod.Labels["workshop.io/injected"]; !ok {
        patches = append(patches, map[string]interface{}{
            "op": "add", "path": "/metadata/labels/workshop.io~1injected",  // ~1 escapes "/"
            "value": "true",
        })
    }

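    // Default runAsNonRoot on the pod securityContext if unset, without
    // overwriting other securityContext fields the user did set.
    if pod.Spec.SecurityContext == nil {
        patches = append(patches, map[string]interface{}{
            "op": "add", "path": "/spec/securityContext",
            "value": map[string]interface{}{"runAsNonRoot": true},
        })
    } else if pod.Spec.SecurityContext.RunAsNonRoot == nil {
        patches = append(patches, map[string]interface{}{
            "op": "add", "path": "/spec/securityContext/runAsNonRoot",
            "value": true,
        })
    }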

    resp := &admissionv1.AdmissionResponse{UID: req.UID, Allowed: true}
    if len(patches) > 0 {
        patchBytes, _ := json.Marshal(patches)
        pt := admissionv1.PatchTypeJSONPatch
        resp.Patch = patchBytes        // the apiserver applies this to the object
        resp.PatchType = &pt
    }
    writeReview(w, review, resp)
}

func main() {
    http.HandleFunc("/validate", validate)
    http.HandleFunc("/mutate", mutate)
    // The apiserver requires HTTPS - serve with a cert (Step 4 generates it).
    // ListenAndServeTLS only returns on failure (e.g. missing cert files).
    if err := http.ListenAndServeTLS(":8443", "/certs/tls.crt", "/certs/tls.key", nil); err != nil {
        panic(err)
    }
}

The mutation is expressed as a patch, not a modified object - you tell the apiserver "add this field," and it applies it. Note the two label branches: an "add" that targets a key under /metadata/labels fails if the labels map itself is absent, so the handler adds the whole map in that case. This is exactly how Istio injects its sidecar container into your Pods: a mutating webhook returning a patch.
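To see why mutate has two label branches and what the ~1 escape does, here is a minimal sketch of how an "add" operation resolves its path. It is not the apiserver's patch implementation: applyAdd and splitPointer are hypothetical helpers that handle only "add" on JSON objects (no arrays, no other ops).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// splitPointer turns a JSON Pointer (RFC 6901) into unescaped tokens.
// "~1" decodes to "/" first, then "~0" decodes to "~".
func splitPointer(path string) []string {
	if path == "" {
		return nil
	}
	tokens := strings.Split(strings.TrimPrefix(path, "/"), "/")
	for i, t := range tokens {
		t = strings.ReplaceAll(t, "~1", "/")
		tokens[i] = strings.ReplaceAll(t, "~0", "~")
	}
	return tokens
}

// applyAdd applies one "add" op to a decoded JSON object. Every parent on
// the path must already exist; only the final key may be new.
func applyAdd(doc map[string]interface{}, op patchOp) error {
	if op.Op != "add" {
		return fmt.Errorf("unsupported op %q", op.Op)
	}
	tokens := splitPointer(op.Path)
	if len(tokens) == 0 {
		return fmt.Errorf("empty path")
	}
	cur := doc
	for _, t := range tokens[:len(tokens)-1] {
		next, ok := cur[t].(map[string]interface{})
		if !ok {
			return fmt.Errorf("parent %q does not exist", t)
		}
		cur = next
	}
	cur[tokens[len(tokens)-1]] = op.Value
	return nil
}

func main() {
	var pod map[string]interface{}
	if err := json.Unmarshal([]byte(`{"metadata":{"name":"good"},"spec":{}}`), &pod); err != nil {
		panic(err)
	}
	keyed := patchOp{Op: "add", Path: "/metadata/labels/workshop.io~1injected", Value: "true"}
	whole := patchOp{Op: "add", Path: "/metadata/labels",
		Value: map[string]interface{}{"workshop.io/injected": "true"}}

	fmt.Println(applyAdd(pod, keyed)) // fails: /metadata/labels is absent
	fmt.Println(applyAdd(pod, whole)) // succeeds: adds the whole map
	out, _ := json.Marshal(pod)
	fmt.Println(string(out))
}
```

The keyed patch only works once the labels map exists, which is why the handler checks pod.Labels == nil before choosing a path.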

Step 4: the TLS requirement (and why webhooks are fiddly)

The apiserver only calls webhooks over HTTPS, and it must trust the webhook's certificate. This is the part that makes admission webhooks notoriously finicky - you need a cert whose CA the apiserver is told to trust. For the workshop, generate a self-signed CA + serving cert:

$ # generate CA + server cert for the service DNS name the apiserver will call
$ ./gen-certs.sh admission-webhook.default.svc   # (a short openssl script you write; not shown here)

The serving cert's SAN must match the in-cluster Service DNS name (<service>.<namespace>.svc) the apiserver dials. The CA bundle gets embedded in the webhook configuration (Step 5) so the apiserver trusts it. In production you'd use cert-manager to issue and rotate these automatically (the operator workshop's ecosystem) - hand-managed certs are a very common source of "webhook not working" pain, which is why cert-manager is the usual choice.
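If you'd rather see the CA/SAN relationship in code than in openssl flags, the sketch below issues a throwaway CA and serving cert with Go's crypto/x509 and then verifies the cert the way a TLS client would. It is an illustration under assumptions, not the workshop's gen-certs.sh: the issue and verify helpers, the one-day validity, and the EC key type are all choices made for this example.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// bundle holds the PEM-encoded CA cert, serving cert, and serving key.
type bundle struct{ CA, Cert, Key []byte }

func toPEM(kind string, der []byte) []byte {
	return pem.EncodeToMemory(&pem.Block{Type: kind, Bytes: der})
}

// issue creates a throwaway CA and a serving cert for dnsName signed by it.
func issue(dnsName string) (*bundle, error) {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	ca := &x509.Certificate{
		SerialNumber: big.NewInt(1), Subject: pkix.Name{CommonName: "workshop-ca"},
		NotBefore: time.Now().Add(-time.Minute), NotAfter: time.Now().Add(24 * time.Hour),
		IsCA: true, BasicConstraintsValid: true, KeyUsage: x509.KeyUsageCertSign,
	}
	caDER, err := x509.CreateCertificate(rand.Reader, ca, ca, &caKey.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	srv := &x509.Certificate{
		SerialNumber: big.NewInt(2), Subject: pkix.Name{CommonName: dnsName},
		NotBefore: time.Now().Add(-time.Minute), NotAfter: time.Now().Add(24 * time.Hour),
		DNSNames:    []string{dnsName}, // the SAN the apiserver matches against
		KeyUsage:    x509.KeyUsageDigitalSignature,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	srvDER, err := x509.CreateCertificate(rand.Reader, srv, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, err
	}
	return &bundle{toPEM("CERTIFICATE", caDER), toPEM("CERTIFICATE", srvDER), toPEM("EC PRIVATE KEY", keyDER)}, nil
}

// verify checks the serving cert the way a TLS client would: chain to the
// CA, then match the dialed DNS name against the SAN.
func verify(b *bundle, dnsName string) error {
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(b.CA) {
		return errors.New("CA PEM did not parse")
	}
	block, _ := pem.Decode(b.Cert)
	if block == nil {
		return errors.New("serving cert PEM did not parse")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return err
	}
	_, err = cert.Verify(x509.VerifyOptions{Roots: roots, DNSName: dnsName})
	return err
}

func main() {
	const name = "admission-webhook.default.svc"
	b, err := issue(name)
	if err != nil {
		panic(err)
	}
	fmt.Println("matching name:", verify(b, name))
	fmt.Println("other name:   ", verify(b, "admission-webhook.other.svc"))
}
```

The second verification fails on the name alone, which is the failure you hit when the Service name or namespace changes but the cert doesn't. The CA field is what you would base64-encode into caBundle in Step 5.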

Deploy the webhook server as a Deployment + Service in the cluster (mounting the cert as a Secret):

$ kubectl create secret tls webhook-certs --cert=tls.crt --key=tls.key
$ kubectl apply -f webhook-deployment.yaml   # Deployment + Service exposing :8443

Step 5: register the webhooks - tell the apiserver to call you

The apiserver doesn't know about your webhook until you register it with a ValidatingWebhookConfiguration / MutatingWebhookConfiguration. This is where you declare what to intercept:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: require-limits
webhooks:
- name: require-limits.workshop.io
  rules:                              # WHAT to intercept
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]              # only Pod creations
  clientConfig:                       # WHERE to send the request
    service:
      name: admission-webhook
      namespace: default
      path: /validate
      port: 8443                     # must match the Service port (defaults to 443 if omitted)
    caBundle: <base64 CA cert>       # so the apiserver trusts the webhook's TLS
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail                # if the webhook is down, DENY (fail closed)

Two fields with big consequences:

  • rules scope what you intercept (here: Pod CREATE). Too broad and you intercept (and can break) everything; scope tightly.
  • failurePolicy decides what happens when your webhook is unavailable: Fail (deny - fail closed, safe but can wedge the cluster if your webhook crashes) or Ignore (allow - fail open, never blocks but skips your policy). This choice has taken down clusters: a Fail-policy webhook whose backend died blocks all Pod creation, including the webhook's own replacement Pods - a deadlock. Choose deliberately, and exclude the kube-system namespace.

Register both (the mutating one points at /mutate).

Step 6: watch the cluster enforce your rules

The payoff - your policy is now live in the API path. Try to create a Pod without limits:

$ kubectl run bad --image=nginx
Error from server: admission webhook "require-limits.workshop.io" denied the request:
container "bad" must set cpu and memory limits

Rejected, with your exact message, before the Pod ever existed. kubectl get pod bad shows nothing - it was never stored. That's admission: prevention, not correction. Now create one with limits:

$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata: {name: good}
spec:
  containers:
  - name: good
    image: nginx
    resources:
      limits: {cpu: "100m", memory: "128Mi"}
EOF
pod/good created                     # passed validation

And watch the mutating webhook's injection - you didn't set a label or securityContext, but they're there:

$ kubectl get pod good -o jsonpath='{.metadata.labels}{"\n"}'
{"workshop.io/injected":"true"}      # your mutating webhook added this

$ kubectl get pod good -o jsonpath='{.spec.securityContext}{"\n"}'
{"runAsNonRoot":true}                # injected too

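You wrote a bare Pod; the apiserver routed it through your mutating webhook (which patched in the defaults), then your validating webhook (which checked limits), then stored it. This is exactly the path Istio's sidecar injection, Pod Security checks, and Gatekeeper policies travel. You just built two stops on it. (One side effect to expect: the stock nginx image runs as root, so with runAsNonRoot: true injected the kubelet refuses to start the container and the Pod sits in CreateContainerConfigError. The Pod object was still admitted and stored, which is what this step demonstrates.)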

Step 7: break it - the failurePolicy lesson, live

See why failurePolicy matters. Scale your webhook Deployment to zero (simulate it crashing):

$ kubectl scale deployment admission-webhook --replicas=0
$ kubectl run another --image=nginx   # try to create any pod (the old --requests flag no longer exists in current kubectl)
Error from server: ... failed calling webhook "require-limits.workshop.io":
connect: connection refused

With failurePolicy: Fail, the webhook being down blocks all Pod creation - the apiserver can't reach your endpoint, so it denies. This is the deadlock that takes down clusters: a fail-closed webhook whose pods died can't be replaced (the replacement pods are themselves blocked). Switch to failurePolicy: Ignore and the same kubectl run succeeds (your policy is skipped, but the cluster keeps working). To recover, restore the replicas while the policy is still Ignore (under Fail, the webhook's own replacement Pods would be rejected too), then switch back to Fail once the webhook is serving again. This single experiment teaches the most important operational fact about webhooks - and why production webhooks scope tightly, exclude kube-system, and think hard about fail-open vs fail-closed.

Now extend it

  1. namespaceSelector / objectSelector. Scope the webhook to only namespaces labeled policy=enforced, so you can roll it out gradually and never touch kube-system. The real-world safe-rollout pattern.
  2. Validate more. Reject latest image tags, require a team label, forbid hostNetwork. Each is a one-liner in your decision logic - and a real production guardrail.
  3. Do it with controller-runtime. Rebuild using controller-runtime's webhook framework (the operator workshop's toolkit), which handles the AdmissionReview plumbing and integrates with cert-manager for certs.
  4. Compare to policy engines. Express the same "require limits" rule in Kyverno (YAML, no code) and OPA Gatekeeper (Rego). Now you understand what they generate under the hood - they're admission webhooks with a policy language on top.
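As a starting point for extension 2, here is one way the extra rules could look as plain functions. It is a sketch with trimmed, hypothetical types (a real handler would read corev1.Pod), and the image-tag check is a simple heuristic, not a full image-reference parser.

```go
package main

import (
	"fmt"
	"strings"
)

// Trimmed stand-ins for the Pod fields these rules read (hypothetical).
type container struct{ Name, Image string }

type pod struct {
	Labels      map[string]string
	HostNetwork bool
	Containers  []container
}

// usesLatestTag reports whether an image is untagged or tagged :latest.
// Images pinned by digest (name@sha256:...) are treated as pinned.
func usesLatestTag(image string) bool {
	if strings.Contains(image, "@") {
		return false
	}
	// Look for the tag only after the last "/", so a registry port such as
	// localhost:5000/nginx is not mistaken for a tag.
	last := image[strings.LastIndex(image, "/")+1:]
	i := strings.LastIndex(last, ":")
	if i < 0 {
		return true // no tag means the runtime pulls :latest
	}
	return last[i+1:] == "latest"
}

// violations returns one message per broken rule; empty means "allow".
func violations(p pod) []string {
	var out []string
	if p.Labels["team"] == "" {
		out = append(out, `pod must carry a "team" label`)
	}
	if p.HostNetwork {
		out = append(out, "hostNetwork is not allowed")
	}
	for _, c := range p.Containers {
		if usesLatestTag(c.Image) {
			out = append(out, fmt.Sprintf("container %q must pin an image tag (got %q)", c.Name, c.Image))
		}
	}
	return out
}

func main() {
	p := pod{HostNetwork: true, Containers: []container{{Name: "web", Image: "nginx"}}}
	// Joined into one string, this is what you would put in Result.Message.
	fmt.Println(strings.Join(violations(p), "; "))
}
```

Collecting every violation instead of stopping at the first gives the user one complete error at apply time rather than a fix-and-retry loop.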

What you might wonder

"Webhook vs controller - when do I use which?" Admission webhook = intervene before an object is stored (reject bad input, inject defaults) - synchronous, in the request path, prevention. Controller = act after objects exist (reconcile to desired state) - asynchronous, correction. Use a webhook to enforce policy at write time ("no Pods without limits"); use a controller to maintain state over time ("keep 3 replicas running"). Many systems use both.

"Should I write webhooks, or use Kyverno/Gatekeeper?" For most policy needs, use Kyverno (YAML policies) or Gatekeeper (Rego) - they're battle-tested webhooks with a policy language, cert management, and audit built in. Write a custom webhook only when you need logic a policy engine can't express, or tight integration with your own types. The point of building one in this workshop is to understand what those tools are - so you can debug them when they misbehave and know when to reach past them.

"Why is mutating before validating?" So mutation can fix things up before validation judges them. A mutating webhook injects default limits; the validating webhook (or schema) then sees a Pod with limits and passes it. If validation ran first, it would reject the Pod before mutation could fix it. The order is deliberate and you must design with it in mind.

"What's the cert pain really about?" The apiserver must trust the webhook's TLS cert, and the cert must match the service DNS name. Manual certs expire and break webhooks silently (a webhook with an expired cert + failurePolicy: Fail = wedged cluster). This is why cert-manager (auto-issue + auto-rotate) is the usual choice for webhooks in production. The fiddliness is real; the fix is automation.

"Can a webhook see the user who made the request?" Yes - the AdmissionRequest includes UserInfo (username, groups). You can write policy like "only the platform team may create privileged Pods." This is how webhooks enforce rules RBAC can't express (RBAC is verb-on-resource; webhooks can inspect the object's content and the user together).
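A sketch of what that could look like: the function below decodes just the userInfo and container securityContext fields from an AdmissionRequest and combines them into one decision. The struct is a trimmed stand-in, and the group name "platform-team" and the checkPrivileged helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type userInfo struct {
	Username string   `json:"username"`
	Groups   []string `json:"groups"`
}

// admissionRequest keeps only the fields this rule needs.
type admissionRequest struct {
	UserInfo userInfo `json:"userInfo"`
	Object   struct {
		Spec struct {
			Containers []struct {
				Name            string `json:"name"`
				SecurityContext *struct {
					Privileged *bool `json:"privileged"`
				} `json:"securityContext"`
			} `json:"containers"`
		} `json:"spec"`
	} `json:"object"`
}

const privilegedGroup = "platform-team" // hypothetical group name

func inGroup(u userInfo, group string) bool {
	for _, g := range u.Groups {
		if g == group {
			return true
		}
	}
	return false
}

// checkPrivileged denies privileged containers unless the requesting user
// belongs to privilegedGroup. It inspects object content and caller together.
func checkPrivileged(raw []byte) (bool, string, error) {
	var req admissionRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return false, "", err
	}
	for _, c := range req.Object.Spec.Containers {
		sc := c.SecurityContext
		if sc != nil && sc.Privileged != nil && *sc.Privileged && !inGroup(req.UserInfo, privilegedGroup) {
			return false, fmt.Sprintf("user %q may not create privileged container %q",
				req.UserInfo.Username, c.Name), nil
		}
	}
	return true, "", nil
}

func main() {
	raw := []byte(`{"userInfo":{"username":"dev@example.com","groups":["developers"]},
	  "object":{"spec":{"containers":[{"name":"debug","securityContext":{"privileged":true}}]}}}`)
	allowed, reason, err := checkPrivileged(raw)
	fmt.Println(allowed, reason, err)
}
```

The same Pod submitted by a member of the allowed group would pass, which is the kind of rule that needs both the object and the caller in view.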

What this gave you

  • You know the API request path and where admission sits: before etcd, mutating then validating.
  • You built a validating webhook that rejects Pods without limits, with an actionable error shown at apply time.
  • You built a mutating webhook that injects defaults via a JSON patch - the same mechanism as Istio sidecar injection.
  • You registered both with the apiserver and watched them enforce policy at kubectl apply time.
  • You hit the TLS/cert requirement and know why cert-manager is the usual choice for webhooks.
  • You broke it with failurePolicy: Fail and learned the fail-open/fail-closed lesson that takes down real clusters.
  • You understand that Kyverno, Gatekeeper, and sidecar injection are all this - admission webhooks - that Pod Security is a built-in controller on the same admission path, and when to build your own vs use them.

Next: decide where Pods run - build a custom scheduler and watch your placement logic bind Pods to nodes.

Submit your build

When you finish this workshop, share what you built so others can see and learn from your work. Include:

  • Public repo with your validating and mutating webhook code and certs setup
  • Terminal log of a Pod rejected by your validator (missing resource limits)
  • Terminal log of a Pod whose spec was rewritten by your mutator (injected default)
  • Note on what happened when you set failurePolicy=Fail and the webhook was down
