
Workshop - Build a GitOps sync loop

Difficulty: Deep | Time: 90 min
Needs: Linux or macOS, Go 1.21+, Docker, kind or k3d, git, kubectl

Before you start: you can launch this workshop in Killercoda, a free browser-based environment with no install required to follow along.

Companion to Kubernetes -> Month 05 -> Week 17: GitOps (ArgoCD and Flux). The chapter explains GitOps - git as the source of truth, continuously reconciled into the cluster. This workshop has you build a tiny GitOps engine - a loop that pulls manifests from a git repo and applies them, detects drift, and heals it - then watch it revert a manual change you make to the cluster. By the end you'll understand precisely what Argo CD and Flux do, because you'll have built the core in ~80 lines.

Roughly 75-90 minutes. Needs: kind or k3d, Go 1.21+, git, kubectl. Prerequisite: the controller-from-scratch workshop, since GitOps is the reconcile loop with git as the desired state.

What you'll build, and the idea it makes concrete

You'll build a controller whose "desired state" isn't a Kubernetes object - it's a git repository. The loop: clone/pull the repo, apply the manifests it contains, and on every tick re-apply so any drift (someone kubectl edit-ed a Deployment, someone deleted a Service) is corrected back to what git says. Then you'll kubectl scale a Deployment by hand and watch your engine revert it within seconds.

The idea this makes concrete:

GitOps is the reconcile loop you already know, with git as the source of truth. Instead of "desired state lives in a Website CR," desired state lives in a git repo, and the controller continuously drives the cluster toward what's committed. The consequences are the whole value proposition: git is the audit log (every change is a commit), rollback is git revert, the cluster self-heals toward git (manual kubectl changes get reverted), and no human ever runs kubectl apply against production - they open a PR. Argo CD and Flux are this loop, industrialized with a UI, multi-repo/multi-cluster support, health checks, and sync waves.

The controller workshop reconciled a CR into objects. This reconciles a git repo into a whole cluster - same loop, different desired-state source, and it's the pattern modern platform teams run everything on.

Step 0: the GitOps model

Start with the shift in thinking. Traditional ("push") deployment: a human or CI runs kubectl apply / helm install against the cluster. GitOps ("pull") deployment: an agent in the cluster continuously pulls from git and reconciles:

PUSH (traditional):                      PULL (GitOps):
  human/CI --kubectl apply--> cluster      git repo <--pull-- agent-in-cluster --apply--> cluster
  - imperative, point-in-time              - declarative, continuous
  - drift accumulates silently             - drift is detected and reverted
  - "what's actually deployed?" = unknown  - "what's deployed?" = what's in git, always
  - rollback = remember the old command    - rollback = git revert
  - audit = who ran what when? = murky     - audit = git history

The agent runs the same reconcile loop as any controller: desired = manifests in git, actual = objects in the cluster, act = apply to make them match, forever. You're building that agent.

Step 1: a git repo of manifests (the source of truth)

Make a local git repo holding the manifests your engine will deploy:

$ mkdir gitops-repo && cd gitops-repo && git init
$ mkdir manifests
$ cat > manifests/app.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels: {gitops.workshop.io/managed: "true"}   # lets kubectl apply -l select this object
spec:
  replicas: 2
  selector: {matchLabels: {app: web}}
  template:
    metadata: {labels: {app: web}}
    spec:
      containers:
      - {name: web, image: nginx:1.27-alpine}
EOF
$ git add . && git commit -m "deploy web at 2 replicas"
$ cd ..

This repo is now the declared truth: "the cluster should run web at 2 replicas." Anyone wanting to change production changes this - via a commit/PR - not the cluster directly.

Step 2: the sync loop

The engine: every N seconds, pull the repo and apply its manifests. Set up the project:

$ mkdir mini-gitops && cd mini-gitops && go mod init workshop/mini-gitops
$ go get k8s.io/klog/v2 k8s.io/client-go@v0.29.3 k8s.io/apimachinery@v0.29.3 k8s.io/cli-runtime@v0.29.3

The core loop follows. A production engine would apply through the Kubernetes API (server-side apply via the dynamic client); for clarity, this workshop shells out to kubectl apply and leaves the API path as an extension:

package main

import (
    "os/exec"
    "time"

    "k8s.io/klog/v2"
)

const (
    repoURL  = "/path/to/gitops-repo"   // local for the workshop; a real URL in production
    repoDir  = "/tmp/gitops-checkout"
    manifest = "manifests"
    interval = 15 * time.Second
)

func main() {
    klog.Info("mini-gitops starting")
    for {
        if err := syncOnce(); err != nil {
            klog.Errorf("sync failed: %v", err)
        }
        time.Sleep(interval)
    }
}

func syncOnce() error {
    // 1. PULL: get the latest desired state from git.
    if err := pull(); err != nil {
        return err
    }
    // 2. APPLY: drive the cluster toward what git says. apply is idempotent and
    //    declarative - re-applying the same manifests reverts any drift back to git.
    cmd := exec.Command("kubectl", "apply", "-f", repoDir+"/"+manifest, "--prune",
        "-l", "gitops.workshop.io/managed=true")
    out, err := cmd.CombinedOutput()
    klog.Infof("apply:\n%s", out)
    return err
}

func pull() error {
    // clone if missing, else fetch+reset to origin (git is the truth, discard local)
    if _, err := exec.Command("test", "-d", repoDir+"/.git").Output(); err != nil {
        return exec.Command("git", "clone", repoURL, repoDir).Run()
    }
    if err := exec.Command("git", "-C", repoDir, "fetch", "origin").Run(); err != nil {
        return err
    }
    return exec.Command("git", "-C", repoDir, "reset", "--hard", "origin/HEAD").Run()
}

Two design choices here are what make this GitOps:

  • git reset --hard origin/HEAD: the local checkout is disposable; git is the truth. Never trust local state.
  • kubectl apply --prune: apply also deletes cluster objects that are no longer in git (matched by the label). Without prune, removing a manifest from git wouldn't remove it from the cluster, so git wouldn't be the complete truth. Prune makes the cluster exactly mirror git: present in git -> exists; absent from git -> deleted.
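At its core, prune is a set difference. A minimal sketch, assuming objects are identified by a simplified "kind/name" key (real tooling keys objects by API group, kind, namespace, and name) and using the workshop's label as the ownership marker: planPrune is a hypothetical helper that returns live objects carrying the label but absent from git. Objects without the label are never candidates, which is what keeps prune away from things the engine doesn't own.

```go
package main

import (
	"fmt"
	"sort"
)

// Ownership label, matching the selector the workshop passes to kubectl -l.
const managedKey, managedValue = "gitops.workshop.io/managed", "true"

// planPrune returns the live objects to delete: those carrying the managed
// label whose key (e.g. "deployment/web") is absent from the desired set read
// from git. Unlabeled live objects are never returned, whatever git says.
func planPrune(desired []string, live map[string]map[string]string) []string {
	inGit := make(map[string]bool, len(desired))
	for _, k := range desired {
		inGit[k] = true
	}
	var prune []string
	for key, labels := range live {
		if labels[managedKey] == managedValue && !inGit[key] {
			prune = append(prune, key)
		}
	}
	sort.Strings(prune) // map iteration order is random; sort for stable output
	return prune
}

func main() {
	desired := []string{"deployment/web"}
	live := map[string]map[string]string{
		"deployment/web":   {managedKey: managedValue},
		"service/old-web":  {managedKey: managedValue}, // removed from git
		"deployment/other": {},                          // not ours: never pruned
	}
	fmt.Println(planPrune(desired, live)) // [service/old-web]
}
```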

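(Every manifest needs the label gitops.workshop.io/managed=true. The -l selector does double duty: kubectl applies only the objects in the files that match it, and prunes only live objects that match it. A manifest without the label is skipped entirely, and nothing unlabeled in the cluster is ever pruned. In production, use Argo CD's or Flux's ownership tracking instead.)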
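kubectl applies the selector given with -l both to the objects it reads from the files and to the live objects it considers for pruning. A minimal sketch of the equality-only subset of label-selector syntax (matchesSelector is hypothetical; real selectors also support !=, in, notin, and existence checks), showing why an unlabeled manifest never matches:

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSelector reports whether labels satisfy an equality-based selector
// such as "gitops.workshop.io/managed=true,tier=web". Every term must match.
// Only the "k=v" form is handled; anything else is rejected with an error.
func matchesSelector(selector string, labels map[string]string) (bool, error) {
	for _, term := range strings.Split(selector, ",") {
		term = strings.TrimSpace(term)
		key, value, ok := strings.Cut(term, "=")
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		if !ok || key == "" || strings.HasSuffix(key, "!") || strings.HasPrefix(value, "=") {
			return false, fmt.Errorf("unsupported selector term %q (only k=v is handled)", term)
		}
		if got, present := labels[key]; !present || got != value {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	sel := "gitops.workshop.io/managed=true"
	labeled := map[string]string{"app": "web", "gitops.workshop.io/managed": "true"}
	unlabeled := map[string]string{"app": "web"}
	ok1, _ := matchesSelector(sel, labeled)
	ok2, _ := matchesSelector(sel, unlabeled)
	fmt.Println(ok1, ok2) // true false
}
```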

Step 3: run it and watch git become the cluster

$ go run .
mini-gitops starting
apply:
deployment.apps/web created

Your engine pulled the repo and created the Deployment. The cluster now matches git:

$ kubectl get deployment web
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
web    2/2     2            2           10s        <- 2 replicas, as git declares

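You didn't run kubectl apply - your engine did, from git. This is the GitOps inversion: the deploy happened because the repo said so, pulled by an agent rather than pushed by a person. (Here the agent runs on your machine for convenience; in production it runs inside the cluster.)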

Step 4: the payoff - watch it heal drift

This is the moment GitOps clicks. Leave the engine running, and in a second terminal change the cluster by hand, as a panicked engineer might at 3 AM:

$ kubectl scale deployment web --replicas=5
deployment.apps/web scaled
$ kubectl get deployment web
NAME   READY   UP-TO-DATE   AVAILABLE
web    5/5     5            5            <- you scaled it to 5

Now wait one sync interval (~15s) and watch your engine's log and the cluster:

# engine log:
apply:
deployment.apps/web configured        <- it noticed drift and re-applied git's spec

$ kubectl get deployment web
NAME   READY   UP-TO-DATE   AVAILABLE
web    2/2     2            2            <- back to 2. Your manual change was REVERTED.

Your manual kubectl scale was undone, because git says 2 and the engine relentlessly enforces git. This is GitOps's superpower and its discipline: the cluster cannot drift from git, because anything not in git gets reverted. The only way to make web run 5 replicas is to commit that change:

$ cd gitops-repo
$ sed -i.bak 's/replicas: 2/replicas: 5/' manifests/app.yaml && rm manifests/app.yaml.bak   # -i.bak works with GNU and macOS sed
$ git commit -am "scale web to 5"
$ cd -
# wait one interval; engine log: deployment.apps/web configured
$ kubectl get deployment web
NAME   READY   UP-TO-DATE   AVAILABLE
web    5/5     5            5            <- now it stays at 5, because GIT says 5

The change stuck because it's in git. This is the entire GitOps contract: the cluster is a function of the repo. Want to change production? Commit. Want to roll back? Revert the commit and watch the engine restore the previous state. Want to know what's deployed? Read the repo. Want an audit trail? It's the git log.

Step 5: rollback is git revert

Watch the rollback story - the operational win that sells GitOps to teams:

$ cd gitops-repo
$ git revert --no-edit HEAD          # undo the "scale to 5" commit
$ cd -
# wait one interval
$ kubectl get deployment web
NAME   READY   AVAILABLE
web    2/2     2                       <- reverted to 2, because git reverted

Rolling back a production change was git revert. No remembering the old kubectl command, no "what was the previous image tag?" - the previous state is in git history, and reverting the commit makes the engine restore it. This is why GitOps teams sleep better: every change and every rollback is a reviewable, auditable git operation.
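git revert does not move HEAD backwards: it adds a new commit that undoes the old one, so history keeps both the change and its rollback. Recording which commit each sync applied ties the cluster's state to that history. A minimal sketch, with headCommit as a hypothetical helper and the workshop's checkout path assumed:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// headCommit returns the full SHA of the commit currently checked out in dir.
func headCommit(dir string) (string, error) {
	out, err := exec.Command("git", "-C", dir, "rev-parse", "HEAD").Output()
	if err != nil {
		return "", fmt.Errorf("git rev-parse in %s: %w", dir, err)
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	sha, err := headCommit("/tmp/gitops-checkout")
	if err != nil {
		fmt.Println("no checkout yet:", err)
		return
	}
	// Logged after each successful apply, a revert shows up as a new SHA.
	fmt.Println("synced commit", sha)
}
```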

Step 6: break it - delete a managed object

The other half of self-healing - deletion drift:

$ kubectl delete deployment web
deployment.apps "web" deleted
# wait one interval; engine log: deployment.apps/web created
$ kubectl get deployment web
NAME   READY   AVAILABLE
web    2/2     2                       <- recreated, because git says it should exist

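Deleting a git-managed object just makes the next sync recreate it: the same self-healing as in the controller workshop, now at the level of "the whole cluster mirrors the repo." To actually remove web, delete its manifest from git and let prune remove it. One catch: kubectl apply exits with an error when it finds nothing to apply, and git does not keep empty directories, so if app.yaml is the only manifest the sync fails instead of pruning. Commit a second manifest carrying the managed label (a small ConfigMap is enough) before removing web: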

$ cd gitops-repo && git rm manifests/app.yaml && git commit -m "remove web" && cd -
# next sync, with --prune: deployment.apps/web pruned (deleted)

Present in git -> exists. Absent from git -> deleted. The cluster is exactly the repo.

Now extend it

  1. Use the API, not kubectl. Replace the shell-out with server-side apply via the dynamic client (dynamicClient.Resource(gvr).Namespace(ns).Apply(ctx, name, obj, metav1.ApplyOptions{FieldManager: "mini-gitops"})). This is closer to how real GitOps engines work: no kubectl subprocess.
  2. Status + health. Report sync status (last synced commit, in-sync vs drifted) as a CR or metrics, so you can see "is the cluster in sync with git?" at a glance - Argo's core UI feature.
  3. Webhook-triggered sync. Instead of polling every 15s, sync on a git webhook (push -> immediate reconcile), with polling as a fallback. Faster, less API churn.
  4. Then deploy Argo CD or Flux and recognize every concept: Argo's Application (or Flux's GitRepository plus Kustomization) is your repo+path config, "OutOfSync" is your drift detection, "Sync" is your apply, "auto-prune" is your --prune, "self-heal" is your revert-drift loop. You built their core; now you understand their knobs.

What you might wonder

"How is this different from running kubectl apply in CI?" CI apply is push and point-in-time: it applies once, then the cluster can drift freely until the next pipeline run, and CI needs cluster credentials. GitOps is pull and continuous: an in-cluster agent reconciles constantly (drift is reverted in seconds, not at the next deploy), the cluster pulls (no external system holds cluster creds), and the desired state is always exactly git. The continuous-reconcile + drift-revert is the difference, and it's why GitOps beats "apply in CI."

"What does --prune actually do, and why is it scary?" It deletes cluster objects that the engine manages (by label) but that are no longer in git - making the cluster exactly mirror git. It's essential (otherwise removing a manifest doesn't remove the object) but dangerous (a bad label selector or an accidental git rm can delete production). This is why Argo/Flux make auto-prune opt-in and track ownership carefully. Powerful, handle deliberately.

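One way to make prune less scary is a guard that runs before anything is deleted. This is a hypothetical policy, not a description of what Argo CD or Flux do: refuse when git produced no desired objects at all (an accidental git rm, a bad checkout, or a wrong path all look like "delete everything"), and refuse when a single pass would remove more than a set fraction of the managed objects.

```go
package main

import "fmt"

// pruneAllowed is a hypothetical safety check run before deleting anything.
// It blocks a prune when git yielded zero desired objects, or when the prune
// would remove more than maxFraction of the currently managed objects.
func pruneAllowed(desiredCount, managedCount, pruneCount int, maxFraction float64) error {
	if pruneCount == 0 {
		return nil // nothing to delete, nothing to guard
	}
	if desiredCount == 0 {
		return fmt.Errorf("refusing to prune %d objects: git yielded no desired objects", pruneCount)
	}
	if managedCount > 0 && float64(pruneCount) > maxFraction*float64(managedCount) {
		return fmt.Errorf("refusing to prune %d of %d managed objects (limit %.0f%%)",
			pruneCount, managedCount, maxFraction*100)
	}
	return nil
}

func main() {
	fmt.Println(pruneAllowed(10, 11, 1, 0.5)) // <nil>
	fmt.Println(pruneAllowed(0, 11, 11, 0.5))
	fmt.Println(pruneAllowed(2, 11, 9, 0.5))
}
```

A blocked prune should surface as a sync error for a human to review, not be silently skipped.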
"Argo CD vs Flux - and do I need them if I built this?" Yes, use them in production - your 80 lines lack multi-repo/multi-cluster, a UI, health assessment, sync waves/hooks, RBAC, drift visualization, and battle-testing. Argo CD is UI-centric and app-centric; Flux is CRD/Kustomize-centric and composable. The point of this workshop is to understand them, so that "OutOfSync," "auto-sync," "self-heal," and "prune" are concepts you've implemented, not buzzwords.

"What about secrets? You can't commit those to git." The real wrinkle. You commit encrypted secrets (Sealed Secrets, SOPS) or reference an external secret store (External Secrets Operator pulling from Vault/cloud secret managers). GitOps + plaintext secrets would put credentials in git history - never do that. Secret management is the part of GitOps that needs a deliberate companion tool.

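A cheap safety net is to reject plaintext Secret manifests before they reach the repo or the apply step. A minimal, line-based sketch (not a YAML parser; declaresSecret is hypothetical): it flags any kind: Secret line, while encrypted kinds such as Sealed Secrets' SealedSecret pass. A check like this could run in CI or as a pre-commit hook.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// declaresSecret reports whether any line of a manifest reads `kind: Secret`
// (quotes and indentation ignored). It is deliberately conservative and does
// not parse YAML, so flow-style or commented forms are not detected.
func declaresSecret(manifest string) bool {
	sc := bufio.NewScanner(strings.NewReader(manifest))
	for sc.Scan() {
		key, value, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if ok && key == "kind" && strings.Trim(strings.TrimSpace(value), `"'`) == "Secret" {
			return true
		}
	}
	return false
}

func main() {
	plain := "apiVersion: v1\nkind: Secret\nmetadata:\n  name: db\n"
	sealed := "apiVersion: bitnami.com/v1alpha1\nkind: SealedSecret\nmetadata:\n  name: db\n"
	fmt.Println(declaresSecret(plain), declaresSecret(sealed)) // true false
}
```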
"Does the cluster reverting my manual changes ever get in the way?" During an incident you might need to make a fast manual change - and GitOps will revert it. The answer is either "make the fix in git (it's fast to commit)" or temporarily disable auto-sync for that app, fix manually, then reconcile git to match. The discipline (all changes via git) is the point, but mature tools give you an escape hatch for emergencies.

What this gave you

  • You built a GitOps engine: pull a repo, apply it, continuously - the reconcile loop with git as desired state.
  • You watched git become the cluster (the deploy happened because the repo said so, pulled by an agent rather than pushed by you).
  • You watched it heal drift - a manual kubectl scale reverted within seconds because git said otherwise.
  • You changed production the GitOps way (commit), rolled back the GitOps way (git revert), and removed via prune.
  • You understand the model: the cluster is a function of the repo - git is the audit log, rollback, and source of truth.
  • You can map every Argo CD / Flux concept (OutOfSync, sync, self-heal, prune) onto what you built, and you know the secrets caveat.

Next: the autoscaling layer - build an HPA-like controller that watches a metric and scales a Deployment, and watch it react to load.


Submit your build

When you finish this workshop, share what you built so others can see and learn from your work. Include:

  • Public repo with your GitOps engine code
  • Demo of drift heal - manually scale a Deployment, watch the engine revert it
  • Terminal log of a prune pass - a resource removed from git is deleted from the cluster
  • Note on how you handle resources you do NOT own (prune scoping)
