
Capstone Projects-Three Tracks, One Choice

Pick one. The work you do here is what you will describe in interviews.


Track 1-Hard Way: A Production-Grade Cluster From Scratch

Outcome: a multi-node Kubernetes cluster brought up on bare metal or cloud VMs, with mTLS-everywhere, fine-grained RBAC, multi-tenancy, GitOps-managed workloads, and a documented operational runbook.

Functional spec

  • 3 control-plane nodes + 3 workers (minimum). HAProxy or cloud LB in front of the apiservers.
  • etcd with mTLS, encryption-at-rest, scheduled snapshots to off-cluster storage.
  • CNI: Cilium with kube-proxy replacement, Hubble enabled.
  • Service mesh: Istio (ambient) or Linkerd, mTLS strict between services.
  • Storage: a real CSI driver (OpenEBS, Longhorn, or a cloud CSI driver); local-path is acceptable for dev only.
  • Observability: Prometheus + Grafana + Loki + Tempo + OTel Collector.
  • GitOps: ArgoCD or Flux managing platform addons.
  • Policy: Pod Security restricted, NetworkPolicy default-deny, Kyverno or Gatekeeper enforcing org rules.
  • Backup: Velero scheduled, restore tested.

Non-functional spec

  • CIS benchmark green (kube-bench ≥90% pass).
  • Cluster rebuild from scratch in <2 hours via Ansible/Terraform.
  • Zero-downtime kubelet upgrades (drain + replace pattern).
  • A demo: kill a control-plane node; cluster recovers without intervention.
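The three-node minimum follows from majority-quorum arithmetic. The sketch below assumes a stacked topology with one etcd member per control-plane node (the spec does not say which topology to use) and shows why three members survive the loss of one while a fourth member adds no extra tolerance.

```python
def quorum(members: int) -> int:
    """Smallest majority of a consensus group with `members` voting members."""
    if members < 1:
        raise ValueError("need at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} members: quorum={quorum(n)}, tolerates={failures_tolerated(n)}")
```

With three members the quorum is two, so killing one control-plane node in the demo leaves a working majority. Losing a second node before the first is replaced would not.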

Acceptance

  • Public repo with provisioning playbooks and runbooks.
  • A 30-minute screencast walking the assessor through bring-up, an incident drill, and a tenant onboarding.
  • A RUNBOOK.md covering: cluster provisioning, node addition/removal, etcd backup/restore, certificate rotation, upgrade procedure, top 5 incident types and remediation.

Skills exercised

  • All months, but Months 1, 2, and 6 most heavily.

Track 2-Platform: GitOps Multi-Tenant PaaS

Outcome: a self-service developer platform built on Kubernetes that demonstrates onboarding a new team in <30 minutes, with policy guardrails, infra-from-code, and full observability.

Functional spec

  • Tenant model: each tenant gets a Namespace, ResourceQuota, LimitRange, RBAC bindings, default NetworkPolicy, monitoring scrape config, and a GitOps Application (Argo) entry, all materialized from a single tenant claim (Crossplane Composition or Helm chart).
  • Self-service: developers commit a manifest.yaml to their repo; ArgoCD/Flux picks it up; their app deploys.
  • Policy: Kyverno or OPA Gatekeeper enforces: image signatures, no latest tags, mandatory labels, resource limits, no privileged Pods.
  • Observability: each tenant's metrics/logs/traces are isolated (via labels and Loki/Prom multi-tenancy); a per-tenant Grafana folder with default dashboards.
  • Cost attribution: OpenCost emits per-tenant cost; surface in a dashboard.
  • Crossplane: a Database claim that materializes a real cloud database (or, for demo, a chart-deployed Postgres).
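To make the "single tenant claim" idea concrete, here is a minimal sketch of the expansion a Crossplane Composition or Helm chart would perform. It is not how either tool works internally. The function name `render_tenant`, the object names, the quota defaults, and the `<tenant>-developers` group are all hypothetical; LimitRange, scrape config, and the Argo Application are left out for brevity.

```python
import json
from typing import Any


def render_tenant(name: str, cpu: str = "4", memory: str = "8Gi") -> list[dict[str, Any]]:
    """Expand one hypothetical tenant claim into baseline Kubernetes objects."""
    labels = {"tenant": name}
    meta = {"namespace": name, "labels": labels}
    return [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": name, "labels": labels}},
        {"apiVersion": "v1", "kind": "ResourceQuota",
         "metadata": {"name": "tenant-quota", **meta},
         "spec": {"hard": {"requests.cpu": cpu, "requests.memory": memory}}},
        {"apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
         "metadata": {"name": "default-deny", **meta},
         # An empty podSelector selects every Pod in the namespace; listing
         # both policy types with no rules denies all ingress and egress.
         "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]}},
        {"apiVersion": "rbac.authorization.k8s.io/v1", "kind": "RoleBinding",
         "metadata": {"name": "tenant-edit", **meta},
         # "edit" is one of the built-in user-facing ClusterRoles.
         "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                     "kind": "ClusterRole", "name": "edit"},
         "subjects": [{"apiGroup": "rbac.authorization.k8s.io",
                       "kind": "Group", "name": f"{name}-developers"}]},
    ]


if __name__ == "__main__":
    print(json.dumps(render_tenant("team-a"), indent=2))
```

The point of the design is that the claim carries only the tenant's name and sizing, and everything else is derived, so onboarding stays a one-file pull request.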

Non-functional spec

  • Tenant onboarding: from "claim PR opened" to "deployed app reachable" in <30 minutes (stretch target: <5 minutes).
  • Failure isolation: a tenant exceeding quota does not affect other tenants.
  • Compliance: every running Pod can be traced back to a git commit and a verified image signature.

Acceptance

  • Public repo with platform manifests + tenant-onboarding template.
  • Live demo: onboard a fresh tenant; deploy a sample app; demonstrate observability + policy denial; demonstrate quota enforcement.
  • A PLATFORM.md describing the contract between platform team and tenants: versioning, deprecation, support, escalation.
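For the policy-denial part of the demo, it helps to be able to state each rule as a plain predicate before encoding it in Kyverno or Gatekeeper. The sketch below is an offline illustration only, not either tool's policy language. The required label keys (`app`, `team`) are hypothetical, and image signature verification is omitted because it needs an external verifier.

```python
from typing import Optional


def image_tag(image: str) -> Optional[str]:
    """Return the tag of an image reference, or None if it is pinned by digest."""
    if "@" in image:
        return None
    last_segment = image.rsplit("/", 1)[-1]  # ignore any registry host:port
    return last_segment.split(":", 1)[1] if ":" in last_segment else ""


def violations(pod: dict, required_labels=("app", "team")) -> list:
    """List the guardrail violations found in a Pod manifest given as a dict."""
    found = []
    labels = pod.get("metadata", {}).get("labels", {})
    found += [f"missing label: {key}" for key in required_labels if key not in labels]
    spec = pod.get("spec", {})
    for c in spec.get("initContainers", []) + spec.get("containers", []):
        name = c.get("name", "?")
        # An untagged image resolves to "latest", so treat both the same way.
        if image_tag(c.get("image", "")) in ("", "latest"):
            found.append(f"{name}: image must use a fixed tag or digest")
        if not c.get("resources", {}).get("limits"):
            found.append(f"{name}: resource limits are required")
        if c.get("securityContext", {}).get("privileged", False):
            found.append(f"{name}: privileged containers are not allowed")
    return found
```

An empty result means the Pod would pass these four checks. In the live demo the equivalent denial should come from the admission controller itself, with a message a tenant can act on.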

Skills exercised

  • Months 3 (operators / Crossplane), 5 (GitOps + IaC + autoscaling + admission), 6 (multi-tenancy).

Track 3-Operator: Production-Quality Operator From Scratch

Outcome: a non-trivial operator that manages a stateful application or external system, complete with backup/restore, upgrades, observability, and a thoughtful API.

Suggested scopes

  1. elasticsearch-mini-operator: manage Elasticsearch clusters with auto-scaling, snapshot lifecycle, index lifecycle policies.
  2. postgres-mini-operator: with automatic failover (using the Postgres replication primitives), backup/restore via WAL-G to S3, point-in-time recovery.
  3. saas-resource-operator: manage external SaaS resources via Crossplane composition (e.g., a GitHubRepo operator complete with branch protection, secret scanning, codeowners).

Acceptance

  • Public repo, written with controller-runtime + Kubebuilder.
  • CRDs versioned (v1alpha1 + v1beta1 + conversion webhook).
  • Status conditions, finalizers, and owner references, all idiomatic.
  • Comprehensive RBAC (least-privilege, generated from kubebuilder markers).
  • Mutating + validating admission webhooks.
  • Tests: integration tests (Ginkgo + envtest) plus a kind-based E2E suite.
  • Helm chart or kustomize manifests for installation.
  • Observability: Prometheus metrics, structured logs (logr), OTel traces.
  • Helm-test-style upgrade tests across three operator versions.
  • Documentation: design rationale, API reference, examples.
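One detail behind "idiomatic" status conditions is that `lastTransitionTime` should change only when the condition's status flips, not on every reconcile. The sketch below illustrates that rule on plain dicts. It is written in Python for illustration only; a real operator would do this in Go, where apimachinery's `meta.SetStatusCondition` helper behaves this way. The function name `set_condition` and the injectable `now` parameter are assumptions made for this example, and `observedGeneration` is omitted.

```python
from datetime import datetime, timezone


def set_condition(conditions, ctype, status, reason, message="", now=None):
    """Upsert a condition in place; keep lastTransitionTime unless status changed."""
    now = now or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    new = {"type": ctype, "status": status, "reason": reason,
           "message": message, "lastTransitionTime": now}
    for existing in conditions:
        if existing["type"] == ctype:
            if existing["status"] == status:
                # Same status: reason and message may be refreshed, but the
                # transition timestamp must not move.
                new["lastTransitionTime"] = existing["lastTransitionTime"]
            existing.update(new)
            return conditions
    conditions.append(new)
    return conditions
```

Keeping the timestamp stable is what lets users and alerts answer "how long has this resource been not Ready" from the status alone.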

Skills exercised

  • Months 3 (operators), 4 (storage if stateful), 5 (admission), 6 (defense).

Cross-Track Requirements

  • cluster-baseline/ template (Appendix A) integrated.
  • ADRs (≥3).
  • THREAT_MODEL.md.
  • RUNBOOK.md.
  • Defense readiness: 60-minute walkthrough.

The track choice signals career direction: Track 1 for SRE/cluster-operator roles, Track 2 for platform-engineering roles, Track 3 for software-engineering-on-Kubernetes roles. Pick based on the kind of role you want your next interview loop to be for.
