Capstone Projects-Three Tracks, One Choice¶
Pick one. The work performed here is what you describe in interviews.
Track 1-Hard Way: A Production-Grade Cluster From Scratch¶
Outcome: a multi-node Kubernetes cluster brought up on bare metal or cloud VMs, with mTLS-everywhere, fine-grained RBAC, multi-tenancy, GitOps-managed workloads, and a documented operational runbook.
Functional spec¶
- 3 control-plane nodes + 3 workers (minimum). HAProxy or cloud LB in front of the apiservers.
- etcd with mTLS, encryption-at-rest, scheduled snapshots to off-cluster storage.
- CNI: Cilium with kube-proxy replacement, Hubble enabled.
- Service mesh: Istio (ambient) or Linkerd, mTLS strict between services.
- Storage: a real CSI driver (local-path for dev; OpenEBS / Longhorn / cloud CSI for "real").
- Observability: Prometheus + Grafana + Loki + Tempo + OTel Collector.
- GitOps: ArgoCD or Flux managing platform addons.
- Policy: Pod Security
restricted, NetworkPolicy default-deny, Kyverno or Gatekeeper enforcing org rules. - Backup: Velero scheduled, restore tested.
Non-functional spec¶
- CIS benchmark green (
kube-bench≥90% pass). - Cluster rebuild from scratch in <2 hours via Ansible/Terraform.
- Zero-downtime kubelet upgrades (drain + replace pattern).
- A demo: kill a control-plane node; cluster recovers without intervention.
Acceptance¶
- Public repo with provisioning playbooks and runbooks.
- A 30-minute screencast walking the assessor through bring-up, an incident drill, and a tenant onboarding.
- A
RUNBOOK.mdcovering: cluster provisioning, node addition/removal, etcd backup/restore, certificate rotation, upgrade procedure, top 5 incident types and remediation.
Skills exercised¶
- All months-but Months 1, 2, 6 most heavily.
Track 2-Platform: GitOps Multi-Tenant PaaS¶
Outcome: a self-service developer platform built on Kubernetes that demonstrates onboarding a new team in <30 minutes, with policy guardrails, infra-from-code, and full observability.
Functional spec¶
- Tenant model: each tenant gets a Namespace, ResourceQuota, LimitRange, RBAC bindings, default NetworkPolicy, monitoring scrape config, GitOps Application (Argo) entry-all materialized from a single tenant claim (Crossplane Composition or Helm chart).
- Self-service: developers commit a
manifest.yamlto their repo; ArgoCD/Flux picks it up; their app deploys. - Policy: Kyverno or OPA Gatekeeper enforces: image signatures, no
latesttags, mandatory labels, resource limits, no privileged Pods. - Observability: each tenant's metrics/logs/traces are isolated (via labels and Loki/Prom multi-tenancy); a per-tenant Grafana folder with default dashboards.
- Cost attribution: OpenCost emits per-tenant cost; surface in a dashboard.
- Crossplane: a
Databaseclaim that materializes a real cloud database (or, for demo, a chart-deployed Postgres).
Non-functional spec¶
- Tenant onboarding: from "claim PR opened" to "deployed app reachable" in <30 minutes (target: <5).
- Failure isolation: a tenant exceeding quota does not affect other tenants.
- Compliance: every running Pod can be traced back to a git commit + signature verification.
Acceptance¶
- Public repo with platform manifests + tenant-onboarding template.
- Live demo: onboard a fresh tenant; deploy a sample app; demonstrate observability + policy denial; demonstrate quota enforcement.
- A
PLATFORM.mddescribing the contract between platform team and tenants: versioning, deprecation, support, escalation.
Skills exercised¶
- Months 3 (operators / Crossplane), 5 (GitOps + IaC + autoscaling + admission), 6 (multi-tenancy).
Track 3-Operator: Production-Quality Operator From Scratch¶
Outcome: a non-trivial operator that manages a stateful application or external system, complete with backup/restore, upgrades, observability, and a thoughtful API.
Suggested scopes¶
elasticsearch-mini-operator: manage Elasticsearch clusters with auto-scaling, snapshot lifecycle, index lifecycle policies.postgres-mini-operator: with automatic failover (using the Postgres replication primitives), backup/restore via WAL-G to S3, point-in-time recovery.saas-resource-operator: manage external SaaS resources via Crossplane composition (e.g., aGitHubRepooperator complete with branch protection, secret scanning, codeowners).
Acceptance¶
- Public repo, written with
controller-runtime+ Kubebuilder. - CRDs versioned (v1alpha1 + v1beta1 + conversion webhook).
- Status conditions, finalizers, owner references-all idiomatic.
- Comprehensive RBAC (least-privilege, generated from kubebuilder markers).
- Mutating + validating admission webhooks.
- E2E tests (Ginkgo + envtest, plus a kind-based suite).
- Helm chart or kustomize manifests for installation.
- Observability: Prometheus metrics, structured logs (logr), OTel traces.
- Helm-test-style upgrade tests across three operator versions.
- Documentation: design rationale, API reference, examples.
Skills exercised¶
- Months 3 (operators), 4 (storage if stateful), 5 (admission), 6 (defense).
Cross-Track Requirements¶
cluster-baseline/template (Appendix A) integrated.- ADRs (≥3).
THREAT_MODEL.md.RUNBOOK.md.- Defense readiness: 60-minute walkthrough.
The track choice signals career direction: Track 1 for SRE/cluster-operator roles, Track 2 for platform-engineering roles, Track 3 for software-engineering-on-Kubernetes roles. Pick based on where you want the next interview loop.