Appendix A: Kubernetes Hardening Reference
Cumulative hardening checklist. By week 24 the reader's cluster-baseline/ template should encode every section.
A.1 Control Plane
- etcd: 3 or 5 nodes, mTLS, encryption-at-rest, snapshot+restore tested.
- kube-apiserver: encryption providers, audit logging, NodeRestriction admission, PodSecurity admission, OIDC (or trustedSA) for users.
- kube-scheduler: leader election; default + custom plugins reviewed.
- kube-controller-manager: leader election; minimum SA permissions.
- kubelet: read-only port disabled, TLS bootstrap with CSR approval, anonymous-auth false, authorization webhook.
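The kubelet items above map onto a handful of `KubeletConfiguration` fields. The following is a minimal sketch, not a complete config. It assumes the kubelet is started with `--config` pointing at this file, and the CA path is a hypothetical placeholder.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Disable the unauthenticated read-only port (historically 10255).
readOnlyPort: 0
authentication:
  anonymous:
    enabled: false          # equivalent to --anonymous-auth=false
  webhook:
    enabled: true           # validate bearer tokens via TokenReview
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt   # assumed path
authorization:
  mode: Webhook             # delegate authorization to the API server
# Request the serving certificate through the CSR API; these CSRs
# are not auto-approved by default, so an approver must sign them.
serverTLSBootstrap: true
rotateCertificates: true
```

With `serverTLSBootstrap` enabled, pending serving-certificate requests show up in `kubectl get csr` until something approves them, which is where the "CSR approval" item comes in.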
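A.2 RBAC
- No bindings to the `cluster-admin` ClusterRole except for break-glass.
- Per-tenant Roles, not ClusterRoles.
- Audit `system:authenticated` and `system:unauthenticated` group bindings; beyond the built-in discovery defaults (`system:public-info-viewer`, plus `system:basic-user` and `system:discovery` for authenticated users), both should be empty.
- Use `kubectl auth can-i --as=...` to verify least-privilege per persona.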
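As an illustration of the per-tenant pattern, here is a sketch of a namespaced Role bound to a tenant group. The namespace `tenant-a`, the group `tenant-a-developers`, and the resource list are hypothetical; substitute whatever your identity provider and workloads actually use.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role                      # namespaced, so it cannot grant cluster-wide access
metadata:
  name: tenant-developer
  namespace: tenant-a           # hypothetical tenant namespace
rules:
  - apiGroups: ["", "apps"]
    resources: [pods, pods/log, deployments, services, configmaps]
    verbs: [get, list, watch, create, update, patch, delete]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-developer
  namespace: tenant-a
subjects:
  - kind: Group
    name: tenant-a-developers   # hypothetical group claim from the IdP
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io
```

Under these assumptions, `kubectl auth can-i delete secrets -n tenant-a --as=jane --as-group=tenant-a-developers` should answer `no`, since secrets are not in the Role's resource list (`jane` is a made-up user).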
A.3 Pod Security
- PodSecurity admission `restricted` everywhere by default.
- Exceptions documented in code (namespace labels) with justification.
- Pod-level: `runAsNonRoot`, `readOnlyRootFilesystem`, drop all caps, seccomp `RuntimeDefault`.
- Mutating webhook to inject defaults if Pod spec omits them.
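The namespace labels and Pod-level settings listed above could look roughly like this. The namespace name, Pod name, and image reference are placeholders, and the digest is deliberately left unfilled.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a                # hypothetical namespace
  labels:
    # Reject Pods that violate the restricted profile.
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also surface violations in audit logs and client warnings.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: tenant-a
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: "registry.example.com/app@sha256:<digest>"   # placeholder
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```

Note that `readOnlyRootFilesystem` is not required by the `restricted` profile itself; it is an additional control from this checklist.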
A.4 Network
- CNI with NetworkPolicy support (Cilium, Calico).
- Default-deny ingress + egress in every namespace.
- Allowed flows declared per workload as labeled NetworkPolicy.
- L7 policy on ingress (Cilium L7 NetworkPolicy or service mesh).
- mTLS between Services (mesh).
- Egress controls: explicit allowed CIDRs / FQDNs.
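A default-deny baseline is usually two small objects per namespace: one policy that selects every Pod and allows nothing, and one that re-opens DNS so workloads can still resolve names. The sketch below assumes a namespace called `tenant-a` and a cluster DNS deployment labeled `k8s-app: kube-dns` in `kube-system`; check both against your cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: tenant-a           # hypothetical namespace
spec:
  podSelector: {}               # empty selector matches every Pod in the namespace
  policyTypes:
    - Ingress
    - Egress                    # no rules listed, so all traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label on the DNS Pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Per-workload allow rules are then layered on top as additional labeled policies; NetworkPolicies are additive, so nothing here needs to change when they are added.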
A.5 Image Supply Chain
- Image admission (Kyverno / Cosign policy-controller) requires signature.
- Allowlisted registries.
- No `latest` tags; pin by digest in production.
- SBOM and SLSA provenance attestations attached to every image.
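For the signature requirement, a Kyverno `verifyImages` rule is one possible shape. This is a sketch under several assumptions: the registry pattern is hypothetical, the public key is a placeholder for your own Cosign key, and field names have shifted between Kyverno releases, so check it against the version you have installed.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce   # block, rather than only report
  background: false
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"   # hypothetical allowlisted registry
          attestors:
            - count: 1
              entries:
                - keys:
                    # Placeholder: paste the PEM-encoded Cosign public key here.
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <your key material>
                      -----END PUBLIC KEY-----
```

Images that do not match `imageReferences` are not checked by this rule, so a registry allowlist still needs its own policy.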
A.6 Secrets
- etcd encryption-at-rest with rotated keys.
- External Secrets Operator (ESO) for cloud-KMS-sourced secrets.
- No secrets in env vars where possible (use volume mounts, watch for restart).
- No secrets committed to git, even in sealed form, without a `sealed-secrets` / `sops` ratchet.
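Encryption at rest with key rotation comes down to the ordering of keys in the API server's `EncryptionConfiguration`. A minimal sketch, assuming the `aescbc` provider and placeholder key material (a KMS provider is a common alternative):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            # The first key is used for all new writes.
            - name: key-2                       # hypothetical key name
              secret: <base64-encoded 32-byte key>
            # Older keys stay listed so existing data can still be read.
            - name: key-1
              secret: <base64-encoded 32-byte key>
      # Fallback so objects written before encryption was enabled remain readable.
      - identity: {}
```

After adding a new key at the top and restarting the API servers, existing Secrets are only re-encrypted when they are rewritten; the old key can be removed once every Secret has been rewritten with the new one.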
A.7 Multi-Tenancy
- One namespace per tenant; ResourceQuota + LimitRange.
- Hierarchical Namespaces or Capsule for nested tenants.
- PriorityClasses by tier; preemption tuned.
- Per-tenant cost attribution via labels + OpenCost.
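The ResourceQuota and LimitRange pairing from the first item might be sketched as follows. All names and numbers are illustrative assumptions, not recommended values.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a           # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: tenant-a
spec:
  limits:
    - type: Container
      default:                  # limits applied when a container sets none
        cpu: 500m
        memory: 512Mi
      defaultRequest:           # requests applied when a container sets none
        cpu: 100m
        memory: 128Mi
      max:
        cpu: "2"
        memory: 2Gi
```

The two work together: once a quota covers `requests.*` or `limits.*`, Pods that omit those fields are rejected unless a LimitRange supplies defaults.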
A.8 Observability
- Audit logs shipped off-cluster (read-only on cluster).
- Container logs (Loki / cloud equivalent).
- Metrics (Prometheus + kube-state-metrics + node-exporter).
- Traces (OTel Collector + Tempo / Jaeger / cloud).
- Continuous profiling (Parca / Pyroscope) optional but recommended.
- SLO tracking per service (Pyrra / Sloth).
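What gets shipped off-cluster depends on the API server's audit policy. The sketch below is one plausible starting point; the choice of which resources to log at which level is an assumption to adapt, not a prescription.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the RequestReceived stage to reduce volume.
omitStages:
  - RequestReceived
rules:
  # Never log Secret or ConfigMap bodies; metadata only.
  - level: Metadata
    resources:
      - group: ""
        resources: [secrets, configmaps]
  # Full request and response for RBAC changes.
  - level: RequestResponse
    verbs: [create, update, patch, delete]
    resources:
      - group: rbac.authorization.k8s.io
  # Everything else at metadata level.
  - level: Metadata
```

Rules are evaluated in order and the first match wins, so the Secret rule must stay above any broader rule that logs request bodies.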
A.9 Backup + DR
- Velero scheduled backups to off-cluster storage.
- Cross-region or cross-cluster restore tested at least quarterly.
- etcd snapshot tested for catastrophic-recovery scenario.
- DR runbook with RTO + RPO documented.
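A scheduled Velero backup is a single `Schedule` object. In this sketch the schedule name, the cron expression, the retention period, and the storage location name `offsite` are all assumptions; the location must match a `BackupStorageLocation` that already exists.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-full            # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * *"         # cron syntax: every day at 02:00
  template:
    includedNamespaces:
      - "*"
    storageLocation: offsite    # assumed BackupStorageLocation name
    ttl: 720h0m0s               # keep each backup for 30 days
```

The backup interval bounds the RPO you can promise in the DR runbook, so the cron expression and the documented RPO should be changed together.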
A.10 The cluster-baseline/ Template
    cluster-baseline/
      bootstrap/
        pki/                              # CA + per-component certs (cfssl)
        etcd/                             # systemd unit + config
        kube-apiserver/
        kube-scheduler/
        kube-controller-manager/
        kubelet/
      cni/cilium-values.yaml
      service-mesh/                       # istio or linkerd values
      observability/
        prometheus/
        grafana/
        loki/
        tempo/
        parca/
      policy/
        pod-security/
        networkpolicy-default-deny.yaml
        gatekeeper-constraints/
        kyverno-policies/
        sigstore-policy.yaml
      tenancy/
        namespace-template/               # Crossplane composition
        rbac-template/
        quotas-template/
      velero/
        schedule.yaml
        locations.yaml
      runbooks/
        node-not-ready.md
        etcd-degraded.md
        apiserver-oom.md
        pod-pending-forever.md
        cluster-rebuild.md
      RUNBOOK.md
      THREAT_MODEL.md
This is the artifact every cluster you bring up after week 24 should be provisioned from.