Appendix A: Kubernetes Hardening Reference

Cumulative hardening checklist. By week 24 the reader's cluster-baseline/ template should encode every section.

A.1 Control Plane

etcd: 3 or 5 nodes, mTLS, encryption-at-rest, snapshot+restore tested.
kube-apiserver: encryption providers, audit logging, NodeRestriction admission, PodSecurity admission, OIDC (or trustedSA) for users.
kube-scheduler: leader election; default + custom plugins reviewed.
kube-controller-manager: leader election; minimum SA permissions.
kubelet: read-only port disabled, TLS bootstrap with CSR approval, anonymous-auth false, authorization webhook.
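
The kubelet line above maps almost one-to-one onto fields of a `KubeletConfiguration` file. The following is a minimal sketch, not a complete config; the CA path is an assumption and should match wherever `bootstrap/pki/` places the cluster CA.

```yaml
# Sketch of the kubelet hardening settings from the checklist.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
readOnlyPort: 0                 # disable the unauthenticated read-only port
authentication:
  anonymous:
    enabled: false              # anonymous-auth false
  webhook:
    enabled: true               # bearer tokens checked via TokenReview
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt   # assumed path
authorization:
  mode: Webhook                 # delegate authorization to the API server
rotateCertificates: true        # rotate the client cert obtained by TLS bootstrap
serverTLSBootstrap: true        # serving cert requested via CSR; needs approval
```

With `serverTLSBootstrap: true`, kubelet serving-certificate CSRs are not approved automatically, so an approval step (manual or a dedicated approver) has to exist.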

A.2 RBAC

No bindings to the cluster-admin ClusterRole except for break-glass.
Per-tenant Roles, not ClusterRoles.
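Audit bindings to the system:authenticated and system:unauthenticated groups; neither should carry anything beyond the default discovery roles (system:basic-user, system:discovery, system:public-info-viewer).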
Use kubectl auth can-i --as=... to verify least-privilege per persona.
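
As an illustration of the per-tenant pattern above, here is a namespaced Role and RoleBinding. The namespace `tenant-a`, the group `tenant-a-developers`, and the chosen verbs are hypothetical; adjust them to the personas you actually have.

```yaml
# Namespaced Role: grants nothing outside tenant-a.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-developer
  namespace: tenant-a            # hypothetical tenant namespace
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-developer
  namespace: tenant-a
subjects:
  - kind: Group
    name: tenant-a-developers    # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io
# Verify per persona, for example:
#   kubectl auth can-i get secrets -n tenant-a --as=jane --as-group=tenant-a-developers
# With only this binding in place, the expected answer is "no".
```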

A.3 Pod Security

PodSecurity admission: enforce the restricted profile in every namespace by default.
Exceptions documented in code (namespace labels) with justification.
Workload securityContext: runAsNonRoot, readOnlyRootFilesystem, drop ALL capabilities, seccomp RuntimeDefault.
Mutating webhook to inject defaults if Pod spec omits them.
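
A sketch of how the items above look in manifests: the namespace labels that turn on the restricted profile, and a Pod that passes it. The names, the registry, and the `<digest>` placeholder are illustrative only.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a                 # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: tenant-a
spec:
  securityContext:               # pod-level settings
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app@sha256:<digest>   # placeholder, not a real image
      securityContext:           # container-level settings
        allowPrivilegeEscalation: false   # also required by the restricted profile
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```

An exception would be expressed the same way: a namespace labeled with a weaker `enforce` level, with the justification recorded next to the manifest.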

A.4 Network

CNI with NetworkPolicy support (Cilium, Calico).
Default-deny ingress + egress in every namespace.
Allowed flows declared per workload as labeled NetworkPolicy.
L7 policy on ingress (Cilium L7 NetworkPolicy or service mesh).
mTLS between Services (mesh).
Egress controls: explicit allowed CIDRs / FQDNs.
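
The default-deny plus declared-flows pattern can be sketched with standard `networking.k8s.io/v1` policies. The namespace, the `app` labels, the port, and the `k8s-app: kube-dns` selector are assumptions about a typical cluster, not values taken from this checklist.

```yaml
# 1. Deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
# 2. Re-allow DNS, which default-deny egress otherwise blocks.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumed label on the cluster DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
# 3. One declared flow per workload: frontend -> api on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: tenant-a
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

FQDN-based egress and L7 rules are not expressible in the core NetworkPolicy API; they need the CNI's own policy resources or a mesh.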

A.5 Image Supply Chain

Image admission (Kyverno / Cosign policy-controller) requires signature.
Allowlisted registries.
No latest tags; pin by digest in production.
SBOM and SLSA provenance attestations attached to every image.
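
One possible shape for the admission rules above, assuming Kyverno is the chosen controller. Field names follow the `kyverno.io/v1` ClusterPolicy schema as commonly documented, but they have shifted between Kyverno releases, so check them against the version you run. The registry pattern and the public key are placeholders.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: image-supply-chain
spec:
  validationFailureAction: Enforce   # newer releases may prefer a per-rule setting
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds: ["Pod"]
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"   # placeholder for the allowlisted registry
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      <cosign public key in PEM form>
    - name: disallow-latest-tag
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Mutable tags such as 'latest' are not allowed; pin by digest."
        pattern:
          spec:
            containers:
              - image: "!*:latest"
```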

A.6 Secrets

etcd encryption-at-rest with rotated keys.
External Secret Operator (ESO) for cloud-KMS-sourced secrets.
No secrets in env vars where possible (use volume mounts, watch for restart).
No secrets committed to git, even in sealed form, unless a sealed-secrets/sops ratchet is in place.
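
For the etcd encryption item, a minimal `EncryptionConfiguration` sketch, passed to kube-apiserver via `--encryption-provider-config`. The key names are hypothetical and the key material is a placeholder. The provider shown is `aescbc`; a KMS provider is the usual alternative when a cloud KMS is available.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            # The first key encrypts new writes.
            - name: key-new                      # hypothetical name
              secret: <base64-encoded 32-byte key>
            # Older keys stay listed so existing data can still be decrypted.
            - name: key-old                      # hypothetical name
              secret: <base64-encoded 32-byte key>
      # Fallback so data written before encryption was enabled remains readable.
      - identity: {}
```

Rotation in this scheme means adding a new key at the top, restarting the API servers, rewriting all Secrets so they are re-encrypted, and only then removing the old key.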

A.7 Multi-Tenancy

One namespace per tenant; ResourceQuota + LimitRange.
Hierarchical Namespaces or Capsule for nested tenants.
PriorityClasses by tier; preemption tuned.
Per-tenant cost attribution via labels + OpenCost.
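
The quota, default-limit, and priority items above correspond to three core resources. All numbers and names below are illustrative; size them from real tenant usage.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a            # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: tenant-a
spec:
  limits:
    - type: Container
      defaultRequest:            # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:                   # applied when a container omits limits
        cpu: 500m
        memory: 512Mi
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: tier-standard            # hypothetical tier name
value: 1000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "Standard tier; may preempt lower-priority pods."
```

The LimitRange matters because a ResourceQuota on CPU or memory rejects pods that do not declare those requests or limits.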

A.8 Observability

Audit logs shipped off-cluster (read-only on cluster).
Container logs (Loki / cloud equivalent).
Metrics (Prometheus + kube-state-metrics + node-exporter).
Traces (OTel Collector + Tempo / Jaeger / cloud).
Continuous profiling (Parca / Pyroscope) optional but recommended.
SLO tracking per service (Pyrra / Sloth).
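
What gets shipped off-cluster depends on the API server's audit policy. The following is a small sketch of one; the choice of levels per resource is an assumption, not a recommendation from this checklist. Rules are evaluated in order and the first match wins.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived              # skip the earliest stage to reduce volume
rules:
  # Never log request or response bodies for objects that may hold secret data.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Capture full bodies for RBAC changes.
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
  # Everything else: metadata only.
  - level: Metadata
```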

A.9 Backup + DR

Velero scheduled backups to off-cluster storage.
Cross-region or cross-cluster restore tested at least quarterly.
etcd snapshot restore tested for the catastrophic-recovery scenario.
DR runbook with RTO + RPO documented.
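
A scheduled backup like the one in the first item might look as follows, for example as `velero/schedule.yaml`. The cron expression, retention, and the `default` storage location name are assumptions; the location has to exist as a `BackupStorageLocation` pointing at off-cluster storage.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup     # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * *"          # every day at 02:00
  template:
    includedNamespaces:
      - "*"
    storageLocation: default     # assumed BackupStorageLocation name
    snapshotVolumes: true
    ttl: 720h0m0s                # keep each backup for 30 days
```

The schedule and retention should be derived from the RPO in the DR runbook rather than chosen independently.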

A.10 The `cluster-baseline/` Template

cluster-baseline/
  bootstrap/
    pki/                    # CA + per-component certs (cfssl)
    etcd/                   # systemd unit + config
    kube-apiserver/
    kube-scheduler/
    kube-controller-manager/
    kubelet/
  cni/cilium-values.yaml
  service-mesh/             # istio or linkerd values
  observability/
    prometheus/
    grafana/
    loki/
    tempo/
    parca/
  policy/
    pod-security/
    networkpolicy-default-deny.yaml
    gatekeeper-constraints/
    kyverno-policies/
    sigstore-policy.yaml
  tenancy/
    namespace-template/      # Crossplane composition
    rbac-template/
    quotas-template/
  velero/
    schedule.yaml
    locations.yaml
  runbooks/
    node-not-ready.md
    etcd-degraded.md
    apiserver-oom.md
    pod-pending-forever.md
    cluster-rebuild.md
  RUNBOOK.md
  THREAT_MODEL.md

This is the artifact every cluster you bring up after week 24 should be provisioned from.