Appendix A: Kubernetes Hardening Reference

Cumulative hardening checklist. By week 24 the reader's cluster-baseline/ template should encode every section.


A.1 Control Plane

  • etcd: 3 or 5 nodes, mTLS, encryption-at-rest, snapshot+restore tested.
  • kube-apiserver: encryption providers, audit logging, NodeRestriction admission, PodSecurity admission, OIDC (or trustedSA) for users.
  • kube-scheduler: leader election; default + custom plugins reviewed.
  • kube-controller-manager: leader election; minimum SA permissions.
  • kubelet: read-only port disabled, TLS bootstrap with CSR approval, anonymous-auth false, authorization webhook.
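
The kubelet item above maps almost one-to-one onto fields of the kubelet configuration file. A minimal sketch, assuming the file lives under bootstrap/kubelet/ and that the cluster CA is at the path shown (both are illustrative assumptions, not something this appendix prescribes):

```yaml
# Hypothetical bootstrap/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
readOnlyPort: 0                  # disable the unauthenticated read-only port
authentication:
  anonymous:
    enabled: false               # reject anonymous requests to the kubelet API
  webhook:
    enabled: true                # validate bearer tokens via the API server
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt   # assumed CA path
authorization:
  mode: Webhook                  # delegate authorization to the API server
serverTLSBootstrap: true         # request the serving cert through the CSR API
rotateCertificates: true         # rotate the client cert before expiry
```

With serverTLSBootstrap enabled, the kubelet's serving-certificate CSRs are not auto-approved by the built-in controller, so an operator or a dedicated approver has to approve them.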

A.2 RBAC

  • No bindings to the cluster-admin ClusterRole except for break-glass.
  • Per-tenant Roles, not ClusterRoles.
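  • Audit bindings to the system:authenticated and system:unauthenticated groups; neither should hold anything beyond the default discovery roles (system:discovery, system:basic-user, system:public-info-viewer).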
  • Use kubectl auth can-i --as=... to verify least-privilege per persona.
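
A per-tenant Role and its binding might look like the sketch below. The namespace tenant-a, the role name, and the group tenant-a-developers (imagined here as an OIDC group claim) are all hypothetical; the resource list is only an example of a narrow grant.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role                        # namespaced, so it cannot grant cluster-wide access
metadata:
  name: tenant-developer          # hypothetical name
  namespace: tenant-a             # hypothetical tenant namespace
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "services", "configmaps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-developer
  namespace: tenant-a
subjects:
  - kind: Group
    name: tenant-a-developers     # hypothetical group from the identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io
```

Because the Role above never mentions secrets, a check such as kubectl auth can-i list secrets -n tenant-a --as=some-user --as-group=tenant-a-developers should answer "no" for this persona (some-user is a placeholder).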

A.3 Pod Security

  • PodSecurity admission enforcing the restricted profile in every namespace by default.
  • Exceptions documented in code (namespace labels) with justification.
  • Pod-level: runAsNonRoot, readOnlyRootFilesystem, drop all caps, seccomp RuntimeDefault.
  • Mutating webhook to inject defaults if Pod spec omits them.
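
As a sketch of the items above: the namespace labels turn on the restricted profile, and the Pod carries the settings that profile expects. The names, the UID, and the image reference are placeholders, and the digest is deliberately left unfilled.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a                  # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example          # hypothetical workload
  namespace: tenant-a
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001              # assumed non-root UID
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/team/app@sha256:<digest>   # placeholder
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```

An exception would be expressed the same way, by relaxing the enforce label on one namespace and recording the justification next to it in the repository.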

A.4 Network

  • CNI with NetworkPolicy support (Cilium, Calico).
  • Default-deny ingress + egress in every namespace.
  • Allowed flows declared per workload as labeled NetworkPolicy.
  • L7 policy on ingress (Cilium L7 NetworkPolicy or service mesh).
  • mTLS between Services (mesh).
  • Egress controls: explicit allowed CIDRs / FQDNs.
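
The default-deny plus per-workload pattern above can be sketched with two standard NetworkPolicy objects. The namespace, labels, and port are hypothetical.

```yaml
# Deny all ingress and egress for every Pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: tenant-a             # hypothetical namespace
spec:
  podSelector: {}                 # empty selector matches every Pod
  policyTypes:
    - Ingress
    - Egress
---
# Re-open exactly one flow: frontend Pods may reach the api Pods on 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-from-frontend
  namespace: tenant-a
spec:
  podSelector:
    matchLabels:
      app: api                    # hypothetical workload label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend       # hypothetical client label
      ports:
        - protocol: TCP
          port: 8080
```

Note that default-deny egress also blocks DNS, so the frontend Pods need a matching egress rule (and typically one toward the cluster DNS service) before this flow works end to end. FQDN-based egress rules are not part of the core NetworkPolicy API and would need a CNI-specific resource.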

A.5 Image Supply Chain

  • Image admission (Kyverno / Cosign policy-controller) requires signature.
  • Allowlisted registries.
  • No latest tags; pin by digest in production.
  • SBOM and SLSA provenance attestations attached to every image.
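
One way to express the signature requirement is a Kyverno verifyImages rule. This is a sketch only: the registry pattern is hypothetical, the public key is a placeholder, and field names have shifted between Kyverno releases, so it should be checked against the installed version.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures   # hypothetical policy name
spec:
  validationFailureAction: Enforce   # reject Pods that fail verification
  webhookTimeoutSeconds: 30
  rules:
    - name: require-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"   # assumed allowlisted registry
          mutateDigest: true             # rewrite tags to digests on admission
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <cosign public key placeholder>
                      -----END PUBLIC KEY-----
```

A policy like this only covers images matching imageReferences; a separate rule is needed to reject images from registries outside the allowlist.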

A.6 Secrets

  • etcd encryption-at-rest with rotated keys.
  • External Secret Operator (ESO) for cloud-KMS-sourced secrets.
  • Avoid secrets in env vars where possible (use volume mounts; plan for how workloads pick up rotated values, by file watch or restart).
  • No secrets committed to git, even in sealed form, without sealed-secrets/sops ratchet.
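
Encryption-at-rest with rotated keys is configured through the API server's encryption provider file. A minimal sketch, with hypothetical key names and placeholder key material:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            # The first key encrypts new writes; later keys only decrypt.
            - name: key-new                            # hypothetical key name
              secret: <base64-encoded 32-byte key>     # placeholder
            - name: key-old                            # previous key, kept for reads
              secret: <base64-encoded 32-byte key>     # placeholder
      - identity: {}   # lets the server read data written before encryption was enabled
```

Rotation then amounts to adding a new key at the top, restarting the API servers, rewriting all Secrets so they are re-encrypted, and finally removing the old key. A KMS provider can replace aescbc when keys should not sit on control-plane disks.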

A.7 Multi-Tenancy

  • One namespace per tenant; ResourceQuota + LimitRange.
  • Hierarchical Namespaces or Capsule for nested tenants.
  • PriorityClasses by tier; preemption tuned.
  • Per-tenant cost attribution via labels + OpenCost.
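
The quota and limit pairing for a tenant namespace could be sketched as follows. All numbers are illustrative assumptions to be sized per tenant tier.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a             # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-defaults
  namespace: tenant-a
spec:
  limits:
    - type: Container
      defaultRequest:             # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:                    # applied when a container omits limits
        cpu: 500m
        memory: 512Mi
```

The LimitRange matters because a ResourceQuota on requests and limits rejects Pods that declare neither; the defaults keep such Pods admissible.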

A.8 Observability

  • Audit logs shipped off-cluster (read-only on cluster).
  • Container logs (Loki / cloud equivalent).
  • Metrics (Prometheus + kube-state-metrics + node-exporter).
  • Traces (OTel Collector + Tempo / Jaeger / cloud).
  • Continuous profiling (Parca / Pyroscope) optional but recommended.
  • SLO tracking per service (Pyrra / Sloth).
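
What gets shipped off-cluster is decided by the API server's audit policy. A small sketch of such a policy, where the choice of levels per resource is an assumption rather than a recommendation from this appendix:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived               # skip the earliest stage to cut volume
rules:
  # Never log Secret or ConfigMap bodies; record metadata only.
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Record full request and response for RBAC changes.
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
  # Everything else: metadata only.
  - level: Metadata
```

Rules are evaluated in order and the first match wins, so the Secret rule must stay above any broader rule that logs request bodies.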

A.9 Backup + DR

  • Velero scheduled backups to off-cluster storage.
  • Cross-region or cross-cluster restore tested at least quarterly.
  • etcd snapshot tested for catastrophic-recovery scenario.
  • DR runbook with RTO + RPO documented.
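
A scheduled Velero backup, of the kind velero/schedule.yaml might hold, can be sketched like this. The schedule, retention, and the storage location name offsite are assumptions; the location must already exist as a BackupStorageLocation.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-cluster-backup      # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * *"           # cron syntax: every day at 02:00
  template:
    includedNamespaces:
      - "*"
    storageLocation: offsite      # hypothetical off-cluster BackupStorageLocation
    ttl: 720h0m0s                 # keep each backup for 30 days
```

The backup interval bounds the achievable RPO, so the schedule here should agree with the RPO written in the DR runbook.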

A.10 The cluster-baseline/ Template

cluster-baseline/
  bootstrap/
    pki/                    # CA + per-component certs (cfssl)
    etcd/                   # systemd unit + config
    kube-apiserver/
    kube-scheduler/
    kube-controller-manager/
    kubelet/
  cni/cilium-values.yaml
  service-mesh/             # istio or linkerd values
  observability/
    prometheus/
    grafana/
    loki/
    tempo/
    parca/
  policy/
    pod-security/
    networkpolicy-default-deny.yaml
    gatekeeper-constraints/
    kyverno-policies/
    sigstore-policy.yaml
  tenancy/
    namespace-template/      # Crossplane composition
    rbac-template/
    quotas-template/
  velero/
    schedule.yaml
    locations.yaml
  runbooks/
    node-not-ready.md
    etcd-degraded.md
    apiserver-oom.md
    pod-pending-forever.md
    cluster-rebuild.md
  RUNBOOK.md
  THREAT_MODEL.md

This is the artifact every cluster you bring up after week 24 should be provisioned from.
