Week 19 - HPA, VPA, KEDA: Autoscaling
19.1 Conceptual Core
- HPA (Horizontal Pod Autoscaler): scales replica count based on metrics. CPU/memory by default via `metrics.k8s.io`; with a `custom.metrics.k8s.io` adapter (e.g., `prometheus-adapter`), any metric is fair game (minimal manifest after this list).
- VPA (Vertical Pod Autoscaler): adjusts a Pod's CPU/memory `requests` based on observed usage. Update modes: `Auto`/`Recreate` (evict and recreate Pods with new resources), `Initial` (apply only at Pod creation), `Off` (recommend only).
- KEDA (Kubernetes Event-Driven Autoscaling): scale to zero, scale on event-source backlog (Kafka lag, SQS depth, custom). Builds on the HPA: KEDA drives an HPA for 1-to-N scaling and handles the 0-to-1 transition itself.
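For orientation, a minimal sketch of an `autoscaling/v2` HPA scaling a Deployment on CPU; the names and numbers are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                    # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # workload to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # hold average CPU near 70% of requests
```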
19.2 Mechanical Detail
- HPA reconcile interval: 15s by default. Picking metrics that are too jittery causes flapping; smooth at the source.
- HPA scaling policies: `behavior.scaleUp.policies` and `behavior.scaleDown.policies`, each with a stabilization window. Tune them to the workload's elasticity profile (see the `behavior` sketch after this list).
- Custom metrics adapter (`prometheus-adapter`): translates Prometheus queries into the `custom.metrics.k8s.io` API the HPA reads. Define rules in the adapter config (example after this list).
- VPA's recommender computes percentile-based recommendations from historical usage. It is often run in `Off` mode just to surface suggested resource changes; for production safety, prefer manual approval (see the `Off`-mode manifest after this list).
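A sketch of the `behavior` stanza, slotting into the `spec` of an HPA like the one in 19.1; the numbers assume a workload that should scale up aggressively but scale down cautiously:

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0    # react immediately to spikes
    policies:
    - type: Percent
      value: 100                     # at most double the replicas...
      periodSeconds: 60              # ...per 60s window
  scaleDown:
    stabilizationWindowSeconds: 300  # require 5 min of low signal first
    policies:
    - type: Pods
      value: 1                       # shed at most one pod...
      periodSeconds: 120             # ...every 2 minutes
```

The long `scaleDown` window is the usual antidote to flapping on jittery metrics.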
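One plausible adapter rule (the `rules` block of `prometheus-adapter`'s config file), assuming the app exports `requests_per_second` as a per-pod gauge that is already a rate:

```yaml
rules:
- seriesQuery: 'requests_per_second{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}  # map Prometheus labels...
      pod: {resource: "pod"}              # ...onto Kubernetes objects
  name:
    matches: "^requests_per_second$"
    as: "requests_per_second"             # name the HPA will reference
  # already a per-second gauge, so just sum per pod; no rate() needed
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```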
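A recommend-only VPA sketch; note the quotes around `"Off"`, since a bare `Off` parses as a YAML boolean:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa              # placeholder
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # workload to observe
  updatePolicy:
    updateMode: "Off"        # recommend only; never evict or mutate Pods
```

Recommendations land in `status.recommendation` (`kubectl describe vpa web-vpa`) for a human to fold back into the manifests.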
19.3 Lab - "Autoscale on Custom Metrics"
- Deploy a load-test target that exposes a `requests_per_second` metric to Prometheus.
- Install `prometheus-adapter` with a mapping from that metric to `custom.metrics.k8s.io` (the rule sketch in 19.2 fits here).
- Author an HPA targeting an `AverageValue` of 200 on that metric (sketch after this list). Drive load; watch scaling.
- Add KEDA in front for scale-to-zero behavior (`ScaledObject` sketch after this list). Verify cold-start latency.
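A sketch of the lab HPA, assuming the adapter exposes `requests_per_second` as a Pods metric and the target Deployment is named `loadtarget` (a placeholder):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: loadtarget-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loadtarget
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "200"   # scale so each pod averages ~200 req/s
```

Before driving load, confirm the adapter is actually serving the metric, e.g. `kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/requests_per_second"`.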
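For the scale-to-zero step, a `ScaledObject` sketch with an assumed in-cluster Prometheus address. KEDA generates its own HPA for the target, so remove the hand-written HPA from the previous step first to avoid two controllers fighting over replicas:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: loadtarget-scaler
spec:
  scaleTargetRef:
    name: loadtarget          # same Deployment as above
  minReplicaCount: 0          # allow scale to zero
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090  # assumed address
      query: sum(requests_per_second)
      threshold: "200"        # target value per replica
```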
19.4 Hardening Drill
- Set `minReplicas` to a non-zero value for any tier-1 service (avoid cold starts during incident traffic). Cap `maxReplicas` to avoid runaway autoscaling on metric anomalies. A guardrail sketch follows.
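A minimal guardrail fragment for the HPA `spec` (values are illustrative, not prescriptive):

```yaml
spec:
  minReplicas: 3    # tier-1: keep warm capacity for incident traffic
  maxReplicas: 30   # cap the blast radius of a bad metric
  # KEDA equivalent on a ScaledObject: minReplicaCount: 1 disables scale-to-zero
```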
19.5 Operations Slice
- Wire up HPA event metrics. Alert on persistent `desiredReplicas == maxReplicas` (you've hit the cap) and on flapping (`scaleUp` and `scaleDown` events alternating rapidly). A Prometheus rule sketch follows.
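A Prometheus rule sketch for both conditions, assuming kube-state-metrics is installed (it exports the `kube_horizontalpodautoscaler_*` series used here); thresholds are illustrative:

```yaml
groups:
- name: hpa-alerts
  rules:
  - alert: HPAAtMaxReplicas
    # desired == max for 15 min: the HPA wants more capacity than it is allowed
    expr: |
      kube_horizontalpodautoscaler_status_desired_replicas
        == kube_horizontalpodautoscaler_spec_max_replicas
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "HPA {{ $labels.horizontalpodautoscaler }} pinned at maxReplicas"
  - alert: HPAFlapping
    # desired replica count changed >10 times in 30 min: likely a jittery metric
    expr: changes(kube_horizontalpodautoscaler_status_desired_replicas[30m]) > 10
    labels:
      severity: warning
    annotations:
      summary: "HPA {{ $labels.horizontalpodautoscaler }} is flapping"
```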