Week 19 - HPA, VPA, KEDA: Autoscaling
19.1 Conceptual Core
- HPA (Horizontal Pod Autoscaler): scales replica count based on metrics. CPU/memory by default via `metrics.k8s.io`; with a `custom.metrics.k8s.io` adapter (e.g., `prometheus-adapter`), any metric is fair game (minimal manifest after this list).
- VPA (Vertical Pod Autoscaler): adjusts a Pod's CPU/memory `requests` based on observed usage. Update modes: `Auto`/`Recreate` (evict and recreate Pods with new resources), `Initial` (apply only at Pod creation), `Off` (recommend only).
- KEDA (Kubernetes Event-Driven Autoscaling): scale to zero, scale on event-source backlog (Kafka lag, SQS depth, custom). Builds on the HPA: KEDA drives an HPA for 1-to-N scaling and handles the 0-to-1 transition itself.
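For orientation, a minimal sketch of an `autoscaling/v2` HPA scaling a Deployment on CPU; the names and numbers are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                    # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                  # workload to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # hold average CPU near 70% of requests
```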
19.2 Mechanical Detail
- HPA reconcile interval: 15s by default. Picking metrics that are too jittery causes flapping; smooth at the source.
- HPA scaling policies: `behavior.scaleUp.policies` and `behavior.scaleDown.policies`, each with a stabilization window. Tune them to the workload's elasticity profile (see the `behavior` sketch after this list).
- Custom metrics adapter (`prometheus-adapter`): translates Prometheus queries into the `custom.metrics.k8s.io` API the HPA reads. Define rules in the adapter config (example after this list).
- VPA's recommender computes percentile-based recommendations from historical usage. It is often run in `Off` mode just to surface suggested resource changes; for production safety, prefer manual approval (see the `Off`-mode manifest after this list).
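A sketch of the `behavior` stanza, slotting into the `spec` of an HPA like the one in 19.1; the numbers assume a workload that should scale up aggressively but scale down cautiously:

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0    # react immediately to spikes
    policies:
    - type: Percent
      value: 100                     # at most double the replicas...
      periodSeconds: 60              # ...per 60s window
  scaleDown:
    stabilizationWindowSeconds: 300  # require 5 min of low signal first
    policies:
    - type: Pods
      value: 1                       # shed at most one pod...
      periodSeconds: 120             # ...every 2 minutes
```

The long `scaleDown` window is the usual antidote to flapping on jittery metrics.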
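One plausible adapter rule (the `rules` block of `prometheus-adapter`'s config file), assuming the app exports `requests_per_second` as a per-pod gauge that is already a rate:

```yaml
rules:
- seriesQuery: 'requests_per_second{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}  # map Prometheus labels...
      pod: {resource: "pod"}              # ...onto Kubernetes objects
  name:
    matches: "^requests_per_second$"
    as: "requests_per_second"             # name the HPA will reference
  # already a per-second gauge, so just sum per pod; no rate() needed
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```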
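A recommend-only VPA sketch; note the quotes around `"Off"`, since a bare `Off` parses as a YAML boolean:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa              # placeholder
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # workload to observe
  updatePolicy:
    updateMode: "Off"        # recommend only; never evict or mutate Pods
```

Recommendations land in `status.recommendation` (`kubectl describe vpa web-vpa`) for a human to fold back into the manifests.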
19.3 Lab - "Autoscale on Custom Metrics"
- Deploy a load-test target that exposes a `requests_per_second` metric to Prometheus.
- Install `prometheus-adapter` with a mapping from that metric to `custom.metrics.k8s.io` (the rule sketch in 19.2 fits here).
- Author an HPA targeting an `AverageValue` of 200 on that metric (sketch after this list). Drive load; watch scaling.
- Add KEDA in front for scale-to-zero behavior (`ScaledObject` sketch after this list). Verify cold-start latency.
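A sketch of the lab HPA, assuming the adapter exposes `requests_per_second` as a Pods metric and the target Deployment is named `loadtarget` (a placeholder):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: loadtarget-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loadtarget
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "200"   # scale so each pod averages ~200 req/s
```

Before driving load, confirm the adapter is actually serving the metric, e.g. `kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/requests_per_second"`.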
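For the scale-to-zero step, a `ScaledObject` sketch with an assumed in-cluster Prometheus address. KEDA generates its own HPA for the target, so remove the hand-written HPA from the previous step first to avoid two controllers fighting over replicas:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: loadtarget-scaler
spec:
  scaleTargetRef:
    name: loadtarget          # same Deployment as above
  minReplicaCount: 0          # allow scale to zero
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090  # assumed address
      query: sum(requests_per_second)
      threshold: "200"        # target value per replica
```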
19.4 Hardening Drill
- Set `minReplicas` to a non-zero value for any tier-1 service (avoid cold starts during incident traffic). Cap `maxReplicas` to avoid runaway autoscaling on metric anomalies. A guardrail sketch follows.
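A minimal guardrail fragment for the HPA `spec` (values are illustrative, not prescriptive):

```yaml
spec:
  minReplicas: 3    # tier-1: keep warm capacity for incident traffic
  maxReplicas: 30   # cap the blast radius of a bad metric
  # KEDA equivalent on a ScaledObject: minReplicaCount: 1 disables scale-to-zero
```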
19.5 Operations Slice
- Wire up HPA event metrics. Alert on persistent `desiredReplicas == maxReplicas` (you've hit the cap) and on flapping (`scaleUp` and `scaleDown` events alternating rapidly). A Prometheus rule sketch follows.
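A Prometheus rule sketch for both conditions, assuming kube-state-metrics is installed (it exports the `kube_horizontalpodautoscaler_*` series used here); thresholds are illustrative:

```yaml
groups:
- name: hpa-alerts
  rules:
  - alert: HPAAtMaxReplicas
    # desired == max for 15 min: the HPA wants more capacity than it is allowed
    expr: |
      kube_horizontalpodautoscaler_status_desired_replicas
        == kube_horizontalpodautoscaler_spec_max_replicas
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "HPA {{ $labels.horizontalpodautoscaler }} pinned at maxReplicas"
  - alert: HPAFlapping
    # desired replica count changed >10 times in 30 min: likely a jittery metric
    expr: changes(kube_horizontalpodautoscaler_status_desired_replicas[30m]) > 10
    labels:
      severity: warning
    annotations:
      summary: "HPA {{ $labels.horizontalpodautoscaler }} is flapping"
```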