
Week 19 - HPA, VPA, KEDA: Autoscaling

19.1 Conceptual Core

  • HPA (Horizontal Pod Autoscaler): scales replica count based on metrics. CPU/memory come from metrics.k8s.io (metrics-server) by default; with a custom/external metrics adapter (e.g., prometheus-adapter) serving custom.metrics.k8s.io or external.metrics.k8s.io, any metric is fair game. A minimal sketch follows this list.
  • VPA (Vertical Pod Autoscaler): adjusts a Pod's CPU/memory requests based on observed usage. Update modes: Auto/Recreate (evict and recreate pods with new requests), Initial (apply only at pod creation), Off (recommendations only).
  • KEDA (Kubernetes Event-Driven Autoscaling): scale to zero, scale on event-source backlog (Kafka lag, SQS depth, custom scalers). Creates and manages an HPA under the hood for 1→N scaling and handles the 0↔1 activation itself.
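
A minimal sketch of the default resource-metric path, assuming a Deployment named web and a 70% CPU utilization target (both illustrative):

```yaml
# HPA using the default resource-metrics path (metrics.k8s.io via metrics-server).
# The target Deployment "web" and the 70% CPU target are assumptions for illustration.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Apply it with kubectl apply -f and inspect scaling decisions with kubectl describe hpa web.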

19.2 Mechanical Detail

  • HPA reconcile interval: 15s by default. Picking metrics that are too jittery causes flapping; smooth at the source.
  • HPA scaling behavior: behavior.scaleUp.policies and behavior.scaleDown.policies with stabilization windows. Tune them to the workload's elasticity profile (sketched after this list).
  • Custom metrics adapter (prometheus-adapter): translates Prometheus queries into the custom.metrics.k8s.io API the HPA reads. Define rules in adapter config.
  • VPA's recommender computes percentile-based recommendations from historical usage. It is often run in Off mode just to surface suggested resource changes; for production safety, many teams prefer manual approval before applying them (see the VPA sketch below).
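
Two sketches tying the details above to manifests. First, a behavior stanza with separate scale-up/scale-down policies and stabilization windows; the numbers are illustrative, not recommendations:

```yaml
# behavior tuning inside an autoscaling/v2 HPA spec; values are illustrative.
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react immediately to load spikes
      policies:
      - type: Percent
        value: 100                     # at most double the replica count
        periodSeconds: 60              # per 60s window
    scaleDown:
      stabilizationWindowSeconds: 300  # look back 5 min to damp flapping
      policies:
      - type: Pods
        value: 2                       # remove at most 2 pods
        periodSeconds: 60              # per 60s window
```

Second, a VPA in recommendation-only mode, again with an assumed Deployment name:

```yaml
# VPA that only publishes recommendations (status.recommendation) and never evicts.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"
```

Read the recommendation with kubectl describe vpa web and apply it manually during a change window.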

19.3 Lab: "Autoscale on Custom Metrics"

  1. Deploy a load-test target with a Prometheus-exposed requests_per_second metric.
  2. Install prometheus-adapter with a rule mapping that metric into the custom.metrics.k8s.io API (manifest sketches for steps 2-4 follow this list).
  3. Author an HPA targeting an AverageValue of 200 for that metric. Drive load; watch scaling.
  4. Add KEDA in front for scale-to-zero behavior. Verify cold-start latency.
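
Manifest sketches for steps 2-4, assuming the workload from step 1 is a Deployment named loadtest-target and that requests_per_second is exported as a per-pod gauge; all names, addresses, and thresholds are illustrative.

Step 2, a prometheus-adapter rule (in the adapter's ConfigMap or Helm values) exposing the metric through custom.metrics.k8s.io:

```yaml
# prometheus-adapter rule: discover requests_per_second, associate it with
# namespace/pod resources, and expose it unchanged as a per-pod custom metric.
rules:
- seriesQuery: 'requests_per_second{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    as: "requests_per_second"
  metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```

Step 3, an HPA targeting an average of 200 requests per second per pod:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: loadtest-target
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: loadtest-target
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "200"
```

Step 4, a KEDA ScaledObject for scale-to-zero. KEDA builds its own HPA for the 1→N range, so remove the HPA from step 3 before applying this, or the two controllers will fight over the same Deployment:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: loadtest-target
spec:
  scaleTargetRef:
    name: loadtest-target
  minReplicaCount: 0              # enables scale-to-zero
  maxReplicaCount: 20
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090   # assumed Prometheus location
      query: sum(requests_per_second{deployment="loadtest-target"})   # assumed labels
      threshold: "200"
```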

19.4 Hardening Drill

  • Set minReplicas to a non-zero value for any tier-1 service (avoid cold starts during incident traffic). Cap maxReplicas to avoid runaway autoscaling on metric anomalies. A bounds sketch follows.
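
A sketch of those bounds; with plain HPA they live directly in the spec, while KEDA exposes the equivalent minReplicaCount/maxReplicaCount on the ScaledObject. The numbers are illustrative:

```yaml
# Replica bounds for a tier-1 service (HPA spec fragment); numbers are illustrative.
spec:
  minReplicas: 3    # keep warm capacity; no cold starts during incident traffic
  maxReplicas: 30   # cap runaway scaling on metric anomalies
```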

19.5 Operations Slice

  • Wire up HPA event metrics. Alert on persistent desiredReplicas == maxReplicas (you've hit the cap) and on flapping (scaleUp and scaleDown events alternating rapidly). A rule sketch follows.
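
A sketch of both alerts as a PrometheusRule, assuming kube-state-metrics is scraped and exposes its usual kube_horizontalpodautoscaler_* series (exact names vary with kube-state-metrics version); thresholds are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-autoscaling-alerts
spec:
  groups:
  - name: hpa
    rules:
    # Pinned at the ceiling: the workload wants more replicas than maxReplicas allows.
    - alert: HPAAtMaxReplicas
      expr: |
        kube_horizontalpodautoscaler_status_desired_replicas
          >= kube_horizontalpodautoscaler_spec_max_replicas
      for: 15m
      labels:
        severity: warning
    # Desired replica count changing often: likely flapping on a jittery metric.
    - alert: HPAFlapping
      expr: |
        changes(kube_horizontalpodautoscaler_status_desired_replicas[30m]) > 10
      labels:
        severity: warning
```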
