Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in Kubernetes based on CPU, memory, or custom metrics.
What Is the Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler (HPA) is a Kubernetes controller that automatically adjusts the number of pods in a Deployment, ReplicaSet, or StatefulSet based on current load. When CPU utilization rises, new pods are started; when load decreases, excess pods are removed. This way, you only pay for the capacity you actually need.
How Does the HPA Work?
The HPA checks current metrics at a regular interval (every 15 seconds by default) and compares them against the defined target values. From the ratio of actual to target value it calculates the desired pod count, roughly currentReplicas × (currentMetric / targetMetric) rounded up, and scales accordingly.
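The core scaling rule can be sketched in a few lines of Python. This is a simplified model of the controller's calculation, ignoring tolerances, readiness checks, and missing-metric handling; the function name is ours:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Simplified HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 4 pods at 90% average CPU against a 60% target -> scale up to 6 pods
print(desired_replicas(4, 90, 60))  # prints 6
```

Note that the same formula also drives scale-down: 4 pods at 30% average CPU against a 60% target yields a desired count of 2.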
Supported Metrics
- CPU utilization (most commonly used)
- Memory consumption
- Custom metrics (e.g., requests per second, queue length)
- External metrics (e.g., from Prometheus or Datadog)
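A custom metric is referenced in the HPA spec as a metrics entry. The sketch below assumes a metrics adapter (such as prometheus-adapter) exposes the metric; the metric name and target value are illustrative:

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # assumed to be exposed via a metrics adapter
    target:
      type: AverageValue
      averageValue: "100"              # scale so each pod handles ~100 req/s on average
```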
Configuring the HPA
HPA configuration is done through Kubernetes manifests or kubectl commands. You define minimum and maximum pod counts along with target metrics. For CPU-based scaling, your pods need defined resource requests so the HPA can calculate relative utilization.
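As a sketch, a minimal autoscaling/v2 manifest for CPU-based scaling might look like this (names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # the Deployment to scale (assumed to exist)
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # target: 60% of the pods' CPU requests
```

The averageUtilization target is relative to the CPU requests defined on the pods, which is why resource requests are mandatory for this metric type.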
Scaling Strategies
- Behavior API: Since Kubernetes 1.18, you can configure scale-up and scale-down policies separately
- Stabilization window: Prevents constant scaling up and down (flapping)
- Scale-down delay: 5 minutes by default to avoid premature downscaling
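These knobs live under spec.behavior in the HPA manifest. A sketch with illustrative values, separating scale-up and scale-down policies:

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
    policies:
    - type: Pods
      value: 1                        # remove at most 1 pod per minute
      periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load spikes immediately
    policies:
    - type: Percent
      value: 100                      # at most double the pod count per 30 s
      periodSeconds: 30
```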
HPA and Cluster Autoscaler
The HPA scales pods within existing nodes. When no more node capacity is available, the Cluster Autoscaler steps in and adds new nodes. Together, they form a two-tier autoscaling system: HPA for the application layer, Cluster Autoscaler for the infrastructure layer.
Best Practices
- Always set resource requests and limits for CPU and memory
- Start with conservative scaling limits and optimize iteratively
- Use Prometheus metrics for application-specific scaling
- Combine HPA with Pod Disruption Budgets for high availability
- Monitor scaling events through Kubernetes events and monitoring
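For the Pod Disruption Budget recommendation, a minimal PodDisruptionBudget that keeps at least one pod running during voluntary disruptions (such as node drains) might look like this; the name and labels are illustrative:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # illustrative name
spec:
  minAvailable: 1          # never voluntarily evict below 1 ready pod
  selector:
    matchLabels:
      app: web             # must match the scaled Deployment's pod labels
```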
Why devRocks?
We configure your HPA so that your applications reliably respond to load spikes and release resources during quiet periods. Our Kubernetes experts define the right metrics and thresholds for optimal performance at minimum cost.
Frequently asked questions about Horizontal Pod Autoscaler
From what size is HPA worthwhile?
HPA is already useful starting at two pods. The minimum number (minReplicas) should cover base load, while maxReplicas defines the upper limit for load spikes.
Can the HPA scale on custom metrics?
Yes, through the Custom Metrics API and External Metrics API, you can use any metrics, such as requests per second from Prometheus or queue lengths from RabbitMQ.
What is the difference between HPA and VPA?
HPA scales horizontally by adding or removing pods. The Vertical Pod Autoscaler (VPA) scales vertically by adjusting the resource requests of individual pods.
How do I prevent constant scaling up and down (flapping)?
Use the Behavior API to configure stabilization windows and scaling rates. A scale-down stabilization window of 5 minutes prevents premature downscaling.
Related services
Kubernetes
Container orchestration at scale — we design, operate, and manage production-ready Kubernetes clusters.
Observability
Full-stack monitoring and alerting that predicts outages before users are affected.
FinOps & Cloud Costs
AWS cost analysis, rightsizing, Reserved Instances, and automated budget control.
Last updated: April 2026