Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in Kubernetes based on CPU, memory, or custom metrics.

What Is the Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler (HPA) is a Kubernetes controller that automatically adjusts the number of pods in a Deployment, ReplicaSet, or StatefulSet based on current load. When a metric such as CPU utilization rises above its target, new pods are started; when load decreases, excess pods are removed. This way, the capacity you run (and pay for) follows actual demand.

How Does the HPA Work?

The HPA checks current metrics at regular intervals (every 15 seconds by default) and compares them against the configured target values. It multiplies the current pod count by the ratio of actual to target value, rounds the result up, and scales the workload to that count within the configured minimum and maximum.
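The ratio-based calculation described above can be sketched in a few lines of Python. This is an illustration, not the controller's actual code: the function names are hypothetical, and the 10% tolerance band reflects the Kubernetes default (a detail not mentioned in this article) under which small deviations from the target are ignored.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Suggest a replica count from the ratio of actual to target metric value."""
    ratio = current_value / target_value
    # Inside the tolerance band the replica count is left unchanged,
    # which avoids scaling on minor fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def clamp(replicas, min_replicas, max_replicas):
    """Keep the suggestion inside the configured minimum and maximum."""
    return max(min_replicas, min(replicas, max_replicas))


# 4 pods averaging 90% CPU utilization against a 60% target
suggested = desired_replicas(4, 90, 60)
print(clamp(suggested, min_replicas=2, max_replicas=10))  # prints 6
```

With 4 pods at 90% utilization and a 60% target, the ratio is 1.5, so the sketch suggests 6 pods. At 63% the ratio stays within the tolerance band and the count remains 4.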

Supported Metrics

  • CPU utilization (most commonly used)
  • Memory consumption
  • Custom metrics (e.g., requests per second, queue length)
  • External metrics (e.g., from Prometheus or Datadog)
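When several of these metrics are configured on one HPA, Kubernetes evaluates each metric separately and uses the largest resulting pod count. The following sketch illustrates that rule under simplified assumptions (no tolerance band, no min/max clamping); the function names and the sample values are hypothetical.

```python
import math


def replicas_for_metric(current_replicas, current_value, target_value):
    """Replica count suggested by a single metric (ratio of actual to target)."""
    return math.ceil(current_replicas * current_value / target_value)


def replicas_for_all(current_replicas, metrics):
    """metrics maps a metric name to (current_value, target_value).

    Returns the winning replica count and the per-metric proposals.
    """
    proposals = {
        name: replicas_for_metric(current_replicas, current, target)
        for name, (current, target) in metrics.items()
    }
    # The metric that demands the most pods decides the outcome.
    return max(proposals.values()), proposals


winner, proposals = replicas_for_all(
    3,
    {
        "cpu_utilization_percent": (50, 60),    # below target
        "requests_per_second_per_pod": (150, 100),  # above target
    },
)
print(proposals)
print(winner)  # prints 5
```

Here CPU alone would keep the workload at 3 pods, but the request rate calls for 5, so 5 wins.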

Configuring the HPA

You configure an HPA through a Kubernetes manifest or with kubectl. You define the minimum and maximum pod counts along with the target metrics. For CPU-based scaling, your pods need defined resource requests, because the HPA measures utilization as a percentage of the requested CPU.
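As a minimal sketch of what such a configuration contains, the Python snippet below builds an HPA manifest for the `autoscaling/v2` API and prints it as JSON. The helper `build_hpa`, the names `web-hpa` and `web`, and the chosen numbers are assumptions made for illustration only.

```python
import json


def build_hpa(name, target_kind, target_name, min_replicas, max_replicas, cpu_target_percent):
    """Build an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose pod count the HPA adjusts.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        # Utilization is relative to the pods' CPU requests.
                        "target": {
                            "type": "Utilization",
                            "averageUtilization": cpu_target_percent,
                        },
                    },
                }
            ],
        },
    }


manifest = build_hpa("web-hpa", "Deployment", "web", 2, 10, 60)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`.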

Scaling Strategies

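  • Behavior API: Since Kubernetes 1.18, the behavior field lets you configure scale-up and scale-down policies separately
  • Stabilization window: The HPA acts on the highest recommendation from the window rather than only the latest one, which prevents constant scaling up and down (flapping)
  • Scale-down delay: The scale-down stabilization window is 5 minutes (300 seconds) by default to avoid premature downscaling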
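The effect of the scale-down stabilization window can be illustrated with a small simulation. This is a simplified sketch, assuming the window keeps every recent recommendation and the highest one wins; the class `ScaleDownStabilizer` and the sample timeline are hypothetical and do not mirror the controller's internals.

```python
from collections import deque


class ScaleDownStabilizer:
    """Pick the highest replica recommendation seen within a time window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.history = deque()  # entries of (timestamp_seconds, recommended_replicas)

    def stabilize(self, now, recommendation):
        self.history.append((now, recommendation))
        # Drop recommendations that are older than the window.
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        # Scaling down happens only once every recommendation in the window is lower.
        return max(replicas for _, replicas in self.history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
timeline = [(0, 10), (60, 4), (120, 6), (300, 4), (330, 4)]
results = [stabilizer.stabilize(t, r) for t, r in timeline]
print(results)  # prints [10, 10, 10, 10, 6]
```

The workload stays at 10 pods until the recommendation of 10 falls out of the 5-minute window, and then drops only to 6, the highest value still inside the window.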

HPA and Cluster Autoscaler

The HPA only changes the number of pods; it does not add nodes. When new pods cannot be scheduled because the existing nodes have no capacity left, the Cluster Autoscaler steps in and adds new nodes. Together, they form a two-tier autoscaling system: HPA for the application layer, Cluster Autoscaler for the infrastructure layer.

Best Practices

  • Always set resource requests and limits for CPU and memory
  • Start with conservative scaling limits and optimize iteratively
  • Use Prometheus metrics for application-specific scaling
  • Combine HPA with Pod Disruption Budgets for high availability
  • Track scaling activity through Kubernetes events and your monitoring stack

Why devRocks?

We configure your HPA so that your applications reliably respond to load spikes and release resources during quiet periods. Our Kubernetes experts define the right metrics and thresholds for optimal performance at minimum cost.

Frequently asked questions about Horizontal Pod Autoscaler

HPA is already useful starting at two pods. The minimum number (minReplicas) should cover base load, while maxReplicas defines the upper limit for load spikes.

Through the Custom Metrics API and External Metrics API, the HPA can scale on metrics beyond CPU and memory, such as requests per second from Prometheus or queue lengths from RabbitMQ.

HPA scales horizontally by adding or removing pods. The Vertical Pod Autoscaler (VPA) scales vertically by adjusting the resource requests of individual pods.

Use the Behavior API to configure stabilization windows and scaling rates. A scale-down stabilization window of 5 minutes prevents premature downscaling.

Last updated: April 2026