Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) automatically scales the number of pods in Kubernetes based on CPU, memory, or custom metrics.
What Is the Horizontal Pod Autoscaler?
The Horizontal Pod Autoscaler (HPA) is a Kubernetes controller that automatically adjusts the number of pods in a Deployment, ReplicaSet, or StatefulSet based on current load. When CPU utilization rises, new pods are started; when load decreases, excess pods are removed. This way, you only pay for the capacity you actually need.
How Does the HPA Work?
The HPA checks current metrics at a regular interval (every 15 seconds by default) and compares them against the defined target values. From the ratio of actual to target value it calculates the desired pod count, roughly currentReplicas × (currentMetric / targetMetric) rounded up, and scales accordingly.
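The core scaling rule can be sketched in a few lines of Python. This is a simplified model of the controller's calculation, ignoring tolerances, readiness checks, and missing-metric handling; the function name is ours:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Simplified HPA formula: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 4 pods at 90% average CPU against a 60% target -> scale up to 6 pods
print(desired_replicas(4, 90, 60))  # prints 6
```

Note that the same formula also drives scale-down: 4 pods at 30% average CPU against a 60% target yields a desired count of 2.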
Supported Metrics
- CPU utilization (most commonly used)
- Memory consumption
- Custom metrics (e.g., requests per second, queue length)
- External metrics (e.g., from Prometheus or Datadog)
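A custom metric is referenced in the HPA spec as a metrics entry. The sketch below assumes a metrics adapter (such as prometheus-adapter) exposes the metric; the metric name and target value are illustrative:

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # assumed to be exposed via a metrics adapter
    target:
      type: AverageValue
      averageValue: "100"              # scale so each pod handles ~100 req/s on average
```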
Configuring the HPA
HPA configuration is done through Kubernetes manifests or kubectl commands. You define minimum and maximum pod counts along with target metrics. For CPU-based scaling, your pods need defined resource requests so the HPA can calculate relative utilization.
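As a sketch, a minimal autoscaling/v2 manifest for CPU-based scaling might look like this (names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # the Deployment to scale (assumed to exist)
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # target: 60% of the pods' CPU requests
```

The averageUtilization target is relative to the CPU requests defined on the pods, which is why resource requests are mandatory for this metric type.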
Scaling Strategies
- Behavior API: Since Kubernetes 1.18, you can configure scale-up and scale-down policies separately
- Stabilization window: Prevents constant scaling up and down (flapping)
- Scale-down delay: 5 minutes by default to avoid premature downscaling
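These knobs live under spec.behavior in the HPA manifest. A sketch with illustrative values, separating scale-up and scale-down policies:

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
    policies:
    - type: Pods
      value: 1                        # remove at most 1 pod per minute
      periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load spikes immediately
    policies:
    - type: Percent
      value: 100                      # at most double the pod count per 30 s
      periodSeconds: 30
```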
HPA and Cluster Autoscaler
The HPA scales pods within existing nodes. When no more node capacity is available, the Cluster Autoscaler steps in and adds new nodes. Together, they form a two-tier autoscaling system: HPA for the application layer, Cluster Autoscaler for the infrastructure layer.
Best Practices
- Always set resource requests and limits for CPU and memory
- Start with conservative scaling limits and optimize iteratively
- Use Prometheus metrics for application-specific scaling
- Combine HPA with Pod Disruption Budgets for high availability
- Monitor scaling events through Kubernetes events and monitoring
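For the Pod Disruption Budget recommendation, a minimal PodDisruptionBudget that keeps at least one pod running during voluntary disruptions (such as node drains) might look like this; the name and labels are illustrative:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # illustrative name
spec:
  minAvailable: 1          # never voluntarily evict below 1 ready pod
  selector:
    matchLabels:
      app: web             # must match the scaled Deployment's pod labels
```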
Why devRocks?
We configure your HPA so that your applications reliably respond to load spikes and release resources during quiet periods. Our Kubernetes experts define the right metrics and thresholds for optimal performance at minimum cost.
Frequently asked questions about Horizontal Pod Autoscaler
From what size is HPA worthwhile?
HPA is already useful starting at two pods. The minimum number (minReplicas) should cover base load, while maxReplicas defines the upper limit for load spikes.
Can the HPA scale on custom metrics?
Yes, through the Custom Metrics API and External Metrics API, you can use any metrics, such as requests per second from Prometheus or queue lengths from RabbitMQ.
What is the difference between HPA and VPA?
HPA scales horizontally by adding or removing pods. The Vertical Pod Autoscaler (VPA) scales vertically by adjusting the resource requests of individual pods.
How do I prevent constant scaling up and down (flapping)?
Use the Behavior API to configure stabilization windows and scaling rates. A scale-down stabilization window of 5 minutes prevents premature downscaling.
Related services
Kubernetes
Container orchestration at scale — we design, operate, and manage production-ready Kubernetes clusters.
Observability
Full-stack monitoring and alerting that predicts outages before users are affected.
FinOps & Cloud Costs
AWS cost analysis, rightsizing, Reserved Instances, and automated budget control.
Last updated: April 2026