Autoscaling enables resources to scale up only when needed and scale down when traffic subsides.
The Kubernetes autoscaling mechanism uses two layers:
Pod-based scaling — supported by the Horizontal Pod Autoscaler (HPA) and the newer Vertical Pod Autoscaler (VPA).
Node-based scaling — supported by the Cluster Autoscaler (CA).
The Horizontal Pod Autoscaler (HPA) adjusts the number of replicas in your Deployments.
As your application receives more traffic, the autoscaler adds replicas to handle the extra requests, and it removes them again when traffic drops.
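To make the replica adjustment concrete, here is a minimal Python sketch of the ratio-based calculation that the Kubernetes documentation describes for the HPA: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The function name, the default bounds, and the sample numbers are assumptions made for illustration. The real controller also applies a tolerance and stabilization windows, which are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale the replica count by the ratio of observed to target metric.

    The result is rounded up and clamped to [min_replicas, max_replicas],
    mirroring the minReplicas/maxReplicas bounds of an HPA object.
    The default bounds here are illustrative, not Kubernetes defaults.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(4, 90.0, 60.0))  # 6
# 4 replicas averaging 20% CPU against a 60% target: scale in.
print(desired_replicas(4, 20.0, 60.0))  # 2
```

The same ratio drives both directions: a metric above the target grows the replica count, and a metric below it shrinks the count.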
The Vertical Pod Autoscaler (VPA) is useful when you can’t create more copies of your Pods, but you still need to handle more traffic.
For example, you can’t easily scale a database just by adding more Pods.
A database might require sharding or configuring read-only replicas.
But you can make a database handle more connections by increasing the memory and CPU available to it.
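As a rough illustration of the database case, the sketch below builds a VerticalPodAutoscaler object as a Python dictionary and prints it as JSON. It assumes the VPA components are installed in the cluster (they are not part of core Kubernetes). The names "db-vpa" and "postgres", the StatefulSet target, the update mode, and the resource bounds are all hypothetical values chosen for the example.

```python
import json


def vpa_manifest(name: str, target_kind: str, target_name: str,
                 update_mode: str = "Recreate") -> dict:
    """Build a VerticalPodAutoscaler manifest for one workload.

    The min/max bounds below are placeholder values; pick limits that
    match what your nodes can actually provide.
    """
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose Pods should get resized.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            # "Off" only publishes recommendations; "Recreate" evicts
            # Pods so they restart with the updated requests.
            "updatePolicy": {"updateMode": update_mode},
            "resourcePolicy": {
                "containerPolicies": [{
                    "containerName": "*",
                    "minAllowed": {"cpu": "250m", "memory": "512Mi"},
                    "maxAllowed": {"cpu": "2", "memory": "4Gi"},
                }]
            },
        },
    }


# Hypothetical database workload; JSON output can be piped to kubectl apply -f -
print(json.dumps(vpa_manifest("db-vpa", "StatefulSet", "postgres"), indent=2))
```

Keep in mind that applying new requests in this mode restarts the Pod, so for a database it is worth starting with the "Off" mode and reviewing the recommendations first.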
Lastly, the Cluster Autoscaler (CA) works at the node level.
When your cluster runs low on resources and Pods can no longer be scheduled, the Cluster Autoscaler provisions a new compute unit and adds it to the cluster.
If nodes sit empty or underutilized, the Cluster Autoscaler removes them to reduce costs.
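The two behaviours of the Cluster Autoscaler can be sketched as a toy planning function. This is a deliberately simplified model, not the real algorithm: it looks only at CPU requests, assumes every new node has the same hypothetical size, packs pending Pods first-fit, and treats only completely empty nodes as removable. All names and numbers are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    pod_cpu: list[float] = field(default_factory=list)  # CPU requests of running Pods


def plan(nodes: list[Node], pending_cpu: list[float],
         node_cpu: float = 4.0) -> tuple[int, list[str]]:
    """Return (nodes_to_add, names_of_nodes_to_remove) for one pass.

    pending_cpu holds the CPU requests of Pods the scheduler could not place.
    """
    # Scale up: pack unschedulable Pods onto hypothetical new nodes.
    new_nodes_free: list[float] = []
    for request in sorted(pending_cpu, reverse=True):
        if request > node_cpu:
            continue  # would not fit even on an empty new node
        for i, free in enumerate(new_nodes_free):
            if request <= free:
                new_nodes_free[i] -= request
                break
        else:
            new_nodes_free.append(node_cpu - request)
    if new_nodes_free:
        return len(new_nodes_free), []

    # Scale down: with nothing pending, empty nodes can go.
    return 0, [n.name for n in nodes if not n.pod_cpu]


busy = [Node("node-a", [2.0, 2.0])]
print(plan(busy, [3.0, 2.0, 1.0]))  # (2, [])

idle = [Node("node-a", [2.0, 1.5]), Node("node-b")]
print(plan(idle, []))  # (0, ['node-b'])
```

In the first call, three pending Pods need two extra nodes. In the second, nothing is pending and the empty node is marked for removal.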