Autoscaling is what lets an OKE cluster absorb a traffic spike without falling over and shrink back down when the spike passes so you are not paying for idle capacity. It sounds simple, but it involves two separate mechanisms working on two different axes, and teams that configure one without the other end up with clusters that either cannot scale or never scale back. This article explains how scaling works on OKE, how the pieces fit together, and how to tune them so the cluster behaves.
It is part of our OKE and containers series and assumes the cluster design from OKE cluster architecture.
Scaling an application on Kubernetes has two dimensions. The first is the number of pod replicas, which is about giving the application more copies of itself to handle load. The second is the number of nodes, which is about having somewhere for those pods to run. Both matter, and they are controlled by different components. Confusing the two is the root of most scaling problems.
| Component | Scales | Trigger | Result |
|---|---|---|---|
| Horizontal Pod Autoscaler | Pod replicas | CPU, memory or custom metrics | More or fewer copies of the application |
| Cluster Autoscaler | Nodes in a pool | Pods that cannot be scheduled | More or fewer worker nodes |
| Virtual nodes | Capacity per pod | Pod scheduling | No node management at all |
The Horizontal Pod Autoscaler, or HPA, watches a metric for your application, most commonly CPU or memory utilization, and adds or removes pod replicas to keep that metric near a target. If you set a target of seventy percent CPU and load pushes the average past it, the HPA adds replicas until the average drops back. When load falls, it removes replicas. A utilization target is expressed as a percentage of each pod's resource request, so the HPA can only act on it if the pods declare a request for that resource. This is the most common reason an HPA appears to do nothing: the pods have no resource requests set, so utilization cannot be computed and there is no signal to scale on.
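The replica calculation can be sketched in a few lines of Python. This is a simplified model of the rule described in the Kubernetes documentation (desired replicas is the current count scaled by the ratio of observed to target metric, rounded up, with a default tolerance of ten percent), not the controller's actual code. The function names and the sample numbers are illustrative assumptions.

```python
import math


def cpu_utilization(usage_millicores, request_millicores):
    """Average CPU utilization across pods, as a percentage of their requests."""
    if not request_millicores or any(r <= 0 for r in request_millicores):
        # Without a request there is no denominator, so no utilization signal.
        raise ValueError("every pod needs a CPU request for a utilization target")
    return 100.0 * sum(usage_millicores) / sum(request_millicores)


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA rule: ceil(current * observed / target), clamped to bounds."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        wanted = current_replicas  # close enough to target: do nothing
    else:
        wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Four pods, each requesting 500m and using 450m: 90% against a 70% target.
    observed = cpu_utilization([450] * 4, [500] * 4)
    print(desired_replicas(4, observed, 70.0, min_replicas=2, max_replicas=10))  # 6
```

Passing a request of zero raises an error here, which mirrors the failure mode described above: with no request there is nothing to divide by, so the autoscaler has no utilization figure to act on.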
The HPA can ask for more pods, but those pods need nodes to run on. The Cluster Autoscaler watches for pods that are pending because no node has room for them, and it adds nodes to the relevant managed node pool to make space. When nodes sit underused, it drains and removes them to save money. The Cluster Autoscaler is what closes the loop: the HPA scales pods, the Cluster Autoscaler scales the nodes those pods need. Running the HPA without the Cluster Autoscaler means new pods stay pending once the existing nodes fill up, which looks like a scaling failure but is really a missing second half.
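As a rough illustration of that second half, the sketch below estimates how many nodes a pool would need to add for a set of pending pods. It is a toy model that packs pods by CPU request only, using first-fit decreasing. The real Cluster Autoscaler simulates the scheduler and accounts for memory, affinity, taints and other constraints, so treat the function name and the numbers as assumptions for illustration.

```python
def nodes_to_add(pending_cpu_m, node_cpu_m, current_nodes, max_nodes):
    """Estimate new nodes needed for pending pods (CPU millicores only)."""
    free_per_new_node = []  # remaining CPU on each hypothetical new node
    for cpu in sorted(pending_cpu_m, reverse=True):
        if cpu > node_cpu_m:
            # A pod larger than the node shape stays pending no matter
            # how many nodes are added, so it does not trigger growth.
            continue
        for i, free in enumerate(free_per_new_node):
            if cpu <= free:
                free_per_new_node[i] = free - cpu
                break
        else:
            free_per_new_node.append(node_cpu_m - cpu)
    # The pool maximum caps growth regardless of demand.
    headroom = max(0, max_nodes - current_nodes)
    return min(len(free_per_new_node), headroom)


if __name__ == "__main__":
    pending = [1500, 1500, 1000, 500]  # CPU requests of pending pods
    print(nodes_to_add(pending, node_cpu_m=3000, current_nodes=3, max_nodes=10))  # 2
    print(nodes_to_add(pending, node_cpu_m=3000, current_nodes=3, max_nodes=4))   # 1
```

The second call shows the pool maximum at work: demand calls for two nodes, but only one can be added, so some pods remain pending.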
The two autoscalers must be configured to cooperate. The Cluster Autoscaler needs minimum and maximum node counts per pool that give it room to grow without runaway cost. The HPA needs sensible targets and minimum and maximum replica counts. Timing matters too: scaling up should be quick enough to absorb a spike, while scaling down should be patient enough not to thrash, where capacity is removed the moment a brief lull appears and added back shortly afterwards. Getting the scale-down behaviour calm is where most of the cost savings and the stability come from.
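One way to express the quick-up, patient-down timing is the `behavior` section of an `autoscaling/v2` HPA. The sketch below builds such a manifest as a Python dictionary and prints it as JSON. The field names are those of the Kubernetes API, while the resource name `web`, the replica bounds, the five-minute window and the fifty percent policy are illustrative assumptions rather than recommended values.

```python
import json


def hpa_manifest(name, deployment, min_replicas, max_replicas,
                 cpu_target_percent=70, scale_down_window_s=300):
    """Build an autoscaling/v2 HPA that scales up at once and down slowly."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_target_percent,
                    },
                },
            }],
            "behavior": {
                # React to spikes without waiting.
                "scaleUp": {"stabilizationWindowSeconds": 0},
                # Wait out brief lulls, then remove at most half per minute.
                "scaleDown": {
                    "stabilizationWindowSeconds": scale_down_window_s,
                    "policies": [
                        {"type": "Percent", "value": 50, "periodSeconds": 60},
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(hpa_manifest("web", "web", 2, 20), indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output can be piped to `kubectl apply -f -` in a test cluster. The node pool minimum and maximum are set separately on the Cluster Autoscaler and are not shown here.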
Virtual nodes change the picture entirely. With virtual nodes, there is no node axis to scale because OCI provides capacity per pod. You scale pods with the HPA and the capacity simply appears, billed for the pods you run rather than for nodes you provision. This removes the Cluster Autoscaler from the equation and removes node patching and sizing too. Virtual nodes are particularly good for bursty and unpredictable workloads, where managing a node pool to match swings in demand is awkward. The trade-offs against managed nodes are covered in OKE virtual nodes explained.
A handful of mistakes recur. The first is missing resource requests, which silently disables utilization-based HPA targets. The second is running the HPA without the Cluster Autoscaler, so pods scale up and then sit pending. The third is aggressive scale-down that thrashes, adding and removing capacity in a loop. The fourth is node pools that never scale back down because of pods that cannot be evicted, leaving expensive capacity running idle, which shows up directly in the bill and is a frequent target in OKE cost optimization. The fifth is forgetting that scaling has limits in the underlying tenancy, such as service limits and pod IP addresses, which cap how far the cluster can grow regardless of autoscaler settings.
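The pod IP ceiling in the fifth mistake is easy to estimate. The sketch below assumes pods draw their addresses from a dedicated VCN subnet and that a fixed number of addresses per subnet is reserved. Both the example CIDRs and the default of three reserved addresses are assumptions to check against your own network design and the OCI documentation.

```python
import ipaddress


def pod_ip_ceiling(pod_subnet_cidr, reserved_per_subnet=3):
    """Upper bound on pod IPs available in a subnet (assumed reserved count)."""
    total = ipaddress.ip_network(pod_subnet_cidr).num_addresses
    return max(0, total - reserved_per_subnet)


if __name__ == "__main__":
    for cidr in ("10.0.1.0/24", "10.0.32.0/20"):  # hypothetical pod subnets
        print(cidr, pod_ip_ceiling(cidr))
```

If the HPA maximum multiplied across all workloads exceeds this figure, the cluster hits the network limit before it hits the autoscaler limit.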
Autoscaling is as much a cost discipline as a performance one. A cluster that scales up for demand and reliably back down afterward pays only for what it uses. A cluster that scales up and never down pays for peak capacity all the time. Because worker capacity dominates OKE cost, getting scale-down to work correctly is one of the highest-leverage things you can do for the bill, which is why it features heavily in our cost work.
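A short worked example makes the difference concrete. The demand profile and the price per node-hour below are hypothetical figures chosen for easy arithmetic, not OCI pricing.

```python
def node_hours(hourly_node_counts):
    """Total node-hours for a list of per-hour node counts."""
    return sum(hourly_node_counts)


def daily_cost(hourly_node_counts, price_per_node_hour):
    """Cost of running the given node counts for one day."""
    return node_hours(hourly_node_counts) * price_per_node_hour


# Hypothetical day: 4 nodes for 20 quiet hours, 12 nodes for a 4-hour peak.
profile = [4] * 20 + [12] * 4
PRICE = 0.50  # hypothetical price per node-hour

scaled = daily_cost(profile, PRICE)                         # scales back down
pinned = daily_cost([max(profile)] * len(profile), PRICE)   # stuck at peak
saving = 1 - scaled / pinned

if __name__ == "__main__":
    print(f"scaled: {scaled:.2f}  pinned at peak: {pinned:.2f}  saving: {saving:.0%}")
```

With these numbers the cluster that scales back down uses 128 node-hours a day against 288 for the one pinned at peak, a saving of a little over half.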
Autoscaling on OKE is two mechanisms on two axes, plus the virtual node option that removes the node axis altogether. Configure the HPA to scale pods, the Cluster Autoscaler to scale the nodes those pods need, tune them to cooperate, and use virtual nodes where bursty demand makes node management awkward. Get that right and the cluster handles demand calmly and costs only what it should. Continue with OKE virtual nodes explained and OKE cost optimization.