Autoscaling OKE Workloads

Published Aug 28, 2025 · 9 min read · By Fredrik Filipsson · Independent OCI services

Autoscaling is what lets an OKE cluster absorb a traffic spike without falling over and shrink back down when the spike passes so you are not paying for idle capacity. It sounds simple, but it involves two separate mechanisms working on two different axes, and teams that configure one without the other end up with clusters that either cannot scale or never scale back. This article explains how scaling works on OKE, how the pieces fit together, and how to tune them so the cluster behaves.

It is part of our OKE and containers series and assumes the cluster design from OKE cluster architecture.

Two axes of scaling

Scaling an application on Kubernetes has two dimensions. The first is the number of pod replicas, which is about giving the application more copies of itself to handle load. The second is the number of nodes, which is about having somewhere for those pods to run. Both matter, and they are controlled by different components. Confusing the two is the root of most scaling problems.

Component	Scales	Trigger	Result
Horizontal Pod Autoscaler	Pod replicas	CPU, memory or custom metrics	More or fewer copies of the application
Cluster Autoscaler	Nodes in a pool	Pods that cannot be scheduled	More or fewer worker nodes
Virtual nodes	Capacity per pod	Pod scheduling	No node management at all

The Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler, or HPA, watches a metric for your application, most commonly CPU or memory utilization, and adds or removes pod replicas to keep that metric near a target. If you set a target of seventy percent CPU and load pushes the pods past it, the HPA adds replicas until the average drops back. When load falls, it removes replicas. Utilization targets are expressed as a percentage of each pod's resource request, so the HPA can only act on them if your pods declare resource requests; without a request there is no baseline to measure against. This is the most common reason an HPA appears to do nothing: the pods have no resource requests set, so there is no signal to scale on.

The Horizontal Pod Autoscaler is only as good as your resource requests. Without them, it has nothing to measure and nothing to scale on.
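
To make the dependence on resource requests concrete, here is a minimal Python sketch of the ratio rule described in the Kubernetes HPA documentation: desired replicas are the current replicas multiplied by current utilization over target utilization, rounded up. The function names, the sample numbers, and the simplified averaging are assumptions for illustration; the real controller also handles missing metrics, unready pods, and scaling policies.

```python
import math


def average_utilization_percent(usage_millicores, request_millicores):
    """Average CPU utilization across pods, as a percentage of the request."""
    if not request_millicores:
        # No request means no baseline: utilization is undefined.
        raise ValueError("pods declare no CPU request; utilization is undefined")
    total_requested = request_millicores * len(usage_millicores)
    return 100 * sum(usage_millicores) / total_requested


def desired_replicas(current, utilization, target, min_replicas, max_replicas,
                     tolerance=0.1):
    """Replica count from the ratio rule: ceil(current * utilization / target)."""
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: leave the replica count alone
    return max(min_replicas, min(max_replicas, math.ceil(current * ratio)))


# Hypothetical workload: four pods each using 450m of a 500m request.
utilization = average_utilization_percent([450, 450, 450, 450], 500)
print(utilization, desired_replicas(4, utilization, 70, 2, 10))
```

With these sample numbers the pods sit at 90 percent against a 70 percent target, so the sketch asks for six replicas. Passing a request of zero raises an error, which is the code equivalent of an HPA that has nothing to measure.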

The Cluster Autoscaler

The HPA can ask for more pods, but those pods need nodes to run on. The Cluster Autoscaler watches for pods that are pending because no node has room for them, and it adds nodes to the relevant managed node pool to make space. When nodes sit underused, it drains and removes them to save money. The Cluster Autoscaler is what closes the loop: the HPA scales pods, the Cluster Autoscaler scales the nodes those pods need. Running the HPA without the Cluster Autoscaler means new pods stay pending once the existing nodes fill up, which looks like a scaling failure but is really a missing second half.
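
The node side can be sketched the same way. The Python below estimates how many nodes a pool would need to add for a set of pending pods, using first-fit decreasing on CPU requests only. This is a deliberate simplification and not the Cluster Autoscaler's actual algorithm, which simulates the scheduler and considers memory, affinity, taints, and more; the function name and all figures are hypothetical.

```python
def nodes_to_add(pending_cpu_millicores, node_allocatable_millicores,
                 current_nodes, max_nodes):
    """Estimate new nodes for pending pods using first-fit decreasing on CPU.

    Returns (nodes_to_add, pods_left_pending).
    """
    headroom = max(0, max_nodes - current_nodes)
    free = []          # remaining CPU on each hypothetical new node
    left_pending = 0
    for request in sorted(pending_cpu_millicores, reverse=True):
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] -= request
                break
        else:
            # No new node so far has room: open another one if allowed.
            if request <= node_allocatable_millicores and len(free) < headroom:
                free.append(node_allocatable_millicores - request)
            else:
                left_pending += 1
    return len(free), left_pending


# Hypothetical pool: 3 nodes today, 3000m allocatable per node.
print(nodes_to_add([1500, 1500, 1000, 500], 3000, current_nodes=3, max_nodes=6))
print(nodes_to_add([1500, 1500, 1000, 500], 3000, current_nodes=3, max_nodes=4))
```

The second call shows the pool maximum at work: with room for only one more node, some pods stay pending even though the autoscaler is running.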

Tuning the two together

The two autoscalers must be configured to cooperate. The Cluster Autoscaler needs minimum and maximum node counts per pool that give it room to grow without runaway cost. The HPA needs sensible targets and minimum and maximum replica counts. Timing matters too: scaling up should be quick enough to absorb a spike, while scaling down should be patient enough not to thrash, that is, to avoid removing capacity the moment a brief lull appears only to add it back seconds later. Getting the scale-down behaviour calm is where most of the cost savings and the stability come from.
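
One way to picture patient scale down is a stabilization window: instead of acting on the latest recommendation, act on the highest recommendation seen over a recent window, so a brief lull cannot remove capacity on its own. The Python sketch below follows that idea, which the HPA documents for its scale-down stabilization window; the class name, the 300 second window, and the sample timeline are assumptions for illustration.

```python
from collections import deque


class ScaleDownStabilizer:
    """Hold replica count at the highest recommendation seen in the window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilized(self, now, recommended):
        self._history.append((now, recommended))
        # Forget recommendations that have aged out of the window.
        while self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        return max(replicas for _, replicas in self._history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
for second, recommended in [(0, 10), (60, 4), (120, 9), (301, 4), (700, 4)]:
    print(second, recommended, stabilizer.stabilized(second, recommended))
```

In this timeline the dip to four replicas at 60 seconds is ignored because a higher recommendation is still inside the window. The count only falls once the lower recommendation has held for the whole window. Scale up is unaffected, since a higher recommendation immediately becomes the maximum.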

Virtual nodes: scaling without node management

Virtual nodes change the picture entirely. With virtual nodes, there is no node axis to scale because OCI provides capacity per pod. You scale pods with the HPA and the capacity simply appears, billed for the pods you run rather than for nodes you provision. This removes the Cluster Autoscaler from the equation and removes node patching and sizing too. Virtual nodes are particularly good for bursty and unpredictable workloads, where managing a node pool to match swings in demand is awkward. The trade-offs against managed nodes are covered in OKE virtual nodes explained.

A scaling configuration framework

Set resource requests on every pod. Without them the HPA cannot work and the scheduler cannot place pods well.
Configure the HPA with a sensible target metric and minimum and maximum replicas for each scalable workload.
Enable the Cluster Autoscaler on managed node pools with minimum and maximum node counts.
Tune scale down to be patient so the cluster does not thrash capacity on brief lulls.
Use virtual nodes for bursty or unpredictable workloads to avoid managing node capacity for swings.
Test under realistic load to confirm the two axes cooperate before you depend on them.
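
The first few steps of the framework lend themselves to an automated check. The Python sketch below lints a description of workloads and node pools for missing requests, missing autoscalers, and bounds that leave no room to scale. The dictionary keys and the sample data are hypothetical and do not correspond to any Kubernetes or OCI API; in practice the same facts would be read from your manifests or cluster.

```python
def check_scaling_config(workloads, node_pools):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for w in workloads:
        if not w.get("cpu_request_millicores"):
            problems.append(f"{w['name']}: no CPU request")
        hpa = w.get("hpa")
        if hpa is None:
            problems.append(f"{w['name']}: no HPA")
        elif not 1 <= hpa["min_replicas"] < hpa["max_replicas"]:
            problems.append(f"{w['name']}: HPA replica bounds leave no room to scale")
    for pool in node_pools:
        if pool.get("virtual"):
            continue  # virtual nodes: no node count to bound
        if not pool.get("autoscaler_enabled"):
            problems.append(f"{pool['name']}: Cluster Autoscaler not enabled")
        elif not 0 <= pool["min_nodes"] < pool["max_nodes"]:
            problems.append(f"{pool['name']}: node bounds leave no room to scale")
    return problems


# Hypothetical inventory with two deliberate mistakes.
workloads = [
    {"name": "api", "cpu_request_millicores": 500,
     "hpa": {"min_replicas": 2, "max_replicas": 10}},
    {"name": "worker", "hpa": {"min_replicas": 1, "max_replicas": 5}},
]
node_pools = [
    {"name": "general", "autoscaler_enabled": True, "min_nodes": 3, "max_nodes": 3},
    {"name": "burst", "virtual": True},
]
for problem in check_scaling_config(workloads, node_pools):
    print(problem)
```

A check like this covers configuration only. It does not replace the last step in the list, testing under realistic load.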

Common scaling traps

A handful of mistakes recur. The first is no resource requests, which silently disables the HPA. The second is the HPA without the Cluster Autoscaler, so pods scale up and then sit pending. The third is aggressive scale down that thrashes, adding and removing capacity in a loop. The fourth is node pools that never scale back down because of pods that cannot be evicted, leaving expensive capacity running idle, which shows up directly in the bill and is a frequent target in OKE cost optimization. The fifth is forgetting that scaling has limits in the underlying tenancy, such as service limits and pod IP addresses, which cap how far the cluster can grow regardless of autoscaler settings.
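
The fifth trap is easy to quantify. The Python sketch below takes the smallest of three ceilings: the HPA maximum, the node pool maximum times pods per node, and the addresses available in a pod subnet. It assumes a networking mode in which every pod takes an address from a dedicated pod subnet, and it assumes three addresses per subnet are reserved; treat both, along with the sample figures, as assumptions to verify against your own tenancy. Service limits on compute are not modelled.

```python
import ipaddress


def usable_pod_ips(pod_subnet_cidr, reserved_per_subnet=3):
    """Addresses left for pods after the assumed per-subnet reservation."""
    network = ipaddress.ip_network(pod_subnet_cidr)
    return max(0, network.num_addresses - reserved_per_subnet)


def effective_pod_ceiling(hpa_max_replicas, max_nodes, max_pods_per_node,
                          pod_subnet_cidr):
    """The real ceiling is the smallest of the three limits."""
    return min(
        hpa_max_replicas,
        max_nodes * max_pods_per_node,
        usable_pod_ips(pod_subnet_cidr),
    )


# Hypothetical settings: the HPA allows 500 replicas, but the subnet does not.
print(effective_pod_ceiling(500, max_nodes=10, max_pods_per_node=31,
                            pod_subnet_cidr="10.0.1.0/24"))
```

With these sample settings the HPA maximum of 500 is never reachable. The /24 pod subnet runs out first, and other workloads sharing the subnet would lower the ceiling further.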

Scaling and cost

Autoscaling is as much a cost discipline as a performance one. A cluster that scales up for demand and reliably back down afterward pays only for what it uses. A cluster that scales up and never down pays for peak capacity all the time. Because worker capacity dominates OKE cost, getting scale down to work correctly is one of the highest leverage things you can do for the bill, which is why it features heavily in our cost work.
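
The difference between the two clusters can be expressed in node-hours without needing any prices. The Python sketch below compares a day in which the node count follows demand with a day in which the cluster holds its peak throughout. The hourly profile is invented for illustration and is not measured data.

```python
def node_hours(hourly_node_counts):
    """Total node-hours consumed when the node count follows the profile."""
    return sum(hourly_node_counts)


def scale_down_savings(hourly_node_counts):
    """Fraction of node-hours saved versus holding peak capacity all the time."""
    peak_hours = max(hourly_node_counts) * len(hourly_node_counts)
    return 1 - node_hours(hourly_node_counts) / peak_hours


# Hypothetical day: 3 nodes for 20 hours, 10 nodes for a 4 hour peak.
profile = [3] * 20 + [10] * 4
print(node_hours(profile), round(scale_down_savings(profile), 3))
```

For this invented profile, a cluster that never scales down uses 240 node-hours where 100 would do, so reliable scale down would save a little under 60 percent of worker capacity. The spikier the demand, the larger that fraction becomes.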

Bringing it together

Autoscaling on OKE is two mechanisms on two axes, plus the virtual node option that removes the node axis altogether. Configure the HPA to scale pods, the Cluster Autoscaler to scale the nodes those pods need, tune them to cooperate, and use virtual nodes where bursty demand makes node management awkward. Get that right and the cluster handles demand calmly and costs only what it should. Continue with OKE virtual nodes explained and OKE cost optimization.

Part of a series
This guide is part of Kubernetes & DevOps on OCI — our complete pillar guide on the topic.

About the author

Fredrik Filipsson, Co-founder of OCI Specialists: 20 years of enterprise IT experience in Oracle Database, OCI cost optimization, licensing, and data platforms.
