Kubernetes promises efficiency through bin packing, the idea that many workloads share a pool of compute and the cluster fits them together tightly so little capacity is wasted. In practice most Oracle Container Engine for Kubernetes (OKE) clusters deliver the opposite, paying for far more node capacity than the running pods actually use, because the defaults err toward safety and nobody goes back to tighten them. The waste is not visible in the way an idle virtual machine is, because the nodes look busy and the cluster looks healthy, but the gap between what is requested and what is used is real money. This guide works through the tactics that close that gap, from node sizing to pod requests to autoscaling, without starving the workloads that the cluster exists to run. It sits within the wider practice described in the Oracle Cloud Cost Optimization pillar guide.
The gap between requested and used
Every pod in Kubernetes declares a resource request, the amount of CPU and memory the scheduler reserves for it whether or not it uses that much. The scheduler packs pods onto nodes based on these requests, so if pods request far more than they consume, the nodes fill up on paper while sitting half idle in reality, and you pay for the nodes. This is the central waste in most clusters, and it is invisible unless you compare requested resources against actual usage. The fix is to measure both, then bring the requests down toward real consumption with enough headroom for spikes, which lets the scheduler pack more pods onto fewer nodes. Right sizing pod requests is the OKE equivalent of the compute right sizing covered in OCI Right Sizing: Compute Shapes Explained, and it usually delivers the largest single saving.
| Lever | What it fixes | Typical effort |
| --- | --- | --- |
| Right size pod requests | Nodes full on paper, idle in fact | Measure then adjust, ongoing |
| Cluster autoscaler | Paying for peak nodes around the clock | Configure once, tune over time |
| Node shape selection | Wrong CPU to memory ratio per node | Choose per node pool |
| Scheduled scaling | Non production running overnight | Simple schedule, large saving |
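To make the requested versus used comparison concrete, here is a minimal Python sketch of the measure-then-adjust step. It assumes you have already exported, per pod, the CPU request and an observed 95th percentile of usage from your metrics system. The `PodSample` record, the 20 percent headroom, and the sample figures are illustrative assumptions, not values from any real cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PodSample:
    name: str
    cpu_request_m: int    # CPU the pod requests, in millicores
    cpu_usage_p95_m: int  # observed 95th percentile CPU usage, in millicores


def recommended_request_m(usage_p95_m: int, headroom_pct: int = 20, step_m: int = 10) -> int:
    """Observed usage plus headroom, rounded up to a multiple of step_m.

    Integer arithmetic keeps the rounding exact.
    """
    scaled = usage_p95_m * (100 + headroom_pct)
    steps = -(-scaled // (100 * step_m))  # ceiling division
    return steps * step_m


def summarize(pods):
    """Compare what is reserved with what is used and what could be reserved."""
    requested = sum(p.cpu_request_m for p in pods)
    used = sum(p.cpu_usage_p95_m for p in pods)
    recommended = sum(recommended_request_m(p.cpu_usage_p95_m) for p in pods)
    return {
        "requested_m": requested,
        "used_p95_m": used,
        "recommended_m": recommended,
        "reclaimable_m": requested - recommended,
    }


# Hypothetical sample data for illustration only.
SAMPLE = [
    PodSample("api", cpu_request_m=1000, cpu_usage_p95_m=200),
    PodSample("worker", cpu_request_m=500, cpu_usage_p95_m=130),
    PodSample("cache", cpu_request_m=250, cpu_usage_p95_m=240),
]
```

Calling `summarize(SAMPLE)` shows that the first two pods reserve several times what they use, while the third is already tight and would be nudged slightly upward. The same calculation applies to memory, where a more generous headroom is usually the safer assumption because running out of memory kills the pod rather than slowing it.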
Let the cluster autoscaler do its job
A cluster that runs a fixed number of nodes pays for peak capacity at all times, even at three in the morning when load is a fraction of the day. The cluster autoscaler solves this by adding nodes when pending pods need them and removing nodes when they sit empty, so the node count follows demand rather than the worst case. Many clusters either run without it or configure it so conservatively that it never scales down, which keeps the cost at peak. Enabling it properly, with sensible minimum and maximum node counts and scale down settings that actually remove idle nodes, turns the node count into a variable cost that tracks the workload. The autoscaler only works well when pod requests are accurate, which is why the two tactics go together: accurate requests let the scheduler pack tightly, and the autoscaler then removes the nodes that packing freed up.
Right sizing requests and the cluster autoscaler are two halves of one tactic. Accurate requests let the scheduler pack tightly, and the autoscaler then removes the nodes that tight packing left empty.
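The interaction between the two can be sketched with a toy model in Python. This is not the autoscaler's actual algorithm, only a simplified illustration: a node becomes a removal candidate when its requested share of allocatable CPU falls below a threshold and its pods fit on the remaining nodes. The upstream Kubernetes Cluster Autoscaler exposes a comparable setting, `--scale-down-utilization-threshold`, which defaults to 0.5; check how your own deployment is configured. The node names, the uniform 4000 millicore node size, and the pod figures are hypothetical.

```python
def try_move(pods, others, allocatable_m):
    """Place pods onto the other nodes, largest first (first fit).

    Returns the updated node map, or None if any pod does not fit.
    """
    trial = {name: list(reqs) for name, reqs in others.items()}
    for req in sorted(pods, reverse=True):
        for target in trial.values():
            if sum(target) + req <= allocatable_m:
                target.append(req)
                break
        else:
            return None
    return trial


def consolidate(nodes, allocatable_m, threshold=0.5):
    """Remove under-used nodes one at a time until none can be removed.

    nodes maps a node name to the CPU requests (millicores) of its pods.
    Real autoscalers also respect disruption budgets, affinity rules,
    and local storage; this toy model ignores all of that.
    """
    nodes = {name: list(reqs) for name, reqs in nodes.items()}
    while len(nodes) > 1:
        removed = False
        # Try the least loaded node first.
        for name in sorted(nodes, key=lambda n: sum(nodes[n])):
            if sum(nodes[name]) / allocatable_m >= threshold:
                break  # every remaining node is at or above the threshold
            others = {k: v for k, v in nodes.items() if k != name}
            moved = try_move(nodes[name], others, allocatable_m)
            if moved is not None:
                nodes = moved
                removed = True
                break
        if not removed:
            break
    return nodes


# Hypothetical workload: the same pods before and after right sizing.
INFLATED = {
    "node-a": [1000, 1000, 1000],
    "node-b": [1000, 1000, 500],
    "node-c": [1000, 1000, 500],
}
RIGHT_SIZED = {
    "node-a": [240, 240, 240],
    "node-b": [240, 240, 160],
    "node-c": [240, 240, 160],
}
```

With the inflated requests every node sits above the threshold, so `consolidate(INFLATED, 4000)` keeps all three nodes. With the right sized requests the same pods fit on one node, and `consolidate(RIGHT_SIZED, 4000)` returns a single entry. The workload did not change between the two runs; only the requests did.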
Match the node shape to the workload
OCI offers many compute shapes with different ratios of CPU to memory, and OKE node pools can use different shapes for different needs. A workload that is memory heavy and CPU light wastes money on a balanced shape, because the CPU sits idle while the memory fills, and the reverse is true for compute heavy work. Running separate node pools tuned to the dominant resource profile of the pods they host lets each pool pack efficiently rather than wasting whichever dimension is over provisioned. The flexible shapes, where CPU and memory can be set independently, make this easier still, letting you dial the node to the workload rather than choosing from fixed sizes. This is the cluster level version of choosing the right shape, and on a large cluster the savings from matching shape to profile compound across every node.
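A rough way to see the effect is to compare how many nodes a pool needs under different shapes, and how much of each resource is left stranded. The Python sketch below works on aggregate requests only, so it gives a lower bound that ignores per-pod packing granularity and system overhead. The shape names and sizes are hypothetical placeholders, not entries from the OCI shape catalogue.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    name: str     # hypothetical label, not a real OCI shape name
    cpu: int      # CPU cores per node
    mem_gib: int  # memory per node, in GiB


def stranded(total_cpu: int, total_mem_gib: int, shape: Shape) -> dict:
    """Nodes needed for the aggregate requests, and what sits idle.

    The node count is driven by whichever resource runs out first;
    the other resource is paid for but cannot be used.
    """
    nodes = max(
        math.ceil(total_cpu / shape.cpu),
        math.ceil(total_mem_gib / shape.mem_gib),
    )
    return {
        "nodes": nodes,
        "idle_cpu": nodes * shape.cpu - total_cpu,
        "idle_mem_gib": nodes * shape.mem_gib - total_mem_gib,
    }


# Illustrative shapes and a memory heavy workload (all figures assumed).
BALANCED = Shape("balanced-8x32", cpu=8, mem_gib=32)
MEMORY_HEAVY = Shape("memory-4x64", cpu=4, mem_gib=64)
WORKLOAD_CPU = 16
WORKLOAD_MEM_GIB = 256
```

For this workload, `stranded(WORKLOAD_CPU, WORKLOAD_MEM_GIB, BALANCED)` needs eight nodes and leaves 48 cores idle, while the memory heavy shape needs four nodes and strands nothing. With a flexible shape the same calculation can be run in reverse, setting the node's CPU to memory ratio from the ratio of the aggregate requests.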
Non production does not need to run all night
Development, test, and staging clusters typically carry real cost and deliver value only during working hours, yet they often run at full size around the clock because nobody set them to do otherwise. Scheduling these environments to scale down or stop outside working hours is one of the simplest and largest savings available. It removes roughly two thirds of their node cost with no impact on anyone, because nobody is using them at night or at the weekend anyway. The same idle resource thinking applies here as in Finding Idle Resources on OCI, and the saving is so clean that scheduled scaling of non production clusters should be one of the first tactics applied, not one of the last.
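The two thirds figure follows from simple arithmetic on the hours in a week, sketched below in Python. The working patterns are assumptions chosen for illustration, and the result covers node hours only: storage, load balancers, and anything else that keeps billing while the node pools are scaled to zero is not reduced.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def off_hours_fraction(days_per_week: int, hours_per_day: int) -> float:
    """Share of the week in which a scheduled environment is switched off."""
    on_hours = days_per_week * hours_per_day
    if not 0 <= on_hours <= HOURS_PER_WEEK:
        raise ValueError("schedule must fit within one week")
    return (HOURS_PER_WEEK - on_hours) / HOURS_PER_WEEK


# Assumed schedules: weekdays only, 12 or 10 hours a day.
twelve_hour_days = off_hours_fraction(5, 12)  # 108 of 168 hours off
ten_hour_days = off_hours_fraction(5, 10)     # 118 of 168 hours off
```

A five day, twelve hour schedule leaves the environment off for about 64 percent of the week, and a ten hour day pushes that to about 70 percent, which is where the rough two thirds comes from.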
- Measure requested against used for CPU and memory across the cluster.
- Right size pod requests toward real usage with headroom for spikes.
- Enable the cluster autoscaler with scale down that genuinely removes idle nodes.
- Split node pools by resource profile so each packs efficiently.
- Schedule non production to stop outside working hours.
Watch the supporting costs too
The nodes are the largest line, but a cluster carries other costs that drift if unwatched. Persistent volumes accumulate as workloads come and go, leaving orphaned storage behind; load balancers multiply as services are exposed; and cross zone traffic between pods can drive network charges in the same way described in OCI Egress and Network Cost Control. None of these rivals the node bill, but together they form a meaningful tail, and because they are spread across many small items they are easy to overlook. A periodic sweep for orphaned volumes, unused load balancers, and chatty cross zone paths keeps the supporting costs from quietly growing into a problem of their own.
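The volume part of such a sweep can be sketched in Python as a filter over an inventory export. This assumes you have already gathered, by whatever tooling you use, a list of volumes with their attachment state and the claim each was created for, plus the set of claims that still exist in the cluster. The `VolumeRecord` fields and the sample data are hypothetical and do not mirror any OCI or Kubernetes API response.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class VolumeRecord:
    volume_id: str        # hypothetical identifier
    size_gb: int
    attached: bool        # currently attached to a node
    claim: Optional[str]  # "namespace/claim-name" the volume was created for


def orphaned_volumes(volumes: Iterable[VolumeRecord], live_claims: Set[str]) -> List[VolumeRecord]:
    """Detached volumes whose claim is missing or no longer exists.

    Returns candidates for review, largest first. A volume may be kept
    on purpose, so treat the result as a list to check, not to delete.
    """
    orphans = [
        v for v in volumes
        if not v.attached and (v.claim is None or v.claim not in live_claims)
    ]
    return sorted(orphans, key=lambda v: v.size_gb, reverse=True)


# Hypothetical inventory for illustration only.
INVENTORY = [
    VolumeRecord("vol-1", 200, attached=True, claim="db/data"),
    VolumeRecord("vol-2", 500, attached=False, claim="old/data"),
    VolumeRecord("vol-3", 50, attached=False, claim=None),
    VolumeRecord("vol-4", 100, attached=False, claim="ci/cache"),
]
LIVE_CLAIMS = {"db/data", "ci/cache"}
```

Here `orphaned_volumes(INVENTORY, LIVE_CLAIMS)` flags `vol-2` and `vol-3`. It leaves `vol-4` alone because its claim still exists, which is the normal state for a volume that is detached only while its pod restarts.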
Attribution makes Kubernetes cost legible
A shared cluster running many teams' workloads is opaque by default, because the bill arrives as one cluster cost with no breakdown of who drove it. Without attribution there is no way to hold any team accountable for its share or to know which workloads are the expensive ones. Namespace level cost allocation, using labels and the tagging discipline covered in Tagging Strategy for OCI Cost Allocation, turns the single cluster bill into a per team, per workload view, which is what makes optimisation actionable rather than guesswork. Once teams can see their own consumption they tend to tighten their own requests, which is the cheapest optimisation of all because it happens without central effort.
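One simple allocation rule, sketched below in Python, splits the cluster's node cost across namespaces in proportion to the CPU they request. This is one possible model among several, not a description of any particular tool. The namespace names, the request totals, and the monthly cost are made-up figures.

```python
from typing import Dict


def allocate_cost(cluster_cost: float, requests_m_by_namespace: Dict[str, int]) -> Dict[str, float]:
    """Split a cluster bill by each namespace's share of requested CPU.

    Idle capacity is spread across namespaces in the same proportion,
    so the allocated amounts always add up to the full bill.
    """
    total = sum(requests_m_by_namespace.values())
    if total <= 0:
        raise ValueError("no requests to allocate against")
    return {
        namespace: cluster_cost * requested / total
        for namespace, requested in requests_m_by_namespace.items()
    }


# Hypothetical monthly node cost and per-namespace CPU requests (millicores).
MONTHLY_NODE_COST = 9000
REQUESTS_M = {"payments": 6000, "search": 3000, "batch": 1000}

shares = allocate_cost(MONTHLY_NODE_COST, REQUESTS_M)
```

Allocating by request rather than by usage is a deliberate choice in this sketch. Requests are what the scheduler reserves and therefore what drives the node count, so a team that lowers its requests sees its share fall. A fuller model would blend CPU and memory and might report idle capacity as a separate line.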
How we optimise OKE estates
OKE optimisation rewards the team that does the unglamorous measurement, comparing requested against used and acting on the gap, and that is exactly where our OCI Cost Optimization work starts. We right size pod requests against real consumption, configure the cluster autoscaler so the node count follows demand instead of sitting at peak, split node pools to match workload profiles, and schedule non production to stop when nobody needs it. We set up namespace level attribution so every team can see its own share, which keeps the cluster lean over time rather than just at the moment of the review, and because our optimisation fee is paid only on verified savings, the work is aimed at reductions that show up in the bill rather than on a slide.