OCI Kubernetes Engine: The Complete Guide

Published Aug 21, 2025 · Updated May 26, 2026 · 16 min read · OCI Specialists · Independent OCI services

OCI Kubernetes Engine, known as OKE, is Oracle Cloud Infrastructure's managed Kubernetes service. It runs the Kubernetes control plane for you and lets you focus on your workloads rather than on operating Kubernetes itself. For teams running containers on OCI, OKE is the default platform, and it integrates tightly with the rest of the OCI stack: networking, storage, identity, load balancing and observability. This guide is the complete reference, from the architecture up through networking, scaling, security, storage, delivery and cost.

It is the pillar for our OKE series. Each section links to a deeper article, so use this as the map and follow the links where you need detail.

What OKE actually is

Kubernetes has two halves: the control plane, which schedules and manages everything, and the worker nodes, which run your containers. Operating the control plane yourself is real work, involving high availability, upgrades, certificate rotation and etcd management. OKE runs the control plane for you, managed and monitored by Oracle, while you manage the worker nodes and the workloads. That split is the entire value proposition, and it is worth understanding precisely, because it shapes every later decision.

OKE gives you conformant, upstream Kubernetes, not a fork, so your existing manifests, Helm charts and tooling work as they do anywhere else. What OKE adds is the integration with OCI: native load balancers for services, block and file storage for persistent volumes, OCI IAM for access, virtual cloud networks for pod and node networking, and the container registry for images. The getting started path is covered in getting started with OKE.

Cluster types: basic and enhanced

OKE offers two cluster tiers, and the choice affects what features you can use.

| Capability | Basic clusters | Enhanced clusters |
| --- | --- | --- |
| Managed control plane | Yes | Yes |
| Virtual nodes | No | Yes |
| Cluster add on management | Limited | Full |
| Workload identity and advanced features | Limited | Yes |
| Financially backed SLA on the control plane | No | Yes |

For anything approaching production, enhanced clusters are the right starting point, because they unlock virtual nodes, the financially backed control plane SLA and the fuller feature set. Basic clusters suit experimentation and simple cases where those features are not needed.

Worker node models: managed nodes and virtual nodes

OKE gives you two ways to run worker capacity, and they represent different operating philosophies.

Managed node pools are groups of compute instances that you own and operate. You choose the shape, the image and the size of the pool, you patch and upgrade the nodes, and you pay for the compute whether or not the pods are busy. This is the familiar Kubernetes model and it gives you full control over the node, including the ability to run privileged workloads and custom node configuration.

Virtual nodes are a serverless model. You do not provision or manage the underlying compute at all. You schedule pods and OCI runs them, billing for the pod resources rather than for nodes. Virtual nodes remove node patching and node scaling from your plate entirely, at the cost of some of the control that managed nodes give you. We compare the two in depth in OKE virtual nodes explained.

With managed nodes you get control and pay for capacity. With virtual nodes you get simplicity and pay for pods. Most estates end up running both, matched to the workload.
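The trade between paying for capacity and paying for pods can be put into numbers. The following is a minimal sketch, not a pricing tool: the function names and every price in the example are hypothetical, and real figures have to come from the current OCI price list.

```python
def break_even_utilisation(node_price_per_hour, pods_per_node, pod_price_per_hour):
    """Fraction of a managed node's pod slots that must be busy, on average,
    for the node to cost the same as running those pods on virtual nodes.

    All three inputs are assumptions supplied by the caller.
    """
    if min(node_price_per_hour, pods_per_node, pod_price_per_hour) <= 0:
        raise ValueError("prices and pod count must be positive")
    return node_price_per_hour / (pods_per_node * pod_price_per_hour)


def cheaper_model(average_utilisation, break_even):
    """Pick the cheaper worker model for a given average utilisation."""
    return "managed nodes" if average_utilisation > break_even else "virtual nodes"


# Hypothetical prices: a node at 0.40 per hour that fits 10 pods,
# against a per pod price of 0.06 per hour on virtual nodes.
threshold = break_even_utilisation(0.40, 10, 0.06)
print(f"break even at {threshold:.0%} utilisation")
print(cheaper_model(0.30, threshold))  # a bursty workload, busy 30% of the time
```

Under these made up prices a node pool has to stay roughly two thirds busy before it beats per pod billing, which is why steady workloads tend to land on managed nodes and bursty ones on virtual nodes.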

Cluster architecture

A well designed OKE cluster places the control plane endpoint, the worker nodes and the load balancers across availability domains and fault domains so that no single failure takes the cluster down. Node pools are sized and shaped for the workloads they host, with separate pools for workloads that have different needs, for example a pool of GPU shapes for machine learning alongside a general pool for stateless services. The full design, including regional spread and node pool strategy, is in OKE cluster architecture.
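The claim that no single failure should take the cluster down can be checked with simple arithmetic. This is a sketch under stated assumptions: the function name is hypothetical, the domain names are only labels, and "nodes required" stands for however many nodes your workloads need to keep serving.

```python
def survives_single_domain_loss(nodes_per_domain, nodes_required):
    """Return True if losing any one domain still leaves enough nodes.

    nodes_per_domain maps a domain label (an availability domain or a
    fault domain) to the number of worker nodes placed in it.
    """
    if not nodes_per_domain:
        return False
    total = sum(nodes_per_domain.values())
    return all(total - lost >= nodes_required for lost in nodes_per_domain.values())


# Six nodes spread evenly over three domains tolerate one domain failing
# when four nodes are enough to carry the load.
even = {"FAULT-DOMAIN-1": 2, "FAULT-DOMAIN-2": 2, "FAULT-DOMAIN-3": 2}
print(survives_single_domain_loss(even, nodes_required=4))

# The same six nodes packed unevenly do not.
skewed = {"FAULT-DOMAIN-1": 4, "FAULT-DOMAIN-2": 1, "FAULT-DOMAIN-3": 1}
print(survives_single_domain_loss(skewed, nodes_required=4))
```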

Networking

OKE networking is where OCI integration is most visible. Clusters run inside a virtual cloud network, and OKE supports two pod networking models. The VCN native pod networking model gives every pod an IP address directly from the VCN, so pods are first class citizens on the network and can be reached and secured like any other VCN resource. The flannel overlay model gives pods addresses on an overlay network instead. VCN native is the modern default for most production clusters because it simplifies network policy and integration. Services are exposed through OCI load balancers, and ingress controllers route external traffic to services inside the cluster. The full treatment is in OKE networking explained, with ingress and load balancing detailed in ingress and load balancing on OKE.
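Because VCN native pod networking gives every pod an address from the VCN, the size of the pod subnet caps how many pods the cluster can run. The sketch below estimates that ceiling. It assumes three addresses are reserved in each subnet, and the pods per node figure is a hypothetical input, so check both against the OCI documentation for your configuration.

```python
import ipaddress

# Assumption: three addresses in every subnet are reserved and not assignable.
RESERVED_PER_SUBNET = 3


def usable_ips(cidr):
    """Number of addresses in the subnet that can be handed to pods."""
    return ipaddress.ip_network(cidr).num_addresses - RESERVED_PER_SUBNET


def max_full_nodes(pod_subnet_cidr, pods_per_node):
    """How many nodes the pod subnet can serve if every node runs
    pods_per_node pods, each taking one VCN address."""
    return usable_ips(pod_subnet_cidr) // pods_per_node


print(usable_ips("10.0.1.0/24"))          # addresses available to pods
print(max_full_nodes("10.0.1.0/24", 31))  # nodes at a hypothetical 31 pods each
```

Running the numbers before creating the subnets matters, because a pod subnet that is too small shows up later as pods that cannot get an address rather than as an obvious design error.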

Autoscaling

OKE scales on two axes, and a healthy cluster uses both. The Horizontal Pod Autoscaler adds and removes pod replicas based on load, so your application scales to demand. The Cluster Autoscaler adds and removes nodes from managed node pools so there is somewhere for those pods to run, scaling the underlying capacity to match. With virtual nodes, the node axis disappears because capacity is provided per pod. Getting both autoscalers configured and tuned together is what lets a cluster handle a traffic spike without either falling over or sitting permanently over provisioned, and it is covered in autoscaling OKE workloads.
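The pod axis follows a simple rule that is worth seeing written out. The sketch below mirrors the replica calculation described in the upstream Kubernetes documentation for the Horizontal Pod Autoscaler: scale the current replica count by the ratio of observed to target metric, round up, and skip the change when the ratio is within a tolerance of 1. The function name, the default bounds and the example numbers are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count an HPA style controller would aim for.

    current_metric and target_metric use the same unit, for example
    average CPU utilisation in percent.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=90, target_metric=60))   # scale up
print(desired_replicas(4, current_metric=62, target_metric=60))   # within tolerance
print(desired_replicas(4, current_metric=15, target_metric=60))   # scale down
```

If the new replicas do not fit on the existing nodes they sit pending, and that is the signal the Cluster Autoscaler acts on for managed node pools.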

Storage and stateful workloads

Containers are ephemeral, but plenty of workloads need persistent state. OKE provides persistent storage through the OCI block volume and file storage services, exposed to Kubernetes through the standard container storage interface. Block volumes give individual pods fast persistent disks, while file storage gives shared file systems that many pods can mount at once. Running databases and other stateful workloads on Kubernetes adds real complexity around data durability and failover, covered in persistent storage on OKE and OKE for stateful workloads.
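The difference between a disk for one pod and a file system shared by many shows up in the access mode of the persistent volume claim. The sketch below builds a claim as a plain dictionary that can be dumped to JSON and applied with kubectl. The helper name and the storage class names are hypothetical placeholders, so substitute the classes that exist in your cluster.

```python
import json


def volume_claim(name, size_gib, shared, storage_class):
    """Build a PersistentVolumeClaim manifest.

    shared=False asks for a volume mounted read write by a single node,
    the usual fit for block volumes. shared=True asks for a volume many
    pods can mount at once, the usual fit for file storage.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany" if shared else "ReadWriteOnce"],
            "storageClassName": storage_class,  # placeholder, not a real class name
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }


db_disk = volume_claim("orders-db-data", 100, shared=False, storage_class="my-block-class")
uploads = volume_claim("shared-uploads", 500, shared=True, storage_class="my-file-class")
print(json.dumps(db_disk, indent=2))
```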

Security

Securing an OKE cluster spans several layers. OCI IAM governs who can manage the cluster. Kubernetes role based access control governs what authenticated users and workloads can do inside it. Network policy governs which pods can talk to which. Image scanning in the container registry catches vulnerable images before they run. Workload identity lets pods assume OCI permissions without long lived credentials. And the worker nodes themselves need hardening and patching. Each layer matters, and a gap in any one of them undermines the rest. The full model is in OKE security best practices.
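For the network policy layer, a common starting point is a default deny rule per namespace, with explicit allow rules added on top. The sketch below generates that manifest using the standard Kubernetes NetworkPolicy API. The helper name and the namespace are hypothetical, and the policy only has an effect if the cluster's networking setup enforces network policy.

```python
import json


def default_deny_ingress(namespace):
    """NetworkPolicy that selects every pod in the namespace and allows
    no inbound traffic, because it lists Ingress with no ingress rules."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},          # empty selector matches all pods
            "policyTypes": ["Ingress"],
        },
    }


# kubectl accepts JSON as well as YAML, so the output can be applied directly.
print(json.dumps(default_deny_ingress("payments"), indent=2))
```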

Delivery: CI CD and GitOps

A cluster is only useful if you can ship to it safely and repeatably. Two complementary patterns dominate. CI CD pipelines build images, run tests and deploy to the cluster, covered in CI CD pipelines for OKE. GitOps inverts the model, treating a Git repository as the source of truth for cluster state and using a controller in the cluster to reconcile reality against the repository, covered in GitOps on OKE. Most mature estates use a CI pipeline to build and a GitOps controller to deploy, which gives you a clean audit trail of what is running and why.
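The reconciliation idea at the heart of GitOps fits in a few lines. This is a toy sketch of the comparison step only, with state modelled as dictionaries from resource name to spec. It is not how any particular GitOps controller is implemented, and the function name and sample resources are made up.

```python
def plan_reconcile(desired, actual):
    """Return the actions needed to make the cluster match the repository.

    desired is the state declared in Git, actual is the state observed
    in the cluster. Both map a resource name to its spec.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))  # drift: exists but is not in Git
    return actions


in_git = {"web": {"image": "web:1.4", "replicas": 3}, "api": {"image": "api:2.0", "replicas": 2}}
in_cluster = {"web": {"image": "web:1.3", "replicas": 3}, "debug-pod": {"image": "busybox", "replicas": 1}}
print(plan_reconcile(in_git, in_cluster))
```

A controller runs this comparison continuously, which is what turns the Git history into the audit trail: anything running that is not in the repository gets flagged or removed.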

Observability

You cannot run what you cannot see. OKE integrates with OCI monitoring and logging, and most teams add the standard Kubernetes observability stack on top for metrics, logs and traces. The conditions worth alarming on include node and pod health, pending pods that cannot schedule, persistent volume pressure and control plane errors. Monitoring approaches are covered in monitoring OKE with OCI tools, and when things go wrong the diagnostic path is in troubleshooting OKE clusters.
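As an illustration of one of those conditions, the sketch below flags pods that have been pending for longer than a threshold. The input format is a simplified, hypothetical summary of pod status rather than the raw Kubernetes API response, and the five minute threshold is an arbitrary example.

```python
def stuck_pending(pods, threshold_seconds=300):
    """Names of pods that have sat in the Pending phase beyond the threshold.

    Each pod is a dict with 'name', 'phase' and 'age_seconds' keys,
    a simplified shape assumed for this example.
    """
    return [
        pod["name"]
        for pod in pods
        if pod["phase"] == "Pending" and pod["age_seconds"] > threshold_seconds
    ]


pods = [
    {"name": "web-7c9d", "phase": "Running", "age_seconds": 86400},
    {"name": "batch-42", "phase": "Pending", "age_seconds": 45},    # still scheduling
    {"name": "gpu-train", "phase": "Pending", "age_seconds": 1800}, # nowhere to run
]
print(stuck_pending(pods))
```

A short pending period is normal while a node is being added, so the threshold keeps the alarm for pods that genuinely have nowhere to run.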

Cost

OKE does not charge for the control plane of a basic cluster, and it charges a small fee for the control plane of an enhanced cluster, which carries the SLA. Your cost is overwhelmingly the worker capacity: the compute behind managed node pools or the pod resources behind virtual nodes, plus load balancers, storage and egress. The biggest cost mistakes are over provisioned node pools that run half empty and autoscalers that never scale back down. Matching capacity to demand with the autoscalers, choosing the right node shapes, and using virtual nodes for bursty or spiky workloads are the main levers, covered in OKE cost optimization.

Upgrades

Kubernetes releases regularly and OKE supports a window of versions. Staying current is a security and support requirement, not an optional extra. OKE makes the control plane upgrade a managed operation, but you remain responsible for upgrading worker nodes and for validating that your workloads tolerate the new version. A disciplined upgrade strategy tests in a non production cluster first, then upgrades the control plane, then rolls the node pools. It is covered in OKE upgrade strategy.
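The control plane first ordering exists because worker nodes must never run a newer Kubernetes version than the control plane, and may only lag it by a limited number of minor versions. The sketch below checks a node pool against that rule. The function names are hypothetical, and the allowed lag of two minor versions is a conservative assumption, so confirm the limit in the Kubernetes version skew policy for your release.

```python
def parse_minor(version):
    """Turn 'v1.31.1' or '1.31.1' into the tuple (1, 31)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def node_pool_skew_ok(control_plane, node_pool, max_skew=2):
    """True if the node pool is not newer than the control plane and
    trails it by at most max_skew minor versions."""
    cp_major, cp_minor = parse_minor(control_plane)
    np_major, np_minor = parse_minor(node_pool)
    if cp_major != np_major:
        return False
    return 0 <= cp_minor - np_minor <= max_skew


print(node_pool_skew_ok("v1.31.1", "v1.30.5"))  # nodes one minor behind
print(node_pool_skew_ok("v1.30.1", "v1.31.0"))  # nodes ahead of the control plane
print(node_pool_skew_ok("v1.31.1", "v1.28.2"))  # nodes too far behind
```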

OKE versus self managed Kubernetes

Teams sometimes ask whether to run OKE or stand up their own Kubernetes on OCI compute. The honest answer for almost everyone is OKE, because operating a production grade Kubernetes control plane is a significant ongoing burden that delivers no differentiation. Self managed Kubernetes makes sense only in narrow cases where you need control that a managed service does not expose. The full comparison is in OKE vs self managed Kubernetes.

A reference operating model for OKE

  1. Enhanced clusters spread across availability and fault domains for production, with a financially backed control plane SLA.
  2. Mixed worker model. Managed node pools for steady and specialised workloads, virtual nodes for bursty and spiky ones.
  3. VCN native pod networking with network policy, services exposed through OCI load balancers and a managed ingress controller.
  4. Both autoscalers tuned together, scaling up for demand and back down when it passes.
  5. Layered security. IAM, Kubernetes RBAC, network policy, image scanning, workload identity and hardened, patched nodes.
  6. CI to build, GitOps to deploy, with a clean audit trail of cluster state.
  7. Centralised observability alarmed on the conditions that cause outages.
  8. Disciplined upgrades tested in non production, control plane first, then node pools.

Where to go from here

OKE rewards good design and punishes drift, like any Kubernetes platform. The integration with OCI removes a lot of the operational burden, but the decisions about cluster type, worker model, networking, security and delivery are still yours, and they are the decisions that determine whether the platform is calm or chaotic. Work through the linked articles in this series for the depth on each, and if you are migrating existing containers onto OKE, start with migrating containers to OKE.

The OKE solution practice designs, builds and runs OKE estates to the operating model above, whether you want a one time build on a project fee or ongoing managed operations on a monthly retainer.