OCI Kubernetes Engine, known as OKE, is Oracle Cloud Infrastructure's managed Kubernetes service. It runs the Kubernetes control plane for you and lets you focus on your workloads rather than on operating Kubernetes itself. For teams running containers on OCI, OKE is the default platform, and it integrates tightly with the rest of the OCI stack: networking, storage, identity, load balancing and observability. This guide is the complete reference, from the architecture up through networking, scaling, security, storage, delivery and cost.
It is the pillar for our OKE series. Each section links to a deeper article, so use this as the map and follow the links where you need detail.
Kubernetes has two halves: the control plane, which schedules and manages everything, and the worker nodes, which run your containers. Operating the control plane yourself is real work, involving high availability, upgrades, certificate rotation and etcd management. OKE runs the control plane for you, managed and monitored by Oracle, while you manage the worker nodes and the workloads. That split is the entire value proposition, and it is worth understanding precisely because it shapes every later decision.
OKE gives you conformant, upstream Kubernetes, not a fork, so your existing manifests, Helm charts and tooling work as they do anywhere else. What OKE adds is the integration with OCI: native load balancers for services, block and file storage for persistent volumes, OCI IAM for access, virtual cloud networks for pod and node networking, and the container registry for images. The getting started path is covered in getting started with OKE.
OKE offers two cluster tiers, and the choice affects what features you can use.
| Capability | Basic clusters | Enhanced clusters |
|---|---|---|
| Managed control plane | Yes | Yes |
| Virtual nodes | No | Yes |
| Cluster add-on management | Limited | Full |
| Workload identity and advanced features | Limited | Yes |
| Financially backed SLA on the control plane | No | Yes |
For anything approaching production, enhanced clusters are the right starting point, because they unlock virtual nodes, the financially backed control plane SLA and the fuller feature set. Basic clusters suit experimentation and simple cases where those features are not needed.
OKE gives you two ways to run worker capacity, and they represent different operating philosophies.
Managed node pools are groups of compute instances that you own and operate. You choose the shape, the image and the size of the pool, you patch and upgrade the nodes, and you pay for the compute whether or not the pods are busy. This is the familiar Kubernetes model and it gives you full control over the node, including the ability to run privileged workloads and custom node configuration.
Virtual nodes are a serverless model. You do not provision or manage the underlying compute at all. You schedule pods and OCI runs them, billing for the pod resources rather than for nodes. Virtual nodes remove node patching and node scaling from your plate entirely, at the cost of some of the control that managed nodes give you. We compare the two in depth in OKE virtual nodes explained.
A well-designed OKE cluster places the control plane endpoint, the worker nodes and the load balancers across availability domains and fault domains so that no single failure takes the cluster down. Node pools are sized and shaped for the workloads they host, with separate pools for workloads that have different needs, for example a pool of GPU shapes for machine learning alongside a general pool for stateless services. The full design, including regional spread and node pool strategy, is in OKE cluster architecture.
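As a rough illustration of the spread described above, the sketch below assigns nodes to fault domains round-robin so that the count per domain never differs by more than one. The function name and node names are hypothetical, and this is not how OKE itself places nodes, only a way to reason about the target distribution.

```python
from collections import Counter
from itertools import cycle


def spread_nodes(node_count, fault_domains):
    """Assign nodes to fault domains round-robin (illustrative only)."""
    if not fault_domains:
        raise ValueError("at least one fault domain is required")
    domains = cycle(fault_domains)
    # Hypothetical node names; real node names come from the node pool.
    return {f"node-{i}": next(domains) for i in range(node_count)}


placement = spread_nodes(5, ["FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3"])
per_domain = Counter(placement.values())
print(dict(per_domain))
```

With an even spread, losing one fault domain removes roughly a third of the pool rather than all of it, which is the property the design is after.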
OKE networking is where OCI integration is most visible. Clusters run inside a virtual cloud network, and OKE supports two pod networking models. The VCN-native pod networking model gives every pod an IP address directly from the VCN, so pods are first-class citizens on the network and can be reached and secured like any other VCN resource. The flannel overlay model gives pods addresses on an overlay network instead. VCN-native is the modern default for most production clusters because it simplifies network policy and integration. Services are exposed through OCI load balancers, and ingress controllers route external traffic to services inside the cluster. The full treatment is in OKE networking explained, with ingress and load balancing detailed in ingress and load balancing on OKE.
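Because VCN-native pods take their addresses from a VCN subnet, subnet size caps how many pods the cluster can run. The sketch below estimates that cap. The function names are hypothetical, the default of three reserved addresses per subnet is an assumption to check against the OCI networking documentation, and the pods-per-node figure in the example is made up; real limits also depend on the node shape.

```python
import ipaddress


def usable_pod_ips(subnet_cidr, reserved=3):
    """Addresses left for pods after reserved addresses (assumed: 3 per subnet)."""
    network = ipaddress.ip_network(subnet_cidr)
    return max(network.num_addresses - reserved, 0)


def max_nodes(subnet_cidr, pods_per_node, reserved=3):
    """How many nodes the pod subnet can serve if every node runs pods_per_node pods."""
    return usable_pod_ips(subnet_cidr, reserved) // pods_per_node


# Hypothetical pod subnet and pod density, for illustration only.
print(usable_pod_ips("10.0.32.0/19"))
print(max_nodes("10.0.32.0/19", pods_per_node=31))
```

Running this kind of arithmetic before creating the VCN is cheaper than resizing a pod subnet after the cluster has grown into it.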
OKE scales on two axes, and a healthy cluster uses both. The Horizontal Pod Autoscaler adds and removes pod replicas based on load, so your application scales to demand. The Cluster Autoscaler adds and removes nodes from managed node pools so there is somewhere for those pods to run, scaling the underlying capacity to match. With virtual nodes, the node axis disappears because capacity is provided per pod. Getting both autoscalers configured and tuned together is what lets a cluster handle a traffic spike without either falling over or sitting permanently overprovisioned, and it is covered in autoscaling OKE workloads.
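The Horizontal Pod Autoscaler's core calculation is documented upstream as desired = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1. A minimal sketch of that rule follows; the function name, the replica bounds and the sample numbers are illustrative assumptions, and the real controller also handles missing metrics, unready pods and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=90, target_metric=60))  # scale out
print(desired_replicas(4, current_metric=62, target_metric=60))  # within tolerance
print(desired_replicas(4, current_metric=15, target_metric=60))  # scale in
```

If the scaled-out replicas do not fit on existing nodes, they sit pending until the Cluster Autoscaler adds capacity, which is why the two need tuning together.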
Containers are ephemeral, but plenty of workloads need persistent state. OKE provides persistent storage through the OCI block volume and file storage services, exposed to Kubernetes through the standard container storage interface. Block volumes give individual pods fast persistent disks, while file storage gives shared file systems that many pods can mount at once. Running databases and other stateful workloads on Kubernetes adds real complexity around data durability and failover, covered in persistent storage on OKE and OKE for stateful workloads.
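The block versus file distinction shows up in a PersistentVolumeClaim mainly as the access mode and the storage class. The sketch below builds a claim as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts. The helper name is hypothetical, and the storage class name `oci-bv` is an assumption; list the classes in your own cluster with `kubectl get storageclass` before relying on it.

```python
import json


def pvc_manifest(name, size_gi, storage_class, shared=False):
    """Build a PersistentVolumeClaim; shared=True requests a many-pod mount."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Block volumes attach to one node; shared file systems allow many.
            "accessModes": ["ReadWriteMany" if shared else "ReadWriteOnce"],
            "storageClassName": storage_class,  # assumed name, verify in-cluster
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


print(json.dumps(pvc_manifest("db-data", 50, "oci-bv"), indent=2))
```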
Securing an OKE cluster spans several layers. OCI IAM governs who can manage the cluster. Kubernetes role-based access control governs what authenticated users and workloads can do inside it. Network policy governs which pods can talk to which. Image scanning in the container registry catches vulnerable images before they run. Workload identity lets pods assume OCI permissions without long-lived credentials. Finally, the worker nodes themselves need hardening and patching. Each layer matters, and a gap in any one of them undermines the rest. The full model is in OKE security best practices.
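For the network policy layer, a common starting point is a default-deny ingress policy per namespace, with explicit allow rules added on top. The sketch below builds one using the standard `networking.k8s.io/v1` NetworkPolicy schema. The helper name and the `payments` namespace are hypothetical, and the policy only takes effect if the cluster's network plugin enforces NetworkPolicy.

```python
import json


def default_deny_ingress(namespace):
    """NetworkPolicy that selects every pod in the namespace and allows no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},           # empty selector matches all pods
            "policyTypes": ["Ingress"],  # no ingress rules listed, so none allowed
        },
    }


print(json.dumps(default_deny_ingress("payments"), indent=2))
```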
A cluster is only useful if you can ship to it safely and repeatably. Two complementary patterns dominate. CI/CD pipelines build images, run tests and deploy to the cluster, covered in CI/CD pipelines for OKE. GitOps inverts the model, treating a Git repository as the source of truth for cluster state and using a controller in the cluster to reconcile reality against the repository, covered in GitOps on OKE. Most mature estates use a CI pipeline to build and a GitOps controller to deploy, which gives you a clean audit trail of what is running and why.
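The reconcile step at the heart of GitOps can be sketched as a diff between desired state (from Git) and actual state (from the cluster). This is a simplified illustration with hypothetical names and data, not the algorithm of any particular GitOps controller.

```python
def plan_reconcile(desired, actual):
    """Work out what to create, update and delete so actual matches desired."""
    create = sorted(desired.keys() - actual.keys())
    delete = sorted(actual.keys() - desired.keys())
    update = sorted(
        name for name in desired.keys() & actual.keys()
        if desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}


# Hypothetical state: Git declares web and api; the cluster runs web and legacy.
desired = {
    "web": {"image": "web:1.4", "replicas": 3},
    "api": {"image": "api:2.0", "replicas": 2},
}
actual = {
    "web": {"image": "web:1.3", "replicas": 3},
    "legacy": {"image": "legacy:0.9", "replicas": 1},
}
print(plan_reconcile(desired, actual))
```

Because the plan is derived entirely from the repository, every change in the cluster traces back to a commit, which is where the audit trail comes from.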
You cannot run what you cannot see. OKE integrates with OCI monitoring and logging, and most teams add the standard Kubernetes observability stack on top for metrics, logs and traces. The conditions worth alarming on include node and pod health, pending pods that cannot schedule, persistent volume pressure and control plane errors. Monitoring approaches are covered in monitoring OKE with OCI tools, and when things go wrong the diagnostic path is in troubleshooting OKE clusters.
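One of the conditions above, pods stuck pending, can be expressed as a simple rule: a pod in the Pending phase for longer than some threshold. The sketch below applies that rule to a list of pod records. The record shape, the five-minute threshold and the sample pods are assumptions for illustration; in practice the data would come from the Kubernetes API or your metrics stack.

```python
from datetime import datetime, timedelta, timezone


def stuck_pending(pods, now, threshold=timedelta(minutes=5)):
    """Names of pods that have been Pending for longer than the threshold."""
    return [
        pod["name"]
        for pod in pods
        if pod["phase"] == "Pending" and now - pod["created"] > threshold
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
pods = [  # hypothetical sample data
    {"name": "web-1", "phase": "Running", "created": now - timedelta(hours=2)},
    {"name": "web-2", "phase": "Pending", "created": now - timedelta(minutes=12)},
    {"name": "web-3", "phase": "Pending", "created": now - timedelta(minutes=1)},
]
print(stuck_pending(pods, now))
```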
OKE itself charges nothing for the basic control plane and a small fee for the enhanced control plane with its SLA. Your cost is overwhelmingly the worker capacity: the compute behind managed node pools or the pod resources behind virtual nodes, plus load balancers, storage and egress. The biggest cost mistakes are overprovisioned node pools that run half empty and autoscalers that never scale back down. Matching capacity to demand with the autoscalers, choosing the right node shapes, and using virtual nodes for bursty workloads are the main levers, covered in OKE cost optimization.
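To make the "half empty node pool" point concrete, the sketch below estimates a pool's monthly compute cost and the share of it paying for idle capacity. Every number is a placeholder: the hourly rate is hypothetical and not an OCI price, the function names are invented for this example, and 730 is just an average number of hours in a month.

```python
def monthly_pool_cost(nodes, ocpus_per_node, ocpu_hour_rate, hours=730):
    """Compute cost of a managed node pool that runs all month."""
    return nodes * ocpus_per_node * ocpu_hour_rate * hours


def idle_cost(cost, avg_utilization):
    """Portion of the cost paying for capacity that sits unused."""
    return cost * (1 - avg_utilization)


# Hypothetical pool: 6 nodes of 4 OCPUs at a made-up rate, 50% utilized.
cost = monthly_pool_cost(nodes=6, ocpus_per_node=4, ocpu_hour_rate=0.05)
print(round(cost, 2), round(idle_cost(cost, avg_utilization=0.5), 2))
```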
Kubernetes releases regularly and OKE supports a window of versions. Staying current is a security and support requirement, not an optional extra. OKE makes the control plane upgrade a managed operation, but you remain responsible for upgrading worker nodes and for validating that your workloads tolerate the new version. A disciplined upgrade strategy tests in a non-production cluster first, upgrades the control plane, then rolls the node pools, covered in OKE upgrade strategy.
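Because the control plane is upgraded before the node pools, nodes trail it by some number of minor versions for a while. The sketch below flags nodes that are ahead of the control plane or too far behind it. The function names and node pool names are hypothetical, and the `max_skew` default of 2 is an assumption; check the Kubernetes version skew policy and the OKE documentation for the value that applies to your versions.

```python
def minor(version):
    """Extract the minor version number, e.g. "v1.29.1" -> 29."""
    return int(version.lstrip("v").split(".")[1])


def skew_report(control_plane, node_versions, max_skew=2):
    """Classify nodes whose minor version differs from the control plane."""
    cp_minor = minor(control_plane)
    report = {}
    for name, version in node_versions.items():
        skew = cp_minor - minor(version)
        if skew < 0:
            report[name] = "ahead of control plane"
        elif skew > max_skew:  # max_skew is an assumed limit, verify it
            report[name] = "outside supported skew"
        elif skew > 0:
            report[name] = "behind, upgrade when ready"
    return report


print(skew_report("v1.29.1", {
    "pool-a": "v1.29.1",
    "pool-b": "v1.28.2",
    "pool-c": "v1.26.7",
}))
```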
Teams sometimes ask whether to run OKE or stand up their own Kubernetes on OCI compute. The honest answer for almost everyone is OKE, because operating a production-grade Kubernetes control plane is a significant ongoing burden that delivers no differentiation. Self-managed Kubernetes makes sense only in narrow cases where you need control that a managed service does not expose. The full comparison is in OKE vs self-managed Kubernetes.
OKE rewards good design and punishes drift, like any Kubernetes platform. The integration with OCI removes a lot of the operational burden, but the decisions about cluster type, worker model, networking, security and delivery are still yours, and they are the decisions that determine whether the platform is calm or chaotic. Work through the linked articles in this series for the depth on each, and if you are migrating existing containers onto OKE, start with migrating containers to OKE.
The OKE solution practice designs, builds and runs OKE estates to the operating model above, whether you want a one-time build on a project fee or ongoing managed operations on a monthly retainer.