Kubernetes was designed first for stateless workloads, the kind you can kill and reschedule anywhere without consequence. Stateful workloads break that assumption. A database, a message broker, or a cache holds data that must survive a pod restart and stay attached to the right replica. Teams often ask whether such workloads belong on OCI Kubernetes Engine (OKE) at all. The honest answer is that they can run very well on OKE, but only if you understand how state is handled and where the sharp edges are. This article walks through the model, the trade-offs, and a way to decide.
It is part of our OKE and containers series and builds on persistent storage on OKE and OKE cluster architecture.
A stateful workload is one whose identity and data matter across restarts. A web server is stateless because any instance can serve any request, so the scheduler is free to move it anywhere. A database replica is stateful because it owns a specific slice of data, expects a stable network name, and must reattach to the same storage volume when it restarts. The whole difficulty of running these workloads on Kubernetes comes from preserving that identity and that storage binding while the platform underneath is constantly free to reschedule pods.
Kubernetes provides two building blocks for stateful workloads. The StatefulSet gives each pod a stable, predictable name and a stable storage claim, so pod zero always comes back as pod zero with its own volume. The PersistentVolume and PersistentVolumeClaim pair connects a pod to durable block storage that outlives the pod itself. On OKE that storage is backed by OCI Block Volume through the Container Storage Interface (CSI) driver, which provisions and attaches volumes automatically as claims are created. Together these mean a database pod can be rescheduled to a new node and still find its data exactly where it left it.
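To make the two building blocks concrete, here is a minimal sketch in Python that assembles a StatefulSet manifest with a volume claim template and lists the pod and claim names the controller would derive from it. The workload name `demo-db`, the image, the mount path, and the storage class name `oci-bv` are illustrative assumptions, not values from this article; check the storage classes available in your own cluster before using any of them. The sketch also assumes a headless Service with the same name already exists.

```python
import json


def build_statefulset(name, replicas, image, storage_gib, storage_class="oci-bv"):
    """Return a StatefulSet manifest as a plain dict.

    storage_class is an assumed name; substitute whatever your cluster exposes.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives each pod a stable DNS name (assumed to exist).
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/data"}
                        ],
                    }]
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": f"{storage_gib}Gi"}},
                },
            }],
        },
    }


def stable_names(manifest):
    """List the (pod name, claim name) pairs derived from the manifest."""
    name = manifest["metadata"]["name"]
    claim = manifest["spec"]["volumeClaimTemplates"][0]["metadata"]["name"]
    replicas = manifest["spec"]["replicas"]
    # Pods are named <statefulset>-<ordinal>; claims are <template>-<pod name>.
    return [(f"{name}-{i}", f"{claim}-{name}-{i}") for i in range(replicas)]


if __name__ == "__main__":
    sts = build_statefulset("demo-db", replicas=3,
                            image="example.com/demo-db:1.0", storage_gib=50)
    print(json.dumps(sts, indent=2))  # JSON manifests are accepted by kubectl apply -f
    for pod, pvc in stable_names(sts):
        print(pod, "->", pvc)
```

The pairing printed at the end is the point: each ordinal keeps the same claim across restarts, which is what lets pod zero come back with its own volume.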
Several OCI details shape how stateful workloads behave on OKE. Block Volumes are tied to an availability domain, so a pod that needs its volume must be scheduled in the same domain the volume lives in, which constrains failover across domains. The Block Volume performance tiers let you match volume throughput to a demanding database rather than paying for capacity you will not use. And node pool design matters: draining a node during an upgrade forces stateful pods to detach and reattach their volumes, so you want that to happen in a controlled, one-at-a-time fashion rather than all at once.
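The availability domain constraint can be illustrated with a small model. The sketch below is not how the scheduler is implemented; it only shows the effect, filtering a list of nodes down to those in the same domain as a volume. The node names and the domain values `AD-1` and `AD-2` are hypothetical, and the sketch assumes nodes carry their domain in the standard `topology.kubernetes.io/zone` label.

```python
# Standard Kubernetes topology label; assumed here to hold the availability domain.
ZONE_LABEL = "topology.kubernetes.io/zone"


def eligible_nodes(nodes, volume_ad):
    """Return the names of nodes that sit in the same domain as the volume."""
    return [
        node["name"]
        for node in nodes
        if node["labels"].get(ZONE_LABEL) == volume_ad
    ]


if __name__ == "__main__":
    # Hypothetical three-node pool spread across two availability domains.
    nodes = [
        {"name": "node-a", "labels": {ZONE_LABEL: "AD-1"}},
        {"name": "node-b", "labels": {ZONE_LABEL: "AD-2"}},
        {"name": "node-c", "labels": {ZONE_LABEL: "AD-1"}},
    ]
    # A volume created in AD-1 can only follow its pod to node-a or node-c.
    print(eligible_nodes(nodes, "AD-1"))
    # If no node is left in the volume's domain, the pod has nowhere to go.
    print(eligible_nodes(nodes, "AD-3"))
```

An empty result is the case to plan for: if every node in the volume's domain is drained or lost, the pod stays pending until capacity returns in that domain.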
| Concern | Stateless workload | Stateful workload on OKE |
|---|---|---|
| Pod identity | Interchangeable | Stable name via StatefulSet |
| Storage | None or ephemeral | Block Volume via PersistentVolumeClaim |
| Rescheduling | Move anywhere freely | Constrained to the volume availability domain |
| Upgrades | Roll freely | Drain carefully, one replica at a time |
| Backup | Rebuild from image | Volume snapshots plus application backups |
OKE suits stateful workloads when the application was built for Kubernetes operation, for example a modern distributed database or message broker that ships an operator to handle clustering, failover, and backups. In that case the operator does the hard work and OKE simply provides scheduling and storage. OKE is also a good fit when you already run the rest of an application on the cluster and want the data tier to share the same network, identity, and observability rather than living in a separate silo.
For a traditional Oracle Database, OKE is usually the wrong place. A managed service such as Autonomous Database or a database system on OCI handles patching, backups, and high availability for you, and trying to reproduce that inside Kubernetes adds risk without adding value. The general rule is that if OCI offers a managed service that matches your workload, use it for the data tier and keep OKE for the application tier. Reserve self-managed state on OKE for workloads that genuinely benefit from running beside the rest of the cluster.
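The rule of thumb from the last two paragraphs can be written down as a small decision helper. This is only a sketch of the reasoning, with hypothetical field names; the final fallback for workloads that match neither case is an assumption added here, since the article does not prescribe one.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    managed_service_matches: bool   # OCI offers a managed service that fits
    ships_operator: bool            # built for Kubernetes, operator handles failover
    rest_of_app_on_cluster: bool    # application tier already runs on OKE


def recommend(workload):
    """Apply the article's rule: managed service first, then OKE where it fits."""
    if workload.managed_service_matches:
        return "managed OCI service"
    if workload.ships_operator or workload.rest_of_app_on_cluster:
        return "OKE"
    # Assumption: anything else needs a closer look rather than a default.
    return "review case by case"


if __name__ == "__main__":
    candidates = [
        Workload("oracle-database", True, False, True),
        Workload("distributed-db-with-operator", False, True, True),
        Workload("legacy-file-store", False, False, False),
    ]
    for w in candidates:
        print(f"{w.name}: {recommend(w)}")
```

Checking the managed service question first mirrors the ordering in the text: a matching managed service wins even when the rest of the application already lives on the cluster.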
Stateful workloads on OKE introduce failure modes that stateless ones do not. A node failure leaves a volume needing to detach and reattach elsewhere, which takes time and can stall a pod if the new node is in the wrong availability domain. A storage class misconfiguration can silently bind pods to volumes that do not have the throughput the database needs. And an unplanned mass drain during an upgrade can take several replicas down at once. Each of these is manageable, but only if you have designed for it rather than discovering it during an incident, which is why troubleshooting OKE clusters is worth reading alongside this.
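One common guard against the mass drain scenario is a PodDisruptionBudget, which limits how many matching pods a voluntary eviction such as a node drain may take down at once. The sketch below builds such a manifest in Python and includes a deliberately simplified model of the eviction check; the `app` label and the `-pdb` name suffix are illustrative assumptions, and the real controller considers more than this arithmetic.

```python
import json


def build_pdb(app, max_unavailable=1):
    """Return a PodDisruptionBudget manifest selecting pods labelled app=<app>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},  # hypothetical naming convention
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": {"app": app}},
        },
    }


def eviction_allowed(desired, healthy, max_unavailable=1):
    """Simplified model: would evicting one more pod exceed the budget?"""
    unavailable_after = desired - (healthy - 1)
    return unavailable_after <= max_unavailable


if __name__ == "__main__":
    print(json.dumps(build_pdb("demo-db"), indent=2))
    # Three replicas, all healthy: the first eviction fits the budget.
    print(eviction_allowed(desired=3, healthy=3))
    # One replica is already down: a second eviction is refused.
    print(eviction_allowed(desired=3, healthy=2))
```

A budget like this covers voluntary disruptions only; it does not help with a node that fails outright, so the availability domain and reattachment concerns above still need their own design.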
OKE can run stateful workloads well, but it asks more of you than stateless ones do. StatefulSets and Block Volume-backed claims preserve identity and data, while careful node pool, availability domain, and upgrade design keep the workload healthy. For traditional databases a managed OCI service is usually the safer path, with OKE reserved for Kubernetes-native data systems and the application tier. Continue with persistent storage on OKE, OKE cluster architecture, and troubleshooting OKE clusters. The OKE solution practice designs stateful platforms on OKE on a fixed project fee.