Uptime is the number every business says it wants high, yet few can say what level their systems actually need or what it would take to reach it. The temptation is to ask for the most nines possible, but each additional nine costs more and earns less, and the right target is the one the business genuinely needs rather than the maximum imaginable. This article sets out what uptime is realistically achievable on Oracle Cloud Infrastructure by class of workload, and how to read those benchmarks for your own systems.
It is part of our OCI case studies and benchmarks cluster, and it puts the resilience results from our bank uptime case and healthcare DR case into a wider benchmark frame. The figures are ranges achievable with sound design, not guarantees.
What the nines actually mean
Availability is usually expressed in nines, and the jump between levels is steeper than it looks. Three nines, 99.9 percent, allows about 8.8 hours of downtime a year. Four nines, 99.99 percent, allows about 53 minutes. The bank in our case study reached 99.98 percent, sitting between the two at roughly an hour and three quarters a year, which for its business was the right balance of resilience against cost. Each additional nine cuts the permitted downtime by a factor of ten and raises the engineering required to achieve it.
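The arithmetic behind those figures is simple enough to check for any target. The sketch below assumes a 365-day year and treats the availability percentage as the fraction of the year the system is up; the function name is illustrative rather than taken from any tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the downtime a given availability target permits in one year."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    # Print the yearly downtime budget for a few common targets.
    for target in (99.9, 99.95, 99.98, 99.99):
        hours = downtime_hours_per_year(target)
        print(f"{target}% allows {hours:.2f} hours ({hours * 60:.0f} minutes) a year")
```

Running it for the targets discussed here gives about 8.76 hours for three nines, about 1.75 hours for 99.98 percent, and a little under 53 minutes for four nines.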
The crucial point is that more nines are not automatically better, because the cost of each additional nine rises sharply while the marginal benefit falls. A system whose users would not notice a few hours of downtime a year does not need four nines, and paying for them wastes money that could serve the business elsewhere. The right target follows from what the business actually loses when the system is down, not from a wish for perfection.
Uptime ranges by workload class
Different classes of workload sit naturally at different points on the availability scale, both because their business importance differs and because the design effort needed to reach each level rises as the target climbs. Seeing the classes side by side helps locate where a given system belongs and what design it implies.
| Workload class | Typical target | What it requires |
|---|---|---|
| Internal, low impact | 99.9% (three nines) | Sound single region design |
| Business important | 99.95% to 99.99% | Redundancy across domains, tested failover |
| Business critical | 99.99%+ with DR | Cross region DR, rehearsed recovery |
| Life or safety critical | Highest, with DR | Full resilience, continuous rehearsal |
Why uptime is designed, not bought
The most important lesson in any uptime benchmark is that high availability is the product of design and operation, not something a cloud provider hands you. OCI provides the building blocks (fault domains, availability domains, regions, and redundant services), but assembling them into an estate that reaches a given target is engineering work. An estate placed carelessly on the same infrastructure can fail far more often than one designed for resilience.
The bank that reached 99.98 percent did so by spreading its tiers across fault and availability domains, replicating its database to a standby, routing only to healthy nodes, and rehearsing failover on a schedule. None of that came automatically; all of it was designed and then maintained. The headline number was the result of choices, which is why two estates on the same platform can deliver very different availability.
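A rough way to see why design matters so much is to combine component availabilities. The sketch below uses standard reliability arithmetic: tiers that all have to work multiply together, while redundant copies of a tier fail only if every copy fails. It assumes failures are independent, which real estates only approximate, and the 99.9 percent per-component figure is purely illustrative. It is not an OCI service level and not the bank's actual numbers.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Availability of tiers that must all be up (fractions, e.g. 0.999)."""
    return prod(availabilities)


def redundant(availability: float, copies: int) -> float:
    """Availability of a tier with independent copies, any one of which suffices."""
    return 1 - (1 - availability) ** copies


# Illustrative assumption: every component is 99.9% available on its own.
component = 0.999

# Three tiers (for example web, application, database), one instance each.
single_instance_estate = serial(component, component, component)

# The same three tiers, each spread across two independent domains.
spread_estate = serial(*(redundant(component, 2) for _ in range(3)))

if __name__ == "__main__":
    print(f"single instance per tier: {single_instance_estate:.6f}")
    print(f"two copies per tier:      {spread_estate:.6f}")
```

Under these assumptions the single-instance chain comes out at about 99.7 percent, worse than any one of its components, while the doubled estate comes out above 99.999 percent. Correlated failures and failover time pull real results below that theoretical figure, which is one reason rehearsal matters.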
The role of tested recovery
For the higher availability classes, the single practice that most distinguishes estates that meet their targets from those that miss them is tested recovery. Failover that has never been exercised tends to fail when it is finally needed, as the healthcare provider in our cross region DR case learned. Regular rehearsal turns a theoretical recovery capability into a proven one, and it is the difference between a target on paper and a target in reality.
This is why uptime and disaster recovery are inseparable above a certain level: the availability target implies a recovery capability, and the recovery capability is only real if it is tested. Designing for that, and sustaining it, is the work of our disaster recovery and HA solution and our managed services.
Setting the right target
The practical way to use these benchmarks is to start from the business cost of downtime rather than from a desired number of nines. Work out what an hour, or a day, of downtime for each system actually costs the organisation, and let that determine the availability target and therefore the design and the spend. A system whose downtime is expensive justifies the engineering for four nines and cross region recovery; one whose downtime is merely inconvenient does not.
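That comparison can be sketched as a small calculation: estimate the yearly cost of downtime at the current and proposed targets, then ask whether the saving exceeds the extra spend. Every figure and name below is hypothetical, and the model is deliberately simple: it treats the availability target as the downtime actually incurred and prices every hour of downtime equally.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_cost(availability_percent: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given availability level."""
    downtime_hours = (1 - availability_percent / 100) * HOURS_PER_YEAR
    return downtime_hours * cost_per_hour


def worth_upgrading(current_percent: float, proposed_percent: float,
                    cost_per_hour: float, extra_annual_spend: float) -> bool:
    """True if the downtime cost avoided exceeds the extra yearly spend."""
    saving = (annual_downtime_cost(current_percent, cost_per_hour)
              - annual_downtime_cost(proposed_percent, cost_per_hour))
    return saving > extra_annual_spend


if __name__ == "__main__":
    # Hypothetical figures in arbitrary currency units.
    extra_spend = 200_000  # assumed yearly cost of moving from 99.9% to 99.99%
    for label, hourly_cost in (("critical system", 50_000), ("internal tool", 500)):
        decision = worth_upgrading(99.9, 99.99, hourly_cost, extra_spend)
        print(f"{label}: upgrade justified = {decision}")
```

With these assumed inputs the system that loses 50,000 an hour justifies the move to four nines, while the one that loses 500 an hour does not, which is the pattern the paragraph above describes.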
This grounded approach avoids both under-engineering critical systems and overspending on systems that do not need it. The cost dimension connects to our cost benchmarks, because availability and cost are traded against each other, and the whole picture comes together in the case studies pillar: the right target, like the right architecture, follows from the business rather than from a wish for the maximum.