
Availability Domains and Resilience on OCI

Published Dec 22, 2025 · Updated May 26, 2026 · 9 min read · OCI Specialists · Independent OCI services

An availability domain is an isolated data centre within a region. Designing across them keeps an application running when an entire data centre goes dark.

The level that survives a data centre failure

An availability domain on OCI is an isolated data centre within a region, with its own power, cooling, and network, engineered to fail independently of the others. Designing across availability domains is how you keep an application running when an entire data centre goes dark. It sits between fault domains, which protect against hardware events inside one data centre, and cross region recovery, which protects against the loss of a whole region.

This article explains what availability domains are, how they differ from fault domains and regions, and how to architect for resilience across them where your region supports it.

How many availability domains you get

Not every region is the same. Some OCI regions have three availability domains, while others have a single availability domain. This matters enormously for your design, because a resilience pattern that relies on spreading across availability domains is simply not available in a single availability domain region. In those regions, you lean harder on fault domains for in region resilience and on a second region for protection against a data centre or regional loss. Always confirm how many availability domains your chosen region has before you design, because it changes the whole approach.

Region type | In region resilience | DR approach
Multiple availability domains | Spread across ADs and fault domains | Add a second region for regional loss
Single availability domain | Spread across fault domains only | A second region matters more
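
The comparison above can be expressed as a small lookup. This is a minimal sketch: the names `ResiliencePlan` and `plan_for_region` are hypothetical, and the availability domain count is assumed to have been confirmed separately for your region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResiliencePlan:
    in_region: str
    dr_approach: str


def plan_for_region(ad_count: int) -> ResiliencePlan:
    """Map a region's availability domain count to the options compared above."""
    if ad_count < 1:
        raise ValueError("a region has at least one availability domain")
    if ad_count == 1:
        # Single availability domain region: no AD spread is possible.
        return ResiliencePlan(
            in_region="Spread across fault domains only",
            dr_approach="A second region matters more",
        )
    # Multiple availability domains: use both ADs and fault domains.
    return ResiliencePlan(
        in_region="Spread across ADs and fault domains",
        dr_approach="Add a second region for regional loss",
    )


print(plan_for_region(1))
print(plan_for_region(3))
```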

Availability domains compared with fault domains

It is easy to confuse the two. A fault domain is a grouping of hardware inside a single availability domain, protecting against a rack failure or a maintenance event. An availability domain is a whole isolated data centre, protecting against the loss of that data centre. Fault domain spread is free and should always be used. Availability domain spread protects against a much larger failure but introduces considerations around cross data centre traffic and storage. The two are complementary, and a strong design in a multi availability domain region uses both. The fault domain side is covered in designing fault domains.

Designing across availability domains

To survive the loss of an availability domain, run your service in more than one and put a load balancer in front so traffic flows to whichever availability domains are healthy. Stateless application tiers handle this cleanly because any instance can serve any request. Stateful tiers need more thought, because data has to be available in more than one availability domain, which is where database clustering and replication come in.
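
For a stateless tier, the spread can be sketched as a round robin that fills availability domains first and fault domains second. This is an illustration only: `placement_plan` is a hypothetical helper, and the availability domain names are placeholders, since real names carry a tenancy-specific prefix. The three fault domain names follow OCI's standard naming.

```python
from itertools import cycle, islice

# Each availability domain contains three fault domains.
FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")


def placement_plan(instance_count, availability_domains):
    """Assign instances to (AD, fault domain) slots, rotating ADs first."""
    if not availability_domains:
        raise ValueError("at least one availability domain is required")
    if instance_count < 0:
        raise ValueError("instance_count must not be negative")
    # Ordering slots by fault domain, then AD, means consecutive
    # instances land in different availability domains.
    slots = [(ad, fd) for fd in FAULT_DOMAINS for ad in availability_domains]
    return list(islice(cycle(slots), instance_count))


# Placeholder AD names; four instances cover all three ADs.
for ad, fd in placement_plan(4, ["AD-1", "AD-2", "AD-3"]):
    print(ad, fd)
```

In a single availability domain region the same function degrades to fault domain spread only, which matches the fallback described earlier.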

Be aware that two things change once a service spans availability domains: traffic now crosses between data centres, and storage that is local to one availability domain cannot simply be attached from another. Design the data path deliberately so a service spread across availability domains does not develop a hidden dependency on one of them, which would defeat the purpose.

Storage and data across availability domains

Block volumes are local to an availability domain, so an instance in one availability domain cannot simply attach a volume from another. For data to survive an availability domain loss it must be replicated or held in a service that spans availability domains. Object storage is regional and so naturally survives the loss of a single availability domain within the region. Databases use clustering or Data Guard to keep a copy current in another availability domain. Map each piece of state to how it survives an availability domain failure, the same discipline applied at regional scale in cross region DR.
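
One way to apply this mapping is to keep an inventory of state and flag anything that lives in only one availability domain. The sketch below is illustrative: the `Scope` categories, the `StateItem` type, and the inventory entries are all hypothetical examples, not a description of any real estate.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    AD_LOCAL = "local to one availability domain"
    REPLICATED = "replicated to another availability domain"
    REGIONAL = "held in a regional service"


@dataclass(frozen=True)
class StateItem:
    name: str
    scope: Scope


def at_risk_on_ad_loss(items):
    """Return the names of state items that would not survive losing one AD."""
    return [item.name for item in items if item.scope is Scope.AD_LOCAL]


# Hypothetical inventory for illustration only.
inventory = [
    StateItem("app-logs block volume", Scope.AD_LOCAL),
    StateItem("orders database with Data Guard standby", Scope.REPLICATED),
    StateItem("uploads bucket in object storage", Scope.REGIONAL),
]

print(at_risk_on_ad_loss(inventory))
```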

Where availability domains fit the resilience ladder

Resilience is best thought of as a ladder you climb in order of cost and severity. Fault domain spread is free and protects against hardware and maintenance. Availability domain spread, where the region supports it, protects against a data centre failure. Cross region replication protects against a regional outage. Each rung addresses a rarer and more severe failure, and you climb only as high as the workload justifies. A tier one platform may use all three, while an internal tool may be content with fault domains. Match the rung to the recovery objectives set out in RTO and RPO planning.

Putting it together

Start by confirming how many availability domains your region offers, because that decides what is possible. Spread stateless tiers across availability domains behind a load balancer, design stateful tiers with clustering or replication so data survives an availability domain loss, and keep fault domain distribution underneath it all. Then decide which workloads also need a second region. The complete design and its trade offs are in the disaster recovery pillar. When we build resilience as part of a managed service, we map every tier to the failure it must survive so the architecture matches the promise made to the business.

Latency and cost across availability domains

Spreading a service across availability domains buys resilience but introduces traffic between data centres, which behaves differently from traffic inside a single availability domain. There is a small latency cost to crossing availability domains, and depending on the data path there can be a cost to the traffic itself. For most applications these are negligible against the resilience gained, but a chatty service that makes many small calls between tiers in different availability domains can feel the latency. Design the data path so the heavy, frequent traffic stays within an availability domain where possible, and let the cross availability domain flows be the ones that genuinely need to span data centres.
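
The effect on a chatty service is simple arithmetic: each sequential call that crosses availability domains adds one round trip. In the sketch below the 0.5 ms round trip is an assumed figure chosen for illustration, not a measured or published OCI number, and `added_latency_ms` is a hypothetical helper. Measure the real value in your own region.

```python
def added_latency_ms(cross_ad_calls: int, cross_ad_rtt_ms: float) -> float:
    """Extra latency per request when sequential calls cross availability domains."""
    if cross_ad_calls < 0 or cross_ad_rtt_ms < 0:
        raise ValueError("inputs must not be negative")
    return cross_ad_calls * cross_ad_rtt_ms


ASSUMED_RTT_MS = 0.5  # assumption for illustration, not an OCI figure

# A request that makes 2 cross-AD calls barely notices the hop.
print(added_latency_ms(2, ASSUMED_RTT_MS))

# A chatty request that makes 200 sequential cross-AD calls does.
print(added_latency_ms(200, ASSUMED_RTT_MS))
```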

Plan capacity with the failure in mind. If you spread across two availability domains and one is lost, the survivor must carry the full load. A design that splits load evenly but leaves no headroom will be overwhelmed at the very moment it is meant to save you. Size each availability domain to absorb the traffic of a failed peer.
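
The headroom rule can be worked through with a short calculation. This is a sketch under the assumption that load is spread evenly and that surviving availability domains share the failed peer's traffic equally. The function names and the 900 requests per second figure are hypothetical.

```python
def required_capacity_per_ad(peak_load: float, ad_count: int,
                             tolerated_failures: int = 1) -> float:
    """Capacity each AD needs so the survivors can carry the full peak load."""
    survivors = ad_count - tolerated_failures
    if survivors < 1:
        raise ValueError("at least one availability domain must survive")
    return peak_load / survivors


def max_safe_utilisation(ad_count: int, tolerated_failures: int = 1) -> float:
    """Highest normal-operation utilisation that still leaves failover headroom."""
    survivors = ad_count - tolerated_failures
    if survivors < 1:
        raise ValueError("at least one availability domain must survive")
    return survivors / ad_count


# Hypothetical peak of 900 requests per second.
print(required_capacity_per_ad(900, ad_count=2))  # each AD sized for the full peak
print(required_capacity_per_ad(900, ad_count=3))  # each AD sized for half the peak
print(max_safe_utilisation(2), max_safe_utilisation(3))
```

With two availability domains each side must be able to carry the entire peak, so neither should run above half its capacity in normal operation. With three, the ceiling rises to two thirds.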

Testing an availability domain failure

A design that has never been exercised is a hypothesis. Test what happens when an availability domain is removed by deliberately draining it and confirming the application keeps serving from the others within the recovery time you promised. These tests surface hidden dependencies that no diagram would reveal, such as a service that turns out to rely on something local to one availability domain. Run them on a schedule and treat any failure to meet the objective as a defect to fix, the same discipline laid out in DR testing.
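
Scoring a drill against the recovery time objective can be sketched as follows. The code assumes you already collect health probe results during the drain as (seconds since drain started, success) pairs. The function names and the rule of three consecutive successes are illustrative choices, not a standard.

```python
def recovery_seconds(probes, stable_successes=3):
    """Seconds from drain start until the service was stably healthy.

    probes: time-ordered list of (seconds_since_drain, ok) tuples.
    Returns None if the service never reached `stable_successes`
    consecutive successful probes.
    """
    run_start = None
    run_length = 0
    for elapsed, ok in probes:
        if ok:
            if run_length == 0:
                run_start = elapsed  # first success of the current run
            run_length += 1
            if run_length >= stable_successes:
                return run_start
        else:
            run_length = 0  # a failure resets the run
    return None


def drill_passed(probes, rto_seconds, stable_successes=3):
    """True if the service recovered stably within the recovery time objective."""
    recovered = recovery_seconds(probes, stable_successes)
    return recovered is not None and recovered <= rto_seconds


# Hypothetical probe log from a drain drill.
log = [(0, False), (5, True), (10, False), (15, True), (20, True), (25, True)]
print(recovery_seconds(log), drill_passed(log, rto_seconds=30))
```

Requiring a run of successes rather than a single one avoids counting a brief flap as recovery. A drill that returns `None` or exceeds the objective is the defect the paragraph above asks you to fix.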

Combining availability domains with cross region DR

Availability domain spread and cross region recovery solve different problems and are often used together. Availability domains keep you running through a data centre failure inside one region with low latency and no failover drama. A second region protects you when the whole region is lost, at the cost of asynchronous replication and a deliberate failover. A tier one platform typically uses availability domain spread for everyday resilience and a second region for the rare regional event, as developed in cross region DR. The two are complementary layers, not alternatives.

Putting the ladder together

The complete resilience picture stacks fault domains, availability domains, and regions, each addressing a rarer and more severe failure. Spread across fault domains for free everywhere, spread across availability domains where the region supports it and the workload warrants the data design, and add a second region for the workloads that cannot tolerate a regional loss. Match each layer to the objectives in RTO and RPO planning and the full design in the disaster recovery pillar. When we build resilience as part of a managed service, we map every tier to the failure it must survive. To review your architecture, book an OCI assessment.
