OCI Service Continuity Planning

Ask a business whether its important systems should survive a failure and the answer is always yes. Ask how long they can be down, how much data they can afford to lose, and what they are willing to pay to guarantee it, and the conversation gets much harder. Service continuity planning is the discipline of having that harder conversation properly, translating the universal wish to stay up into specific recovery objectives, a design that actually meets them, and a tested confidence that the design works when the bad day comes. Without it, continuity is just a hope, and hope performs poorly under failure.

The two numbers that define continuity

Continuity planning rests on two objectives that every important workload needs defined, because everything else follows from them. The recovery time objective (RTO) is how long the system can be unavailable before the impact becomes unacceptable. The recovery point objective (RPO) is how much data, measured in time, the business can afford to lose. These two numbers are not technical preferences. They are business decisions about tolerance for downtime and data loss, and they should be set by the people who bear the consequences, not assumed by the people building the system.

Objective	Question it answers	What a tight value demands
Recovery time objective	How long can we be down?	Fast failover, standby capacity, automation
Recovery point objective	How much data can we lose?	Frequent or continuous replication

These numbers matter so much because they drive the cost. A recovery time objective of seconds and a recovery point objective of zero demand a fully redundant, continuously replicated design that is expensive to build and run. A recovery time objective of a day and a recovery point objective of hours can be met far more cheaply. Setting these objectives realistically, rather than defaulting to the tightest possible values out of caution, is the single most important cost decision in continuity planning.
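The link between the recovery point objective and the design can be sketched in a few lines of Python. This is a minimal illustration, not an OCI API: the `RecoveryObjectives` class, the `interval_meets_rpo` function, and the example workloads and values are all assumptions made for the example. It relies on one simplification, that the worst-case data loss with periodic copies is roughly the time between copies.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Business-agreed objectives for one workload (hypothetical structure)."""
    workload: str
    rto: timedelta  # longest tolerable outage
    rpo: timedelta  # largest tolerable window of lost data


def interval_meets_rpo(objectives: RecoveryObjectives, copy_interval: timedelta) -> bool:
    """Return True if copies taken every `copy_interval` can satisfy the RPO.

    Simplifying assumption: worst-case data loss equals the time between
    copies. Replication lag and copy duration would add to this in practice.
    """
    return copy_interval <= objectives.rpo


# Illustrative workloads and values, not recommendations.
billing = RecoveryObjectives("billing", rto=timedelta(hours=1), rpo=timedelta(minutes=15))
reporting = RecoveryObjectives("reporting", rto=timedelta(days=1), rpo=timedelta(hours=4))

nightly = timedelta(hours=24)
hourly = timedelta(hours=1)

for objectives in (billing, reporting):
    for label, interval in (("nightly", nightly), ("hourly", hourly)):
        ok = interval_meets_rpo(objectives, interval)
        print(f"{objectives.workload}: {label} copies meet RPO = {ok}")
```

Under these assumed values, neither workload is satisfied by nightly copies, and only the looser reporting objective is satisfied by hourly copies. The tighter the objective, the more frequent and therefore more expensive the replication has to be.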

Continuity is not the same as backup

A common and dangerous confusion is treating backup as continuity. They are related but different, and conflating them leaves a gap exactly where it hurts. A backup is a copy of data you can restore from, which protects against data loss. Continuity is about keeping the service running or restoring it within the recovery time objective, which is a broader problem involving compute, networking, configuration and process, not just data. You can have perfect backups and still face a long outage if you have no plan for standing the service back up quickly. Backup is a necessary component of continuity, covered in depth in backup and recovery management on OCI, but it is only one part, and a continuity plan that stops at backup is incomplete.

Designing for the failures that matter

Good continuity design starts by being honest about what you are protecting against, because different failures need different defences and protecting against all of them equally is wasteful. A single component failure is common and should be handled gracefully by redundancy within the design. A larger failure affecting a whole availability domain is rarer and needs the workload spread across domains. A regional event is rarer still, and only the most critical workloads justify the cost of cross-region protection against it. Matching the design to the failures that realistically matter for a given workload, rather than defending every workload against every conceivable disaster, is what keeps continuity affordable while still meeting the objectives that count.
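One way to keep this matching honest is to record, for each workload, the widest failure it is required to survive and the widest failure its current design actually survives, then flag the mismatches in both directions. The sketch below is illustrative only: the tier names, the `REQUIRED_SCOPE` mapping and the example estate are assumptions, not an OCI classification.

```python
from enum import IntEnum


class FailureScope(IntEnum):
    """Failure scopes from the text, ordered from narrowest to widest."""
    COMPONENT = 1
    AVAILABILITY_DOMAIN = 2
    REGION = 3


# Hypothetical criticality tiers mapped to the widest failure each must survive.
REQUIRED_SCOPE = {
    "standard": FailureScope.COMPONENT,
    "important": FailureScope.AVAILABILITY_DOMAIN,
    "critical": FailureScope.REGION,
}


def protection_gaps(workloads: dict[str, tuple[str, FailureScope]]) -> dict[str, str]:
    """Compare each workload's designed scope with what its tier requires.

    `workloads` maps a name to (tier, widest failure the design survives).
    Only under-protected or over-protected workloads are returned.
    """
    findings = {}
    for name, (tier, designed) in workloads.items():
        required = REQUIRED_SCOPE[tier]
        if designed < required:
            findings[name] = f"under-protected: needs {required.name}"
        elif designed > required:
            findings[name] = f"over-protected: {required.name} would suffice"
    return findings


# Illustrative estate.
estate = {
    "payments": ("critical", FailureScope.AVAILABILITY_DOMAIN),
    "crm": ("important", FailureScope.AVAILABILITY_DOMAIN),
    "wiki": ("standard", FailureScope.REGION),
}

for name, finding in protection_gaps(estate).items():
    print(f"{name}: {finding}")
```

Reporting over-protection as well as under-protection reflects the point above: defending a low-criticality workload against a regional event is a cost problem, just as leaving a critical one confined to a single availability domain is a risk problem.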

A framework for a continuity plan

A continuity plan is more than a design. It is a complete answer to the question of what happens when things fail. The framework below covers what a real plan contains.

Define objectives. Set the recovery time and recovery point objectives per workload, agreed with the business.
Map dependencies. Identify everything a workload needs to function, since a recovered application is useless without the services it depends on.
Design to the objectives. Build the redundancy, replication and failover that meet the objectives for each workload, no more and no less.
Document the procedure. Write down exactly how recovery happens, step by step, so it does not depend on the right person being available.
Test it for real. Exercise the plan against actual failure scenarios and confirm it meets the objectives, then fix what did not work.

The dependency trap

One of the most common reasons continuity plans fail in practice is overlooked dependencies. A workload is recovered successfully, and then it sits there useless because something it quietly depended on, an authentication service, a shared database, a network path, a name resolution service, was not part of the plan and is still down. Mapping the full dependency chain of each critical workload, and making sure everything it needs is itself covered by the continuity plan, is unglamorous work that prevents the nasty surprise of a recovery that technically succeeded but practically failed. The systems that matter are rarely standalone, and their continuity is only as good as the continuity of everything beneath them.
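Finding these gaps is a graph problem: start at the workload, follow every dependency and every dependency of a dependency, and report anything that is not itself covered by the plan. The following is a minimal sketch, assuming the dependency map is maintained by hand or exported from an inventory. The function name, the service names and the map itself are hypothetical.

```python
def uncovered_dependencies(workload, depends_on, covered):
    """Walk the full dependency chain and return anything outside the plan.

    depends_on: dict mapping a service to the services it needs directly.
    covered:    set of services that have their own continuity coverage.
    """
    seen = set()
    missing = set()
    stack = [workload]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, ()):
            if dep in seen:
                continue  # already visited, which also guards against cycles
            seen.add(dep)
            if dep not in covered:
                missing.add(dep)
            stack.append(dep)  # keep walking: its own dependencies matter too
    return sorted(missing)


# Illustrative dependency map.
depends_on = {
    "orders-app": ["auth", "orders-db", "dns"],
    "auth": ["identity-db", "dns"],
    "orders-db": [],
}
covered = {"orders-app", "orders-db", "auth"}

print(uncovered_dependencies("orders-app", depends_on, covered))
```

In this example the application, its database and the authentication service are all in the plan, yet the walk still reports name resolution and the identity database behind authentication, which are exactly the quiet dependencies that leave a recovered workload unusable.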

The plan you have not tested does not exist

The hardest truth in continuity planning is that an untested plan is not a plan. It is a document. Plans that look complete on paper routinely fail when exercised, because of an assumption that did not hold, a procedure step that was wrong, a dependency that was missed, or a recovery that took far longer than the objective allowed. The only way to know a continuity plan works is to test it against realistic failure scenarios, deliberately, in a controlled way, before the real failure forces the test on you. Regular testing also keeps the plan current, since an estate changes over time and a plan that was correct a year ago may no longer match reality. This testing discipline is part of what makes proactive operations trustworthy, because confidence that you can recover is only justified if you have proven it.
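A test only proves something if its outcome is compared against the objectives. As a minimal sketch, assuming each drill records how long recovery actually took and how much data was actually lost, the comparison could look like this. The `DrillResult` class, the `evaluate_drill` function and the figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrillResult:
    """Measurements taken during one recovery exercise (hypothetical structure)."""
    workload: str
    measured_recovery: timedelta   # time from failure to service restored
    measured_data_loss: timedelta  # age of the newest data that survived


def evaluate_drill(result: DrillResult, rto: timedelta, rpo: timedelta) -> list[str]:
    """Return the objectives the drill missed; an empty list means it passed."""
    findings = []
    if result.measured_recovery > rto:
        findings.append(
            f"{result.workload}: recovery took {result.measured_recovery}, "
            f"objective is {rto}"
        )
    if result.measured_data_loss > rpo:
        findings.append(
            f"{result.workload}: data loss was {result.measured_data_loss}, "
            f"objective is {rpo}"
        )
    return findings


# Illustrative drill: data loss is within the objective, recovery time is not.
drill = DrillResult(
    workload="billing",
    measured_recovery=timedelta(hours=3),
    measured_data_loss=timedelta(minutes=5),
)
findings = evaluate_drill(drill, rto=timedelta(hours=1), rpo=timedelta(minutes=15))

for finding in findings:
    print(finding)
```

Each finding is a concrete item to fix before the next exercise, and keeping the results from successive drills shows whether the plan is holding up as the estate changes.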

Why continuity belongs in managed services

Continuity planning is easy to write once and then quietly let decay, because the plan sits unused until the rare day it is needed, and the testing that keeps it valid is effort spent on something that may never happen. That is exactly the kind of important but non-urgent work that gets postponed under pressure, and it is well suited to a managed service, which keeps the plan current and tested as a matter of routine rather than letting it rot. The recovery objectives a continuity plan promises are also closely tied to the commitments in service level agreements, since an availability promise is only credible if the continuity behind it is real and tested. For the full operational picture, see the complete guide to OCI managed services. When you want continuity that is planned, tested and kept current rather than written once and forgotten, our OCI managed services practice maintains it as part of the service.


About the author

Fredrik Filipsson, Co-founder of OCI Specialists. 20 years of enterprise IT experience in Oracle Database, OCI cost optimization, licensing, and data platforms.
