From Pilot to Production on OCI

A pilot proves that something can work on OCI. Production proves it can be relied on. The gap between the two is where many OCI projects stall, because the shortcuts that made the pilot quick (loose permissions, manual setup, a single instance, no real monitoring) are exactly what make it unfit for production. This guide sets out what changes between a pilot and a production deployment on OCI, so the move from one to the other is a planned transition rather than a series of incidents discovered live.

Why pilots stall before production

The qualities that make a good pilot are the opposite of the qualities that make a good production system. A pilot is built to answer a question fast, so it cuts corners deliberately. The trouble starts when the pilot works, stakeholders are pleased, and there is pressure to simply promote it as is. Promoting a pilot without hardening it is how estates end up with wide-open security rules, single points of failure and no recovery plan in production. Recognising that production is a different build, not a relabelled pilot, is the first step.

A pilot is built to answer a question. Production is built to be trusted at three in the morning. They are not the same system.

What actually changes

The table below captures the dimensions that change when a workload moves from pilot to production. Each row is something that was reasonable to skip in a pilot and is essential to address before real users and real data depend on the system.

Dimension	Pilot	Production
Access	Broad permissions for speed	Least privilege IAM, MFA, audited
Availability	Single instance, one domain	Spread across fault and availability domains
Recovery	None or untested	Backups verified, DR tested, RTO and RPO defined
Provisioning	Manual clicks in the console	Infrastructure as code, reproducible
Monitoring	Minimal or none	Metrics, logs, alerts with owners
Cost	Whatever was fastest	Right sized, tagged, governed

Hardening identity and network

The first hardening pass is access. Replace the broad permissions that made the pilot quick with least privilege IAM policies, enforce multi-factor authentication (MFA), and remove the keys and users created for convenience. The network gets the same treatment: tighten security lists and network security groups to exactly what the workload needs, move backends into private subnets and expose only what must be public. These steps map directly to the controls in our guide to designing OCI for compliance, and they are far easier to apply as a deliberate pass than to retrofit after an incident.
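A hardening pass like this can begin with a simple audit of what the pilot left behind. The sketch below is a minimal illustration in Python, not an OCI tool. It assumes policy statements are available as plain strings in OCI's policy syntax and that ingress rules have been exported into dictionaries with hypothetical `source` and `port` keys. The group names and the choice of 443 as the only port allowed to be public are assumptions made for the example.

```python
import re

# Assumption: policy statements arrive as plain strings and ingress rules as
# dicts with "source" and "port" keys. A real audit would export these first.
BROAD_GRANT = re.compile(
    r"^allow\s+(group\s+\S+|any-user)\s+to\s+manage\s+all-resources\s+in\s+tenancy$",
    re.IGNORECASE,
)


def broad_policy_statements(statements):
    """Return statements that grant manage on all-resources across the tenancy."""
    return [s for s in statements if BROAD_GRANT.match(s.strip())]


def open_ingress_rules(rules, public_ports=frozenset({443})):
    """Return rules open to the whole internet on ports not meant to be public."""
    return [
        r for r in rules
        if r["source"] == "0.0.0.0/0" and r["port"] not in public_ports
    ]
```

Anything either function returns is a candidate for narrowing before go live. The pattern only catches the broadest tenancy-wide grant, so a real review would still read every statement.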

Building in availability and recovery

A pilot on a single instance in a single domain is fine for a demo and unacceptable for production. The production build spreads critical tiers across fault domains and, where the workload justifies it, across availability domains, so that the failure of one does not take the service down. Equally important is recovery, which means backups that are not just configured but verified by an actual restore, and a disaster recovery approach whose recovery time objective (RTO) and recovery point objective (RPO) are defined and have been tested. The full reasoning is in our guide to designing for high availability on OCI.
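The two ideas in this section, spreading a tier and checking recovery against objectives, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a placement engine: the instance names are hypothetical, the fault domain names follow the FAULT-DOMAIN-1 to FAULT-DOMAIN-3 naming OCI uses within an availability domain, and the restore time is assumed to come from an actual restore drill rather than an estimate.

```python
from datetime import timedelta

FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")


def assign_fault_domains(instance_names, fault_domains=FAULT_DOMAINS):
    """Round-robin instances across fault domains so no domain holds a whole tier."""
    return {
        name: fault_domains[i % len(fault_domains)]
        for i, name in enumerate(instance_names)
    }


def is_spread(placement, minimum_domains=2):
    """True when the tier occupies at least `minimum_domains` distinct fault domains."""
    return len(set(placement.values())) >= minimum_domains


def meets_objectives(backup_interval, measured_restore_time, rpo, rto):
    """Worst-case data loss is one backup interval; restore time comes from a drill."""
    return backup_interval <= rpo and measured_restore_time <= rto
```

For example, a daily backup cannot satisfy a four hour recovery point objective, however quickly the restore runs, so `meets_objectives(timedelta(hours=24), timedelta(hours=2), rpo=timedelta(hours=4), rto=timedelta(hours=4))` is false.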

From clicks to code

Pilots are usually built by hand in the console, which is fine for a one-off but a liability for production, because a hand-built environment cannot be reliably reproduced, reviewed or rebuilt after a disaster. The production transition is the right moment to capture the environment as infrastructure as code, so it can be version controlled, peer reviewed and recreated on demand. This also makes future changes safer, because they go through the same reviewed pipeline rather than ad hoc edits, and it connects to the operating discipline in our scaling patterns guide.
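One practical consequence of codifying the environment is that drift becomes detectable: the declared state can be compared with what is actually running. The sketch below shows that comparison in plain Python. It stands in for what an infrastructure as code tool does and is not a substitute for one. The resource names and attributes are hypothetical, and both states are assumed to be dictionaries keyed by resource name.

```python
def diff_environment(declared, actual):
    """Compare declared (code) state with observed state, keyed by resource name."""
    declared_names, actual_names = set(declared), set(actual)
    return {
        # In code but not running: the environment cannot be rebuilt as described.
        "missing": sorted(declared_names - actual_names),
        # Running but not in code: usually a hand-made leftover from the pilot.
        "unmanaged": sorted(actual_names - declared_names),
        # Present in both but with different attributes: an ad hoc edit.
        "changed": sorted(
            name for name in declared_names & actual_names
            if declared[name] != actual[name]
        ),
    }
```

An empty result in all three lists is the evidence that the environment really is reproducible from what is under version control.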

A pilot to production checklist

Lock down access. Replace broad permissions with least privilege IAM, enforce MFA, remove convenience credentials.
Tighten the network. Narrow security rules, move backends to private subnets, expose only what must be public.
Engineer availability. Spread critical tiers across fault and availability domains.
Prove recovery. Verify backups with a real restore and test the disaster recovery plan against defined objectives.
Codify the environment. Capture infrastructure as code so it is reproducible and reviewable.
Instrument and right size. Add monitoring with owned alerts, and right size and tag resources for cost control.
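The tagging and right sizing in the last checklist item lend themselves to an automated check. The following is a minimal sketch, assuming resources have been exported into dictionaries with hypothetical `name`, `tags` and `peak_cpu` fields. The required tag keys and the 20 percent utilisation threshold are illustrative assumptions, not recommendations.

```python
# Assumption: these tag keys are examples; use whatever the estate has agreed on.
REQUIRED_TAGS = ("owner", "environment", "cost-center")


def missing_tags(resources, required=REQUIRED_TAGS):
    """Map each resource name to the required tag keys it lacks or leaves empty."""
    gaps = {}
    for resource in resources:
        tags = resource.get("tags", {})
        absent = [key for key in required if not tags.get(key)]
        if absent:
            gaps[resource["name"]] = absent
    return gaps


def oversized(resources, cpu_threshold=0.2):
    """Names of resources whose peak CPU utilisation stays below the threshold."""
    return [r["name"] for r in resources if r["peak_cpu"] < cpu_threshold]
```

Resources flagged by `oversized` are candidates for review rather than automatic downsizing, since peak CPU alone says nothing about memory or I/O pressure.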

Performance and load testing

A pilot is rarely tested under realistic load, because the point of a pilot is to prove the concept, not to find its breaking point. Production has to survive real traffic, including the peaks, so load testing becomes a required step in the transition rather than an optional one. The work is to model the expected load, including the busiest realistic scenario, run it against a production-like environment, and observe where the system slows or fails. This is how you discover that a shape is undersized, a database connection pool is too small, or a service falls over under concurrency, while it is still cheap to fix rather than during a launch.
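Modelling the expected load can start with simple arithmetic. The sketch below uses Little's law (requests in flight equal arrival rate multiplied by time per request) to estimate how many concurrent connections a peak implies, and a nearest-rank percentile to summarise the latencies a load test produces. The 25 percent headroom factor and the example figures are assumptions for illustration.

```python
import math


def required_concurrency(peak_requests_per_second, avg_seconds_per_request, headroom=1.25):
    """Little's law: in-flight requests = arrival rate x time in system, plus headroom."""
    return math.ceil(peak_requests_per_second * avg_seconds_per_request * headroom)


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]
```

With a peak of 200 requests per second and 50 milliseconds spent per request, the estimate is 13 concurrent connections, so a pool left at 10 from the pilot would be the kind of undersizing a load test is meant to expose.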

Load testing also validates the scaling behaviour you are relying on. If the design assumes autoscaling will absorb a traffic spike, the only way to know it actually does, and fast enough, is to generate the spike and watch. Testing the scaling response before launch turns an assumption into evidence, and it connects directly to the capacity thinking in our scaling patterns guide. A production system whose limits are known is far less stressful to operate than one whose limits will be discovered live.
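Whether scaling responds fast enough can be read from the load test's own measurements. The sketch below assumes the test has recorded time-ordered samples of p95 latency and that a latency objective has been agreed. Both the sample format and the figures are hypothetical. It reports how long after the spike began latency returned within the objective and stayed there.

```python
def recovery_seconds(samples, spike_start, latency_objective_ms):
    """Seconds from spike start until latency settles within the objective.

    samples: time-ordered (seconds, p95_latency_ms) tuples.
    Returns None if latency never settles within the objective.
    """
    settled_at = None
    for seconds, latency_ms in samples:
        if seconds < spike_start:
            continue
        if latency_ms <= latency_objective_ms:
            # Keep the first moment of the current in-objective run.
            if settled_at is None:
                settled_at = seconds
        else:
            # A later breach means the system had not really settled.
            settled_at = None
    return None if settled_at is None else settled_at - spike_start
```

Comparing the result with an agreed limit, for example two minutes, is what turns the autoscaling assumption into a pass or fail result.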

Runbooks and operational readiness

Production is not just a technical state but an operational one, and a system is not ready for production until someone knows how to run it. That readiness lives in runbooks, the documented procedures for the things that will eventually need doing: restarting a service, failing over to a standby, restoring from backup, responding to a specific alert. A pilot has none of these because it has never needed them. The transition is when they get written, ideally by the people who will be on call, and tested by walking through them rather than assuming they work.

Operational readiness also means the alerts actually reach a human who can act, the on-call rotation exists, and the escalation path is clear. A monitoring setup that fires into an unwatched channel is not operational readiness; it is the appearance of it. Getting this right is part of why the move to production benefits from the discipline of ongoing operations, the same discipline our managed services provide for estates that would rather not build an on-call function from scratch.
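A readiness review can check this mechanically: every alert should have an owner, that owner should have someone on call, and there should be a runbook to follow. The sketch below is a minimal illustration that assumes alerts are described as dictionaries with hypothetical `name`, `owner` and `runbook` fields, and that the on-call rota is a mapping from team to the people currently on it.

```python
def unroutable_alerts(alerts, on_call):
    """Map each problematic alert name to the reasons nobody could act on it."""
    problems = {}
    for alert in alerts:
        issues = []
        owner = alert.get("owner")
        if not owner:
            issues.append("no owner")
        elif not on_call.get(owner):
            issues.append("owner has no on-call rotation")
        if not alert.get("runbook"):
            issues.append("no runbook")
        if issues:
            problems[alert["name"]] = issues
    return problems
```

An empty result does not prove the alerts are useful, only that each one has somewhere to go and something to follow when it fires.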

Stakeholder sign-off and go live

The final difference between a pilot and production is accountability. A pilot can be turned off without consequence, while a production system carries commitments to the business and its users, so the transition should include an explicit sign-off confirming that the hardening, availability, recovery, testing and operational readiness are all in place. This is not bureaucracy for its own sake; it is the moment where the people responsible confirm that the known risks have been addressed and the residual ones are understood and accepted.
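The sign-off itself can be modelled as a small gate: every area named above needs a named approver, and every residual risk needs someone who has accepted it. The sketch below is one hypothetical way to represent that in Python. The class, field names and example values are assumptions for illustration, not a prescribed process.

```python
from dataclasses import dataclass, field

AREAS = ("hardening", "availability", "recovery", "testing", "operational readiness")


@dataclass
class SignOff:
    confirmed: dict = field(default_factory=dict)       # area -> approver name
    residual_risks: dict = field(default_factory=dict)  # risk -> accepter name or None

    def blockers(self):
        """List everything still standing between the system and go live."""
        items = [f"{area}: not confirmed" for area in AREAS if not self.confirmed.get(area)]
        items += [
            f"risk not accepted: {risk}"
            for risk, accepted_by in self.residual_risks.items()
            if not accepted_by
        ]
        return items

    def ready(self):
        return not self.blockers()
```

Recording who confirmed each area, rather than a bare yes or no, is what makes the gate a record of accountability and not just a checklist.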

A well-run go live is calm precisely because everything that could be prepared has been, and the launch itself is a small step rather than a leap. The cutover is rehearsed, the rollback is ready, the monitoring is watched, and the team knows what to do if something behaves unexpectedly. That calm is the payoff for treating the transition as the deliberate piece of work it is, the foundations for which sit in our complete architecture guide.

Treating the transition as a project

The move from pilot to production deserves to be planned and resourced as a project in its own right, not assumed to be a quick promotion. When it is treated that way, the hardening, availability, recovery, automation and monitoring work gets done deliberately and the production launch is calm. When it is not, those same items get discovered one painful incident at a time. When you want the production build done properly, with the estate hardened and ready to run, our implementation and migration service and ongoing managed services take a working pilot all the way to a system you can trust.

Part of a series
This guide is part of OCI Migration — our complete pillar guide on the topic.

About the author

Morten Andersen, Co-founder of OCI Specialists, has 20 years of enterprise IT experience in OCI migration, security, networking, and 24/7 operations.
