Data Architecture on OCI

Most data problems on OCI are not capacity problems; they are placement problems. Teams put transactional and analytical work in the same database, keep cold data on expensive block storage, move data between services through ad hoc scripts, and then wonder why performance is uneven and the bill is high. A clear data architecture decides up front where each kind of data lives, how it moves and who can see it. This guide lays out how to make those choices on Oracle Cloud Infrastructure.

Map data to the right store

OCI offers a deliberately wide range of data services, and the first architecture decision is matching each workload to the store designed for it rather than defaulting to one database for everything. Transactional systems want low latency reads and writes and strong consistency. Analytical systems want to scan large volumes fast. Document and key value workloads want flexible schemas. Files and backups want cheap, durable object storage. Treating these as one problem leads to a single oversized database doing everything badly.

Data type	OCI store	Why
OLTP, high concurrency	Autonomous Transaction Processing, MySQL HeatWave	Low latency, strong consistency
Analytics, warehousing	Autonomous Data Warehouse, Exadata	Fast large scans, columnar processing
Documents, flexible schema	Autonomous JSON, NoSQL Database	Schema on read, simple scaling
Files, backups, media	Object Storage, Archive Storage	Cheap, durable, tiered
Low latency caching	In memory tiers, application caches	Sub millisecond reads
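
The table above can also be kept as a simple lookup, for example in a design checklist or a provisioning review script. The following is a minimal sketch in Python. The workload keys and the recommend_store helper are hypothetical names chosen for illustration and are not part of any OCI SDK.

```python
# Hypothetical lookup that mirrors the table above; this is not an OCI API.
STORE_BY_WORKLOAD = {
    "oltp": ["Autonomous Transaction Processing", "MySQL HeatWave"],
    "analytics": ["Autonomous Data Warehouse", "Exadata"],
    "document": ["Autonomous JSON", "NoSQL Database"],
    "files": ["Object Storage", "Archive Storage"],
    "cache": ["In memory tiers", "application caches"],
}


def recommend_store(workload: str) -> list[str]:
    """Return the candidate stores for a workload category.

    Raises ValueError for an unmapped category so that a new kind of
    workload forces an explicit placement decision instead of a default.
    """
    key = workload.strip().lower()
    if key not in STORE_BY_WORKLOAD:
        raise ValueError(f"No store mapped for workload {workload!r}")
    return STORE_BY_WORKLOAD[key]
```

Failing loudly on an unknown category is a deliberate choice here: the point of the mapping is to stop new workloads from silently landing on whichever database already exists.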

Separate transactional and analytical paths

Running heavy reporting queries against the same database that serves live transactions is the classic cause of unpredictable latency. The architecture answer is to separate the paths. Keep the system of record lean and transactional, and feed an analytical store through replication, change data capture or scheduled extracts. On OCI this often means an Autonomous Transaction Processing instance for the live system and an Autonomous Data Warehouse for analytics, with a defined movement layer between them. The two scale independently and neither starves the other.
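
To make the change data capture option concrete, here is a minimal sketch in Python of how a stream of changes from the system of record might be applied to an analytical copy. It assumes a simple event shape (an operation, a primary key and the new row) and uses an in-memory dictionary to stand in for the analytical store. ChangeEvent and apply_changes are hypothetical names, and a real movement layer would use a replication or integration service rather than this code.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from the transactional store (assumed shape)."""
    op: str                                  # "insert", "update" or "delete"
    key: int                                 # primary key of the affected row
    row: Optional[dict[str, Any]] = None     # new row image; None for deletes


def apply_changes(replica: dict[int, dict[str, Any]],
                  changes: list[ChangeEvent]) -> None:
    """Apply captured changes, in order, to the analytical copy."""
    for change in changes:
        if change.op in ("insert", "update"):
            if change.row is None:
                raise ValueError(f"{change.op} for key {change.key} has no row")
            # Upsert: inserts and updates both overwrite by primary key.
            replica[change.key] = dict(change.row)
        elif change.op == "delete":
            # A delete for a key that is already absent is a harmless no-op.
            replica.pop(change.key, None)
        else:
            raise ValueError(f"Unknown operation {change.op!r}")
```

The live database only has to emit changes; the cost of applying them, and of every reporting query afterwards, falls on the analytical side.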

A reporting query and a checkout transaction should never compete for the same database. Separate the paths and let each store do one job well.

Data movement and integration

Once data lives in more than one store, movement becomes its own architecture concern. The choices range from Oracle tooling such as Data Pump for bulk export and import and GoldenGate for replication, to OCI Data Integration for managed pipelines, to OCI Streaming for event driven flows. The principle is to make movement explicit and observable rather than burying it in cron jobs that no one owns. Every flow should have a defined source, target, schedule or trigger, and a way to detect when it fails.
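
One lightweight way to make flows explicit is to keep a registry of them and check each one for staleness. The sketch below, in Python, assumes that every flow records the time of its last successful run. DataFlow, its fields and stale_flows are hypothetical names for illustration; in practice the same idea would usually be implemented with the monitoring and alerting of whichever service runs the flow.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DataFlow:
    """A registered data movement with an owner and a freshness budget."""
    name: str
    source: str
    target: str
    owner: str
    max_age: timedelta                        # how stale the target may become
    last_success: Optional[datetime] = None   # None means it has never succeeded


def stale_flows(flows: list[DataFlow], now: datetime) -> list[str]:
    """Return the names of flows that have never run or are past their budget."""
    return [
        flow.name
        for flow in flows
        if flow.last_success is None or now - flow.last_success > flow.max_age
    ]
```

Checking freshness at the target catches silent failures that a check for errors in the job alone would miss, such as a job that was never scheduled.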

For estates that span clouds or keep some data on premises, the movement layer also has to cross those boundaries cleanly. The connectivity patterns that make that reliable are covered in our wider architecture guide, and the scaling considerations are covered in our guide to scaling patterns on OCI.

Governance, security and lifecycle

A data architecture that ignores governance becomes a liability. Decide who can read and write each store through IAM policies, encrypt data at rest and in transit, classify sensitive data so it can be masked or restricted, and define retention so data does not accumulate forever on premium storage. Lifecycle rules that move cold objects to Archive Storage and expire data past its retention window are some of the highest return, lowest effort changes you can make, and they sit naturally alongside the controls described in the OCI Well Architected Framework.
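
The logic behind such lifecycle rules is simple enough to write down. The sketch below, in Python, decides what should happen to an object based on its age. The lifecycle_action function and both thresholds are assumptions for illustration: the real values come from your retention policy, and in production the equivalent rule would normally be configured on the bucket rather than run as application code.

```python
from datetime import date


def lifecycle_action(last_modified: date, today: date, *,
                     archive_after_days: int, retention_days: int) -> str:
    """Return "keep", "archive" or "delete" for an object of a given age.

    archive_after_days and retention_days are caller-supplied policy
    values; this sketch deliberately ships no defaults for them.
    """
    if archive_after_days > retention_days:
        raise ValueError("archive threshold must not exceed retention")
    age_days = (today - last_modified).days
    if age_days >= retention_days:
        return "delete"      # past the retention window: expire
    if age_days >= archive_after_days:
        return "archive"     # cold but still retained: move to the cheaper tier
    return "keep"            # still warm: leave on the standard tier
```

For example, with an archive threshold of 90 days and a retention window of 2555 days, an object last modified 152 days ago would be archived rather than kept or deleted.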

A layered data architecture

Ingestion. Define how data enters, through application writes, streaming, batch loads or replication, and make each path observable.
System of record. Keep transactional stores lean and authoritative, one owner per dataset.
Analytical layer. Feed a separate warehouse or lake for reporting, scaled independently of the live systems.
Serving and caching. Add caches and read replicas where latency demands it, never as a substitute for the right store.
Archive and lifecycle. Tier cold data to cheaper storage and expire it on a defined schedule.
Governance across all layers. Apply consistent access control, encryption, classification and audit everywhere.

Choosing between MySQL HeatWave and Autonomous

For transactional and mixed workloads, OCI gives you more than one credible relational option, and the choice is worth making consciously. Autonomous Database brings the full Oracle Database feature set with automated operations, which suits workloads that already depend on Oracle features or that benefit most from hands off management. MySQL HeatWave brings a MySQL compatible engine with an in memory analytics accelerator built in, which suits teams standardised on MySQL who want analytics on the same data without a separate warehouse. The decision usually follows the application and the team: an Oracle centric estate leans Autonomous, and a MySQL centric estate leans HeatWave. The data architecture should reflect that rather than forcing a single answer everywhere.

What matters architecturally is not picking a winner but placing each workload on the engine that fits it and being deliberate about where the boundaries lie. A data architecture that uses Autonomous for the Oracle workloads and HeatWave for the MySQL ones is perfectly coherent, provided the movement and governance layers treat both consistently.

Object Storage and the data lake

Object Storage is the quiet workhorse of most OCI data architectures, and it deserves more architectural attention than it usually gets. Beyond backups and files, it is the natural foundation for a data lake, holding raw and semi structured data cheaply and durably until it is processed. Autonomous Database can query data directly in Object Storage through external tables, which means you can keep large, infrequently accessed datasets out of premium database storage and bring them into the database only when needed. This pattern, often called the lakehouse, lets you separate cheap durable storage from expensive query compute and pay for each independently.

The architecture decision is what belongs in the lake and what belongs in the database. Raw ingested data, historical archives and large datasets that are queried occasionally fit the lake. Curated, frequently queried data that needs low latency belongs in the database. Drawing that line deliberately, and tiering data across the layers described in our complete architecture guide, is what keeps a data lake from becoming a data swamp.
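
The placement rule in the paragraph above can be written as a small decision function. This is a minimal sketch in Python, not a definitive policy: place_dataset is a hypothetical name, the hot_query_threshold parameter is an assumed stand-in for "frequently queried", and anything that does not meet all three database criteria defaults to the lake.

```python
def place_dataset(*, curated: bool, needs_low_latency: bool,
                  queries_per_day: float, hot_query_threshold: float) -> str:
    """Return "database" or "lake" for a dataset.

    Only curated, frequently queried, latency sensitive data goes to the
    database; raw, historical or occasionally queried data stays in the lake.
    """
    frequently_queried = queries_per_day >= hot_query_threshold
    if curated and frequently_queried and needs_low_latency:
        return "database"
    return "lake"
```

Defaulting to the lake keeps premium database storage as the exception that has to be justified, which is the direction the cost argument above points in.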

Real time and event driven data

Not all data arrives in batches. Increasingly it arrives as a stream of events, and a modern data architecture has to handle that path as a first class concern. OCI Streaming provides a managed, Kafka compatible service for ingesting and processing high volume event data, and it sits naturally in front of both the operational stores and the analytical layer. The pattern is to capture events as they happen, process or enrich them in flight, and land them in the store appropriate for how they will be used, whether that is a database for immediate querying or Object Storage for later analysis.

Event driven data introduces ordering, delivery and replay concerns that batch processing does not, and designing for them early avoids painful retrofits. The questions to settle are whether processing must be exactly once or at least once, how long events are retained for replay, and how downstream consumers cope when they fall behind. Treating the streaming layer with the same rigour as the database layer is what makes real time data reliable rather than merely fast.
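
A common way to get reliable results from at least once delivery is to make the consumer idempotent, so that a redelivered or replayed event has no further effect. The sketch below, in Python, assumes that each event carries a unique id and that offsets increase along the stream. IdempotentConsumer and the event fields are hypothetical, state is held in memory for brevity, and a real consumer would persist the seen ids and the offset together with the results.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Aggregates amounts per account and ignores events it has already applied."""
    totals: dict[str, int] = field(default_factory=dict)
    seen_ids: set[str] = field(default_factory=set)
    committed_offset: int = -1     # highest offset handled so far

    def handle(self, offset: int, event: dict) -> bool:
        """Apply one event; return False if it was a duplicate delivery."""
        is_new = event["id"] not in self.seen_ids
        if is_new:
            self.seen_ids.add(event["id"])
            account = event["account"]
            self.totals[account] = self.totals.get(account, 0) + event["amount"]
        # Advance the offset even for duplicates so a replay does not stall.
        self.committed_offset = max(self.committed_offset, offset)
        return is_new

    def lag(self, latest_offset: int) -> int:
        """Return how many offsets the consumer is behind the head of the stream."""
        return latest_offset - self.committed_offset
```

The lag method gives a simple number to alert on when a downstream consumer falls behind, which answers the third question above in a measurable way.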

Common anti patterns

Three patterns cause most of the pain we see. The first is the universal database: one instance asked to serve transactions, analytics and document workloads at once, so that none of them runs well. The second is invisible data movement, where nightly scripts copy data around with no monitoring until one silently fails and reports go stale. The third is storage drift, where data lands on premium block or standard object storage and never moves, so the bill grows with every passing month. Naming these early, and designing against them, is most of what good data architecture on OCI involves.

Where to go next

A sound data architecture is the backbone of a healthy OCI estate, and it touches availability, cost and security all at once. Pair this with our notes on designing for high availability on OCI, and when you want the stores chosen, the movement layer built and the governance set correctly, our Autonomous Database solution and OCI Storage solution cover the work end to end.


Part of a series
This guide is part of Data & AI on OCI — our complete pillar guide on the topic.

About the author

Fredrik Filipsson, Co-founder of OCI Specialists, with 20 years of enterprise IT experience in Oracle Database, OCI cost optimization, licensing, and data platforms.
