OCI DNS and Traffic Management: Zones, Steering Policies, and Health Checks

DNS is the quiet layer that decides where every request actually goes. On OCI it is more than a name resolver: through Traffic Management steering policies, OCI DNS can route users to the nearest region, fail traffic away from an unhealthy endpoint, and balance load across multiple sites. Used well, it is one of the most powerful tools for performance and resilience. Used poorly, a forgotten time to live value can keep sending users to a dead region long after everything else has recovered. This guide explains how OCI DNS and Traffic Management work and how to use them.

DNS sits alongside the rest of your network, so see the pillar guide to OCI networking for context. Traffic steering pairs naturally with load balancing, covered in the guide to OCI load balancing options, and with the patterns in the multi region failover guide.

Public and private DNS zones

OCI DNS hosts two kinds of zone. A public zone resolves names for the internet, so external users can find your services. A private zone resolves names only inside your VCNs, so internal services can find each other by name without exposing anything publicly. Most estates use both: a public zone for customer facing endpoints and private zones for internal service discovery. Keeping the two cleanly separated avoids the classic mistake of leaking internal names to the public internet.
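One way to guard against that leak is a small lint over an exported list of public zone record names. The sketch below is illustrative only: the suffixes and record names are hypothetical, and it assumes you have already exported the names from your public zone by whatever means you prefer.

```python
from typing import Iterable, List

# Hypothetical suffixes that mark a name as internal-only; replace with your own scheme.
INTERNAL_SUFFIXES = (".internal.example.com", ".corp.example.com")


def leaked_internal_names(public_records: Iterable[str],
                          internal_suffixes: Iterable[str] = INTERNAL_SUFFIXES) -> List[str]:
    """Return the public-zone record names that look like internal names."""
    # Normalise case and drop any trailing dot so FQDN and relative forms compare equal.
    suffixes = tuple(s.lower().rstrip(".") for s in internal_suffixes)
    leaks = []
    for name in public_records:
        normalized = name.lower().rstrip(".")
        if normalized.endswith(suffixes):
            leaks.append(name)
    return leaks


# Example record names, assumed to come from an export of the public zone.
public_zone = ["www.example.com", "api.example.com.", "db01.internal.example.com"]
print(leaked_internal_names(public_zone))  # ['db01.internal.example.com']
```

Run as part of a change review, a check like this catches an internal name before it is published rather than after.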

Traffic Management steering policies

The real power of OCI DNS is in Traffic Management steering policies. A steering policy decides which answer a DNS query receives based on rules you define, rather than always returning the same record. This lets DNS itself become a routing decision point. There are several policy types, and choosing the right one is the central decision.

Steering policy	How it decides	Typical use
Failover	Primary answer unless its health check fails, then the backup	Active and standby across regions
Load balancer	Distributes answers by weight	Spreading traffic across multiple endpoints
Geolocation	Answer based on the user's location	Routing users to the nearest region or meeting residency rules
ASN	Answer based on the user's network (autonomous system number)	Carrier or network specific routing
IP prefix	Answer based on the user's IP address range	Fine grained routing by address block

For a simple two region active and standby design, the failover policy is usually what you want. For routing users to whichever region is closest, geolocation is the natural fit. Many designs combine policies, for example geolocation to pick a region with failover underneath so a regional outage still redirects users elsewhere.
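The decision logic behind these policies can be modelled in a few lines. The following is a conceptual sketch only, not the OCI rule engine or its API: the `Answer` class, the function names, and the region labels are all hypothetical, and the addresses come from the documentation ranges.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Answer:
    name: str             # label for the endpoint, e.g. "primary"
    address: str          # documentation-range IPs are used in the examples
    healthy: bool = True  # in a real policy this comes from a health check
    weight: int = 1       # only used by the weighted (load balancer) model


def failover(answers: List[Answer]) -> Optional[Answer]:
    """Return the first healthy answer, treating list order as priority."""
    for answer in answers:
        if answer.healthy:
            return answer
    return None


def weighted(answers: List[Answer], rng: random.Random) -> Optional[Answer]:
    """Pick one healthy answer at random, in proportion to its weight."""
    healthy = [a for a in answers if a.healthy]
    if not healthy:
        return None
    return rng.choices(healthy, weights=[a.weight for a in healthy], k=1)[0]


def geo_with_failover(region_pools: Dict[str, List[Answer]],
                      user_region: str,
                      default_order: List[str]) -> Optional[Answer]:
    """Try the user's own region first, then the other regions in default order."""
    order = [user_region] + [r for r in default_order if r != user_region]
    for region in order:
        pick = failover(region_pools.get(region, []))
        if pick is not None:
            return pick
    return None


pools = {
    "eu": [Answer("eu-primary", "192.0.2.10", healthy=False)],
    "us": [Answer("us-primary", "198.51.100.10")],
}
# A European user is sent to the US region because the EU endpoint is unhealthy.
print(geo_with_failover(pools, "eu", ["eu", "us"]).name)  # us-primary
```

The last function is the combined pattern described above: geolocation chooses the preferred pool, and failover decides what happens when that pool has nothing healthy to offer.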

Health checks, the engine behind failover

A failover policy is only as good as the health check that drives it. OCI health checks probe an endpoint from multiple vantage points and report whether it is healthy. When the primary fails its health check, the steering policy stops returning it and serves the backup instead. The detail that matters is what the health check actually probes. A check that only confirms a port is open can report healthy while the application behind it is broken. Probe a real application endpoint that exercises the dependencies that matter, so the health signal reflects whether users can actually be served.
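To make the difference concrete, here is a minimal sketch of a probe that judges health by what the application reports rather than by whether a port answers. It assumes a hypothetical health endpoint that returns a JSON object such as {"database": "ok", "cache": "ok"}; the response format and the dependency names are illustrations, not something OCI defines.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Iterable, Tuple

Fetcher = Callable[[str, float], Tuple[int, bytes]]


def http_fetch(url: str, timeout: float) -> Tuple[int, bytes]:
    """Fetch a URL and return (status, body); HTTP error statuses are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def is_healthy(url: str,
               fetch: Fetcher = http_fetch,
               timeout: float = 5.0,
               required: Iterable[str] = ("database", "cache")) -> bool:
    """Healthy only if the endpoint answers 200 and every required dependency reports "ok"."""
    try:
        status, body = fetch(url, timeout)
    except OSError:
        # Connection refused, DNS failure, and timeouts all land here.
        return False
    if status != 200:
        return False
    try:
        report = json.loads(body)
    except ValueError:
        # A 200 with a non-JSON body (for example an error page) is not healthy.
        return False
    if not isinstance(report, dict):
        return False
    return all(report.get(dep) == "ok" for dep in required)
```

A port check would pass in every one of the failure branches above as long as something is listening. The usual way to connect this to a monitor that only looks at status codes is to have the application's health endpoint run the dependency checks itself and return a non-200 status when any of them fail.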

The time to live trap. A steering policy can change its answer shortly after a health check fails, but if the answers carry a long time to live, resolvers and clients keep serving the cached answer until it expires and never see the change in the meantime. Set a short time to live on records that participate in failover, well before you ever need it, and confirm resolvers honour it.
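A rough back-of-the-envelope model shows how much the time to live dominates. The sketch assumes, purely for illustration, that an endpoint is marked unhealthy after a fixed number of consecutive failed probes and that resolvers honour the time to live exactly; the parameter names and values are hypothetical rather than OCI settings.

```python
def worst_case_failover_seconds(check_interval_s: int,
                                failures_to_trip: int,
                                record_ttl_s: int) -> int:
    """Upper bound on how long a client may keep reaching a dead endpoint."""
    # Time for the health check to notice, under the consecutive-failure assumption.
    detection = check_interval_s * failures_to_trip
    # A resolver that cached the old answer just before the switch keeps it for a full TTL.
    return detection + record_ttl_s


print(worst_case_failover_seconds(30, 3, 3600))  # 3690 (over an hour)
print(worst_case_failover_seconds(30, 3, 30))    # 120 (two minutes)
```

With identical health check settings, the one hour time to live accounts for nearly all of the exposure, which is why it has to be shortened before an incident rather than during one.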

Designing for failover that actually works

Putting these pieces together for a resilient design follows a clear pattern. Host your record in a Traffic Management failover policy. Point the primary at your production region and the backup at your standby region. Attach a health check that probes a real application endpoint. Set a short time to live so the change propagates quickly. Then test it by deliberately failing the primary health check and confirming traffic moves. This last step is the one teams skip, and it is the one that proves the design works. The wider failover picture, including the data layer, is covered in the multi region failover patterns guide.
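The test step can be timed rather than eyeballed. The following is a minimal sketch of a drill helper that polls name resolution from a client machine until the backup address appears. The hostname and addresses in the usage line are placeholders, and the script only observes DNS answers: forcing the primary health check to fail is something you do separately.

```python
import socket
import time
from typing import Callable, Optional, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Resolve a hostname through the local resolver and return its IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def wait_for_failover(hostname: str,
                      backup_ip: str,
                      timeout_s: float = 300.0,
                      poll_s: float = 5.0,
                      resolve: Callable[[str], Set[str]] = resolve_ipv4,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll until backup_ip is returned; give elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        try:
            answers = resolve(hostname)
        except OSError:
            # Resolution errors count as "not failed over yet".
            answers = set()
        elapsed = clock() - start
        if backup_ip in answers:
            return elapsed
        if elapsed >= timeout_s:
            return None
        sleep(poll_s)


# Usage during a drill (placeholder values): start this, then fail the primary health check.
# elapsed = wait_for_failover("app.example.com", "198.51.100.10")
# print("failover observed after", elapsed, "seconds")
```

Because the lookup goes through the machine's normal resolver, the measured time includes any caching along the way, which is the delay real clients would experience.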

A DNS and traffic steering checklist

Separate public zones for external names from private zones for internal service discovery.
Choose the steering policy that matches the goal: failover for resilience, geolocation for proximity, load balancer for distribution.
Attach health checks that probe real application endpoints, not just open ports.
Set short time to live values on any record that participates in failover.
Combine policies where needed, such as geolocation with failover underneath.
Test failover by forcing a health check to fail, and confirm traffic actually moves.

Common mistakes

Long time to live values that keep clients pinned to a dead endpoint after a failover.
Health checks that probe only a port, so a broken application still looks healthy.
Leaking internal names through public zones instead of using private zones.
Configuring a failover policy and never testing that it actually fails over.

If you want DNS and traffic steering designed and tested as part of a resilient OCI architecture, our team builds these into networking and disaster recovery engagements. See the OCI networking solution and the pillar guide to OCI networking for the full context.
