Troubleshooting OCI Networking

The difference between a network problem that takes ten minutes and one that takes ten hours is method. When something cannot reach something else, the instinct is to start changing rules and rebooting things. The faster path is to isolate where in the flow the break happens, then check the small number of things that can cause a break at that point. This guide gives you that method and the failure patterns it most often uncovers on OCI.

Isolate the layer first

Before touching any configuration, work out which layer is actually failing. A connection that never establishes is a different problem from one that establishes but is slow, and a name that does not resolve is different again. Spending two minutes to classify the symptom saves you from chasing the wrong cause.

Symptom	Likely layer	First thing to check
Name does not resolve	DNS	Private DNS zones and resolver configuration
Connection times out	Routing or security rules	Route tables and security rules along the path
Connection refused	Destination service	Whether the service is listening on the expected port
Connects but slow	Performance	Shape bandwidth, placement, load balancer sizing
Works one way only	Asymmetric routing	Return path route tables and stateful firewalls

Classify the symptom before you change anything. A timeout and a refusal point at completely different causes.
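
To make the classification step concrete, here is a minimal Python sketch. It probes a host and port with a plain TCP connection attempt and maps the outcome onto the table's rows. It assumes a TCP service and uses only the standard library `socket` module. The function name `classify` and its short return labels are illustrative, not part of any OCI tool. A successful connection only rules out the first three rows, because a single probe cannot show slowness or asymmetry.

```python
import socket


def classify(host: str, port: int, timeout: float = 3.0) -> str:
    """Probe host:port over TCP and map the outcome to a likely layer.

    Hypothetical helper for illustration; the labels mirror the symptom table:
    "dns", "timeout" (routing or rules), "refused" (destination service),
    "connected" (check performance if slow) or "other".
    """
    try:
        # Resolution comes first: a failure here is a DNS problem, not a path problem.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"
    family, socktype, proto, _, addr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except ConnectionRefusedError:
        # Something answered with a reset: the path works, nothing is listening.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: routing or security rules are the usual suspects.
        return "timeout"
    except OSError:
        # Anything else (for example "no route to host") needs a closer look.
        return "other"
    return "connected"
```

Run it from the source host, not from your laptop. The answer only describes the path from where the probe runs.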

A troubleshooting framework

Classify the symptom. Is it resolution, reachability, refusal or slowness? This decides where you look.
Confirm DNS. If a name is involved, verify it resolves to the address you expect before anything else.
Trace the route. Check the route table on the source subnet, every gateway in the path, and the destination subnet. Confirm a route exists in both directions.
Check security rules in both directions. Verify security lists and network security groups allow the flow outbound from the source and inbound to the destination.
Run the Network Path Analyzer. Let OCI evaluate the full virtual path and tell you exactly where it breaks, as covered in the Path Analyzer guide.
Confirm the service. Make sure the destination is actually listening on the expected port and that the operating system firewall is not blocking it.
Read the flow logs. Use VCN flow logs to see whether traffic is being accepted or rejected and where.
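
The value of the framework is its order. The sketch below encodes the steps after classification as an ordered list and stops at the first one that fails, so you look at one layer at a time. Each check is a hypothetical placeholder (a `lambda` returning a fixed value). In practice you would wire it to your own probe, CLI call or console check. Nothing here calls an OCI API.

```python
from typing import Callable, List, Optional, Tuple

# A step is a label plus a zero-argument check that returns True when that layer looks healthy.
Step = Tuple[str, Callable[[], bool]]


def first_failure(steps: List[Step]) -> Optional[str]:
    """Run the checks strictly in order and report the first one that fails.

    Stopping at the first failure keeps the investigation to one layer at a time
    instead of changing several things at once.
    """
    for label, check in steps:
        if not check():
            return label
    return None


# Hypothetical placeholders standing in for real checks, in the framework's order.
framework: List[Step] = [
    ("dns", lambda: True),
    ("route both ways", lambda: True),
    ("rules both ways", lambda: False),
    ("path analyzer", lambda: True),
    ("service listening", lambda: True),
    ("flow logs", lambda: True),
]

print(first_failure(framework))  # the security rule check is the one that fails here
```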

Routing problems

Most reachability failures come down to routing. Typical causes are a subnet route table missing an entry for the destination, a gateway route table that does not forward the range, or a transit design where a spoke never learns the path it needs. The discipline is to check the route in both directions, because a flow that has a route out but no route back will time out exactly as if it had no route at all. In transit designs the most common culprit is a spoke that is attached but uses the wrong route table, so it cannot reach the destination even though the path exists. The transit routing guide covers how those route tables interact.
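
Checking "both directions" amounts to two lookups: one for the destination address in the source subnet's route table, and one for the source address in the destination subnet's route table. This is a minimal sketch, assuming route tables are modeled as plain dictionaries of CIDR to target and that the most specific (longest prefix) matching rule wins. It also assumes the two subnets sit in different VCNs joined through a DRG, so implicit local routing inside a VCN does not apply. The table contents and target names (`drg`, `nat-gateway`) are hypothetical.

```python
import ipaddress
from typing import Dict, Optional, Tuple

# Hypothetical route table: destination CIDR -> route target.
RouteTable = Dict[str, str]


def lookup(table: RouteTable, ip: str) -> Optional[str]:
    """Return the target of the most specific (longest prefix) matching rule, if any."""
    addr = ipaddress.ip_address(ip)
    best = None
    for cidr, target in table.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return best[1] if best else None


def routes_both_ways(src_table: RouteTable, src_ip: str,
                     dst_table: RouteTable, dst_ip: str) -> Tuple[Optional[str], Optional[str]]:
    """Targets chosen for the forward flow and for its return traffic."""
    return lookup(src_table, dst_ip), lookup(dst_table, src_ip)


# Forward path goes to the DRG, but the destination's only matching rule for the
# return is a default route to a NAT gateway, so the reply never comes back.
spoke_a = {"10.2.0.0/16": "drg", "0.0.0.0/0": "nat-gateway"}
spoke_b = {"0.0.0.0/0": "nat-gateway"}
print(routes_both_ways(spoke_a, "10.1.3.4", spoke_b, "10.2.3.4"))
```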

Asymmetric routing deserves special attention. When traffic leaves by one path and returns by another, a stateful firewall in the middle drops the return because it never saw the request. This presents as a connection that works in one direction or fails intermittently, and it is almost always a more specific route somewhere sending the return traffic down the wrong path.
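
The "more specific route somewhere" case can be shown with the same longest-prefix rule. This is a toy model, not an OCI API. It assumes a forward route table and a return route table, plus a hypothetical target name for the stateful firewall. It flags asymmetry when only one direction is routed through that firewall.

```python
import ipaddress
from typing import Dict, Optional


def next_hop(table: Dict[str, str], ip: str) -> Optional[str]:
    """Longest-prefix match: the most specific matching CIDR decides the target."""
    addr = ipaddress.ip_address(ip)
    matches = [(ipaddress.ip_network(c), t) for c, t in table.items()
               if addr in ipaddress.ip_network(c)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def is_asymmetric(fwd_table: Dict[str, str], return_table: Dict[str, str],
                  src_ip: str, dst_ip: str, firewall: str) -> bool:
    """True when only one direction of the flow traverses the stateful firewall."""
    forward_via_fw = next_hop(fwd_table, dst_ip) == firewall
    return_via_fw = next_hop(return_table, src_ip) == firewall
    return forward_via_fw != return_via_fw


forward = {"10.20.0.0/16": "firewall"}
# The broad /8 sends returns through the firewall, but a forgotten /24 is more
# specific for 10.10.5.x and sends those replies straight to the DRG instead.
return_path = {"10.0.0.0/8": "firewall", "10.10.5.0/24": "drg"}
print(is_asymmetric(forward, return_path, "10.10.5.7", "10.20.1.9", "firewall"))
```

Removing or correcting the stray /24 brings both directions back through the firewall.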

Security rule problems

The second large category is security rules. Because security list and network security group rules are stateful by default, you usually only need to allow the initiating direction, but you must allow it at both the source and the destination. A rule marked stateless is the exception: its return traffic needs a rule of its own. A flow blocked by a missing inbound rule on the destination presents as a timeout, indistinguishable at first glance from a routing problem, which is why the method checks routing and rules as separate steps. When you suspect rules, the flow logs are decisive, because they show whether each packet was accepted or rejected. The network security guide explains how the two control types interact.
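
The "allow it at both ends" rule can be expressed as two checks joined with `and`. This is a simplified sketch, not OCI's evaluator. It assumes stateful rules only, TCP and UDP only, and matching on the destination port while ignoring source ports. It treats all the rules that apply to a VNIC as one union, so any single matching rule allows the flow, which is how security lists and NSGs combine. The `Rule` class and its fields are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    """Simplified stateful rule: peer CIDR, protocol and destination port range."""
    cidr: str       # source CIDR for an ingress rule, destination CIDR for an egress rule
    protocol: str   # "tcp" or "udp" in this sketch
    port_min: int
    port_max: int


def permits(rules: List[Rule], peer_ip: str, protocol: str, port: int) -> bool:
    """Union semantics: the flow is allowed if any applicable rule matches."""
    addr = ipaddress.ip_address(peer_ip)
    return any(
        addr in ipaddress.ip_network(r.cidr)
        and r.protocol == protocol
        and r.port_min <= port <= r.port_max
        for r in rules
    )


def flow_allowed(src_egress: List[Rule], dst_ingress: List[Rule],
                 src_ip: str, dst_ip: str, protocol: str, port: int) -> bool:
    """Stateful: only the initiating direction needs rules, but at BOTH ends."""
    return (permits(src_egress, dst_ip, protocol, port)
            and permits(dst_ingress, src_ip, protocol, port))


egress = [Rule("0.0.0.0/0", "tcp", 1, 65535)]
ingress = [Rule("10.0.1.0/24", "tcp", 22, 22)]
print(flow_allowed(egress, ingress, "10.0.1.5", "10.0.2.8", "tcp", 1521))
```

The last line is the classic timeout case. The source is free to send, but the destination has no inbound rule for port 1521.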

Let the Path Analyzer do the work

OCI includes a Network Path Analyzer that evaluates the virtual path between a source and a destination and reports exactly where a flow would be allowed or blocked, including route tables, security rules and gateways. For most reachability problems this is the fastest single tool, because it inspects the configured path without you needing to log into anything. Run it early once you have classified the symptom as a reachability problem, and let it point you at the specific route table or rule at fault. The Path Analyzer guide covers how to read its output.

Performance problems are different

When the connection works but is slow, you are no longer troubleshooting reachability; you are troubleshooting performance, and the method changes. Separate the network from the application by measuring raw throughput and latency between hosts, then compare those figures with what the application sees. If the raw numbers are healthy, the network is not your problem. If they are not, look at shape bandwidth, instance placement and load balancer sizing, in that order. The performance tuning guide walks through each lever.
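
One cheap way to get a raw latency figure is to time TCP handshakes, which cost roughly one round trip each. This is a minimal sketch using only the standard library. It measures latency, not throughput; for throughput, a dedicated tool such as iperf3 is the usual choice. The function name and the default sample count are illustrative.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int, samples: int = 5,
                           timeout: float = 3.0) -> float:
    """Median time to complete a TCP handshake with host:port, in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection resolves the name and completes the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000)
    # The median is less sensitive to a single slow sample than the mean.
    return statistics.median(timings)
```

If the handshake takes a millisecond or two but a request takes seconds, most of the time is being spent above the network.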

Reading flow logs effectively

VCN flow logs are the closest thing OCI gives you to watching traffic pass, and they settle many arguments quickly. When a flow is failing and you cannot tell whether routing or a security rule is at fault, the logs tell you directly. If the traffic appears and is rejected, a rule is blocking it. If the traffic never appears at all, it is not even reaching the point where rules would evaluate it, which points back at routing or at the source. Learning to read the accept and reject records, and where along the path they were captured, turns a guessing game into a lookup. Enable the logs before you need them, because a problem you cannot reproduce is far harder to diagnose without a record of what happened the first time.
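
The accept, reject or absent reading above is mechanical enough to sketch. This example assumes the flow log records have already been exported as dictionaries with the keys `sourceAddress`, `destinationAddress`, `destinationPort` and `action` (`ACCEPT` or `REJECT`). Those names are modeled on OCI flow log records, but treat them as assumptions and check them against your own log output. The `diagnose` function is hypothetical.

```python
from typing import Iterable, Mapping


def diagnose(records: Iterable[Mapping], src: str, dst: str, port: int) -> str:
    """Apply the accept / reject / absent reading of flow logs to one flow."""
    # Field names are assumptions; adjust them to match your exported records.
    actions = {
        r.get("action")
        for r in records
        if r.get("sourceAddress") == src
        and r.get("destinationAddress") == dst
        and r.get("destinationPort") == port
    }
    if "REJECT" in actions:
        return "rejected: a security rule is blocking it"
    if "ACCEPT" in actions:
        return "accepted: rules allow it, so look at the return path or the service"
    return "absent: traffic never reached the rules, so check routing or the source"


logs = [
    {"sourceAddress": "10.0.1.5", "destinationAddress": "10.0.2.8",
     "destinationPort": 443, "action": "REJECT"},
    {"sourceAddress": "10.0.1.5", "destinationAddress": "10.0.2.8",
     "destinationPort": 22, "action": "ACCEPT"},
]
print(diagnose(logs, "10.0.1.5", "10.0.2.8", 443))
```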

The logs are also where intermittent problems give themselves away. A flow that works most of the time but fails occasionally often reveals an asymmetric path or a misconfigured backend behind a load balancer, and the pattern only becomes visible when you can see many flows over time rather than a single failed attempt. For anything intermittent, the logs are usually the fastest route to the cause.
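
Seeing "many flows over time" mostly means grouping records by flow and looking for flows with mixed outcomes. This sketch makes the same assumption about exported record fields as a plain dictionary per record (`sourceAddress`, `destinationAddress`, `destinationPort`, `action`). It keeps the flows that were accepted at least once and also rejected at least `min_rejects` times. The function name and threshold are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Tuple

FlowKey = Tuple[str, str, int]


def mixed_outcome_flows(records: Iterable[Mapping],
                        min_rejects: int = 1) -> Dict[FlowKey, Counter]:
    """Group records by flow and keep the flows seen both accepted and rejected."""
    outcomes: Dict[FlowKey, Counter] = defaultdict(Counter)
    for r in records:
        key = (r["sourceAddress"], r["destinationAddress"], r["destinationPort"])
        outcomes[key][r["action"]] += 1
    # A Counter returns 0 for missing keys, so flows with one outcome drop out.
    return {
        key: counts
        for key, counts in outcomes.items()
        if counts["ACCEPT"] > 0 and counts["REJECT"] >= min_rejects
    }


history = [
    {"sourceAddress": "10.0.1.5", "destinationAddress": "10.0.2.8",
     "destinationPort": 443, "action": "ACCEPT"},
    {"sourceAddress": "10.0.1.5", "destinationAddress": "10.0.2.8",
     "destinationPort": 443, "action": "REJECT"},
    {"sourceAddress": "10.0.1.6", "destinationAddress": "10.0.2.8",
     "destinationPort": 443, "action": "ACCEPT"},
]
print(mixed_outcome_flows(history))
```

A flow that shows up here is worth checking against the return path and the load balancer's backend set before anything else.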

The most common OCI network faults

Experience shows the same handful of causes behind most OCI network problems. A missing route on the source subnet, so traffic has nowhere to go. A missing inbound rule on the destination, so traffic arrives and is dropped. Overlapping address ranges, so two networks cannot be distinguished. A private resource that cannot reach an Oracle service because no service gateway route exists. A name that resolves to a public address when it should resolve to a private one, sending traffic down the wrong path entirely. And asymmetric routing, where outbound and return traffic take different paths and a stateful device drops the return. Knowing this short list means that when you classify a symptom, you already have a strong sense of where to look first.
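
Two faults on that list are easy to check in code before anyone reaches for a console: overlapping address ranges, and a name that resolves to a public address. This is a minimal sketch using the standard `ipaddress` module. The network names and CIDRs are hypothetical. "Private" here means Python's definition (the RFC 1918 ranges plus other reserved ranges). Collecting the resolved addresses, ideally from inside the VCN, is left to you.

```python
import ipaddress
import itertools
from typing import Dict, List, Tuple


def overlapping_pairs(cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of named networks whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(c) for name, c in cidrs.items()}
    return [
        (a, b)
        for a, b in itertools.combinations(nets, 2)
        if nets[a].version == nets[b].version and nets[a].overlaps(nets[b])
    ]


def resolves_public(addresses: List[str]) -> List[str]:
    """Addresses that are not private: a sign the name resolved to a public endpoint."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_private]


estate = {"hub": "10.0.0.0/16", "spoke-a": "10.1.0.0/16", "onprem": "10.0.128.0/17"}
print(overlapping_pairs(estate))  # the on-premises range sits inside the hub VCN
```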

None of these are subtle once you know to check for them, which is the whole point of a method. The faults that waste hours are not exotic. They waste hours because the investigator changed three things at once, never confirmed the basics, or assumed the application was at fault when a route table was the cause. A short, ordered checklist beats intuition almost every time.

Documenting and learning from incidents

Every network problem you solve is worth a short record: what the symptom was, where the break turned out to be, and how you found it. Over time this record becomes the fastest troubleshooting tool you have, because most estates fail in a small number of characteristic ways, and a team that has written down its last few incidents recognises the next one immediately. It also surfaces patterns worth fixing permanently, such as a class of change that keeps causing the same outage, which points at a process or a guardrail rather than a one-off fix. Treating troubleshooting as something to learn from, rather than something to survive, is what turns a reactive network team into one that rarely gets surprised twice by the same thing.

Building the habit

The value of a method is that it works under pressure, when an outage is live and the temptation to guess is strongest. Classify the symptom, confirm DNS, trace the route both ways, check the rules both ways, run the Path Analyzer, confirm the service, and read the flow logs. Almost every OCI network problem falls out of that sequence quickly. For the wider context of how these components fit, see the complete OCI networking guide. When an estate needs the routing, security and observability set up so that problems are quick to find, our OCI networking solution covers the build and the run.

About the author

Morten Andersen, Co-founder of OCI Specialists, with 20 years of enterprise IT experience in OCI migration, security, networking, and 24/7 operations.