Most networking issues in Azure are actually DNS issues.

Here is why DNS is hard in Azure, and how we handle it.

Public and Private DNS Do Not Play Well Together

Azure resources have public endpoints by default.

They resolve via public DNS.

When you add a private endpoint, the resource gets a private IP.

Now you have two IPs for the same resource:

  • public IP (internet-facing)
  • private IP (VNET-only)

DNS must resolve to the private IP from within your VNET, and the public IP from outside.

That split-brain DNS setup causes most of our problems.

Private DNS Zones Are Required for Private Endpoints

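When you create a private endpoint for a Storage Account, Key Vault, or SQL Database, Azure does not update DNS for you unless the endpoint is integrated with a private DNS zone (the portal's "Integrate with private DNS zone" option, or a DNS zone group in code).

Without that integration, you have to: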

  • create a private DNS zone (e.g., privatelink.blob.core.windows.net)
  • link the zone to your VNET
  • create an A record pointing to the private endpoint IP
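The three steps above can be sketched with the Azure CLI. This is a minimal sketch, not the exact commands we ran: the resource group, link name, VNET ID, record name, and IP in the usage line are placeholder assumptions. The helper only prints the commands unless you set RUN=1.

```shell
#!/bin/sh
# Sketch: the three manual DNS steps for one private endpoint.
# Every name, ID, and IP passed in is a placeholder (assumption).

# Print the command by default; execute it only when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Args: resource group, zone, VNET resource ID, link name, record name, private IP
setup_private_dns() {
  rg=$1 zone=$2 vnet_id=$3 link=$4 record=$5 ip=$6

  # 1. Create the private DNS zone
  run az network private-dns zone create \
    --resource-group "$rg" --name "$zone"

  # 2. Link the zone to the VNET (auto-registration off for privatelink zones)
  run az network private-dns link vnet create \
    --resource-group "$rg" --zone-name "$zone" \
    --name "$link" --virtual-network "$vnet_id" \
    --registration-enabled false

  # 3. Create the A record pointing at the private endpoint IP
  run az network private-dns record-set a add-record \
    --resource-group "$rg" --zone-name "$zone" \
    --record-set-name "$record" --ipv4-address "$ip"
}
```

Example: `setup_private_dns rg-connectivity privatelink.blob.core.windows.net "$VNET_ID" link-spoke1 mystorageaccount 10.1.2.4` prints the three commands for review. If the endpoint uses a DNS zone group, Azure manages the A record and step 3 is not needed.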

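If the zone is linked but the A record is missing, the name typically fails to resolve at all. If you skip the zone or the VNET link, DNS resolves to the public IP.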

Your app tries to connect over the internet. The firewall blocks it.

You get connection timeouts.

It looks like a networking issue. It is DNS.

Private DNS zones must be linked to every VNET that needs to resolve private IPs.

We had:

  • hub VNET
  • spoke VNETs for apps
  • private DNS zones in the hub

We linked the zones to the hub VNET. We forgot to link them to the spokes.

Apps in the spoke VNETs could not resolve private endpoints.

They timed out. We spent hours checking NSG rules.

The fix was adding VNET links to the private DNS zones.

Two-minute fix. Four-hour troubleshooting session.
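A check like the following would have caught the missing spoke links in seconds. It is a sketch under assumptions: the list of expected VNET IDs is something you maintain yourself, and the resource group and zone in the commented wiring are placeholders.

```shell
#!/bin/sh
# Sketch: find VNETs that are not linked to a private DNS zone.

# $1 = newline-separated VNET resource IDs that should be linked
# $2 = newline-separated VNET resource IDs that are linked
# Prints every expected VNET that has no link. Matching ignores case,
# because Azure resource IDs are case-insensitive.
missing_links() {
  printf '%s\n' "$1" | while IFS= read -r vnet; do
    [ -n "$vnet" ] || continue
    printf '%s\n' "$2" | grep -qixF -- "$vnet" || printf '%s\n' "$vnet"
  done
}

# Example wiring (needs the Azure CLI and a logged-in session):
#   linked=$(az network private-dns link vnet list \
#     --resource-group rg-connectivity \
#     --zone-name privatelink.blob.core.windows.net \
#     --query "[].virtualNetwork.id" --output tsv)
#   missing_links "$expected" "$linked"
```

Empty output means every expected VNET is linked. Anything printed is a VNET whose apps will fall back to public resolution.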

Conditional Forwarding Is Complex

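If you have on-premises DNS servers, they need to forward Azure queries to Azure DNS.

Azure DNS resolves at 168.63.129.16, but that address is reachable only from inside a VNET.

On-premises servers cannot forward to it directly. They forward to a DNS forwarder running in a VNET (an Azure DNS Private Resolver inbound endpoint or a DNS server VM), and that forwarder queries 168.63.129.16.

Microsoft recommends forwarding the public zone names rather than the privatelink ones, so you configure conditional forwarders to send:

  • blob.core.windows.net → forwarder in the VNET → 168.63.129.16
  • vault.azure.net → forwarder in the VNET → 168.63.129.16
  • and so on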

If the forwarding is not configured, on-premises clients cannot resolve private endpoints.
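For illustration, here is how the forwarder entries could be generated for an on-premises BIND server. This is a sketch under assumptions: we are assuming BIND, and 10.0.0.4 in the usage line is a hypothetical address of a DNS forwarder inside the VNET (168.63.129.16 itself is only reachable from inside Azure). The zones are the public zone names, following Microsoft's guidance to forward those rather than the privatelink names.

```shell
#!/bin/sh
# Sketch: emit BIND conditional-forwarder stanzas for Azure service zones.
# $1 is the IP of a DNS forwarder inside the VNET (hypothetical value);
# the remaining arguments are the zones to forward.

forwarder_stanzas() {
  forwarder_ip=$1
  shift
  for zone in "$@"; do
    printf 'zone "%s" {\n' "$zone"
    printf '    type forward;\n'
    printf '    forward only;\n'
    printf '    forwarders { %s; };\n' "$forwarder_ip"
    printf '};\n'
  done
}
```

Example: `forwarder_stanzas 10.0.0.4 blob.core.windows.net vault.azure.net` prints two stanzas to review before adding them to your named configuration. Windows DNS servers need the equivalent conditional forwarders set up with their own tooling.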

We had VPN-connected users who could not access Azure resources.

Networking was fine. DNS was broken.

DNS Caching Hides Changes

You update a DNS record. You test. It still resolves to the old IP.

DNS caching happens at multiple layers:

  • the OS
  • the application
  • intermediate DNS servers

We changed a private endpoint IP. Apps still connected to the old IP for 30 minutes.

The TTL on the DNS record was set to 3600 seconds (1 hour), so cached answers could have lingered even longer.

We now use shorter TTLs for records we might change (300 seconds).
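To spot records with long TTLs, you can filter dig's answer section. This is a small sketch: the 300-second threshold in the usage line mirrors our own choice, and the function name is ours.

```shell
#!/bin/sh
# Sketch: list records whose TTL exceeds a threshold.
# Input on stdin: lines in the format of `dig +noall +answer <name>`:
#   name  TTL  class  type  value
# $1 = maximum acceptable TTL in seconds.
# Output: name, record type, and TTL for each record above the threshold.

long_ttls() {
  awk -v max="$1" '
    NF >= 5 && $2 ~ /^[0-9]+$/ && $2 + 0 > max + 0 { print $1, $4, $2 }'
}
```

Example: `dig +noall +answer mystorageaccount.blob.core.windows.net | long_ttls 300`. Note that a caching resolver reports the remaining TTL, which counts down, so query the authoritative zone if you need the configured value.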

Azure DNS Has Limits You Do Not Expect

Azure private DNS zones have limits:

  • 25,000 record sets per zone
  • 1,000 VNET links per zone
  • 1,000 zones per subscription

Public Azure services use DNS names like:

  • mystorageaccount.blob.core.windows.net
  • mykeyvault.vault.azure.net

Private endpoints use a subdomain:

  • mystorageaccount.privatelink.blob.core.windows.net
  • mykeyvault.privatelink.vaultcore.azure.net

Once a private endpoint exists, the public DNS name is a CNAME to the privatelink subdomain.

Inside a linked VNET, the privatelink subdomain resolves via your private DNS zone.

If the zone is linked but the record is missing, the CNAME chain breaks.

You get “Name not resolved.”

We debugged this five times before we understood the flow.
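The flow is easier to reason about when the chain is classified automatically. The sketch below reads the output of `dig +short <name>` and reports which case you are in. The labels are our own, and it assumes private endpoints sit in RFC 1918 address space, which is the usual case but not guaranteed.

```shell
#!/bin/sh
# Sketch: classify a resolution chain as printed by `dig +short <name>`
# (one CNAME target or IPv4 address per line, read from stdin).

classify_chain() {
  awk '
    /\.privatelink\./ { via_privatelink = 1 }
    /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { ip = $0 }
    END {
      private = (ip ~ /^10\./ || ip ~ /^192\.168\./ ||
                 ip ~ /^172\.(1[6-9]|2[0-9]|3[01])\./)
      if (ip == "")              print "unresolved"
      else if (private)          print "private " ip
      else if (via_privatelink)  print "public-via-privatelink " ip
      else                       print "public " ip
    }'
}
```

Example: `dig +short mystorageaccount.blob.core.windows.net | classify_chain`. A result of `public-via-privatelink` means a private endpoint exists but your resolver is not using the private zone, which usually points at a missing VNET link. `unresolved` points at a missing record.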

Resolution Stops Working After VNET Peering

You peer two VNETs.

Apps in VNET A can reach apps in VNET B.

But DNS does not resolve across peered VNETs automatically.

If VNET A has a private DNS zone, apps in VNET B cannot use it unless:

  • the zone is linked to VNET B
  • or VNET B uses custom DNS servers that forward to VNET A

We had apps that could ping private IPs directly but could not resolve hostnames.

The connectivity was fine. DNS was not configured.

How We Manage DNS Now

We built a standard:

  1. All private DNS zones are in a central “connectivity” subscription.
  2. All VNETs link to the central zones.
  3. We use automation (Terraform) to link new VNETs automatically.
  4. We use short TTLs (300 seconds) for private endpoint records.
  5. We document every private DNS zone and its purpose.
  6. We test DNS resolution from every VNET during setup.

This eliminated 80% of our DNS issues.

Our Troubleshooting Process

When we see connection failures, we check DNS first:

  1. Does the hostname resolve?
    nslookup myresource.blob.core.windows.net
    
  2. Does it resolve to a private IP or public IP?
  3. Is the private DNS zone created?
  4. Is the zone linked to the VNET?
  5. Does the A record exist in the zone?
  6. Is the TTL causing cache issues?
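Steps 3 to 5 of the checklist map onto Azure CLI queries. This is a sketch: the resource group, zone, and record name in the usage line are placeholders, and the helper only prints the commands unless you set RUN=1.

```shell
#!/bin/sh
# Sketch: checklist steps 3-5 as Azure CLI queries.
# Prints the commands by default; set RUN=1 to execute them.

run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Args: resource group, private DNS zone, record name
check_private_dns() {
  rg=$1 zone=$2 record=$3

  # 3. Is the private DNS zone created?
  run az network private-dns zone show \
    --resource-group "$rg" --name "$zone"

  # 4. Is the zone linked to the VNET? (lists the linked VNET IDs)
  run az network private-dns link vnet list \
    --resource-group "$rg" --zone-name "$zone" \
    --query "[].virtualNetwork.id" --output tsv

  # 5. Does the A record exist in the zone?
  run az network private-dns record-set a show \
    --resource-group "$rg" --zone-name "$zone" --name "$record"
}
```

Example: `RUN=1 check_private_dns rg-connectivity privatelink.blob.core.windows.net mystorageaccount`. A failing `show` command tells you which of the three pieces is missing.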

We fix DNS before we check networking.

Because 90% of the time, it is DNS.

The Full DNS Diagnostic Script

This is what we run before we check anything else:

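#!/bin/bash
# diagnose-dns.sh
# Usage: ./diagnose-dns.sh mystorageaccount.blob.core.windows.net
# Requires: nslookup and dig

if [[ -z "$1" ]]; then
  echo "Usage: $0 <hostname>" >&2
  exit 1
fi

# Use TARGET rather than HOSTNAME, which bash sets to the local machine name
TARGET=$1

echo "=== 1. Does the hostname resolve? ==="
nslookup "$TARGET"

echo ""
echo "=== 2. What does it resolve to? (public or private IP) ==="
# Keep only IPv4 addresses so a dangling CNAME is not mistaken for an IP
RESOLVED_IP=$(dig +short A "$TARGET" | grep -E '^[0-9]+(\.[0-9]+){3}$' | tail -1)
echo "Resolved IP: ${RESOLVED_IP:-<none>}"

# Check if it's an RFC1918 private IP
if [[ -z "$RESOLVED_IP" ]]; then
  echo "-> No A record returned (record missing or resolver unreachable)"
elif [[ "$RESOLVED_IP" =~ ^10\. ]] || [[ "$RESOLVED_IP" =~ ^172\.(1[6-9]|2[0-9]|3[01])\. ]] || [[ "$RESOLVED_IP" =~ ^192\.168\. ]]; then
  echo "-> Private IP (good, hitting private endpoint)"
else
  echo "-> Public IP (DNS may not be configured for private endpoint)"
fi

echo ""
echo "=== 3. Check the CNAME chain ==="
# Print every CNAME hop, not only the first one
dig +noall +answer "$TARGET" | awk '$4 == "CNAME" {print $1, "->", $5}'

echo ""
echo "=== 4. Check TTL ==="
# Show the TTL of each record in the chain, including the final A record
dig +noall +answer "$TARGET" | awk '{print $1, $4, "TTL:", $2, "seconds"}'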

In Terraform, we automate private DNS zone creation and VNET linking so no one can forget:

locals {
  private_dns_zones = [
    "privatelink.blob.core.windows.net",
    "privatelink.vaultcore.azure.net",
    "privatelink.database.windows.net",
    "privatelink.azurecr.io",
  ]
}

resource "azurerm_private_dns_zone" "this" {
  for_each            = toset(local.private_dns_zones)
  name                = each.value
  resource_group_name = azurerm_resource_group.connectivity.name
}

# Link every zone to every VNET automatically.
# var.vnets is expected to be a map of objects that each expose an `id` (the VNET resource ID).
resource "azurerm_private_dns_zone_virtual_network_link" "this" {
  for_each = {
    for pair in setproduct(local.private_dns_zones, keys(var.vnets)) :
    "${pair[0]}-${pair[1]}" => { zone = pair[0], vnet_key = pair[1] }
  }

  name                  = "link-${each.value.vnet_key}"
  resource_group_name   = azurerm_resource_group.connectivity.name
  private_dns_zone_name = azurerm_private_dns_zone.this[each.value.zone].name
  virtual_network_id    = var.vnets[each.value.vnet_key].id
}

This is what saved us from four-hour troubleshooting sessions. If the zone exists, the VNET link exists, and the endpoint's A record is in the zone, DNS just works.

The Lesson

DNS in Azure is not automatic.

Private endpoints require manual DNS configuration.

VNET links are easy to forget.

Caching hides problems.

Split-brain DNS is complex.

We learned to treat DNS as critical infrastructure, not an afterthought.

Now we configure DNS first, then deploy resources.

That is the right order.

And it saves hours of troubleshooting.
