This article discusses how to do basic troubleshooting of outbound connections from a Microsoft Azure Kubernetes Service (AKS) cluster and identify faulty components.
Prerequisites
- The Kubernetes kubectl tool, or a similar tool, to connect to the cluster. To install kubectl by using Azure CLI, run the az aks install-cli command.
- The apt-get command-line tool for handling packages.
- The Client URL (cURL) tool, or a similar command-line tool.
- The nslookup command-line (dnsutils) tool for checking DNS resolution.
Scenarios for outbound traffic in Azure Kubernetes Service
Traffic that originates from within the AKS cluster, whether from a pod or a worker node, is considered outbound traffic from the cluster. If there's an issue in the outbound flow from an AKS cluster, review the following outbound traffic scenarios before you troubleshoot.
The outbound traffic from an AKS cluster can be classified into the following categories:
- Traffic to a pod or service in the same cluster (internal traffic).
- Traffic to a network resource or endpoint in the same virtual network, or in a different virtual network that's connected through virtual network peering.
- Traffic to an on-premises environment through a VPN connection or an Azure ExpressRoute connection.
- Traffic outside the AKS network through Azure Load Balancer (public outbound traffic).
- Traffic outside the AKS network through Azure Firewall or a proxy server (public outbound traffic).
Internal traffic
A basic request flow for internal traffic from an AKS cluster resembles the flow shown in the following diagram.
Public outbound traffic through Azure Load Balancer
If the traffic is for a destination on the internet, the default method is to send the traffic through the Azure Load Balancer.
Public outbound traffic through Azure Firewall or a proxy server
In some cases, the egress traffic has to be filtered, which might require Azure Firewall.
A user might want to add a proxy server instead of a firewall, or set up a NAT gateway for egress traffic. The basic flow remains the same as shown in the diagram.
It's important to understand the nature of egress flow for your cluster so that you can continue troubleshooting.
Considerations when troubleshooting
Check your network resources within traffic flow
When you troubleshoot outbound traffic in AKS, it's important to know what network resources are present (that is, the hops through which the traffic passes). Here, the network resource could be one of the following components:
- Azure Load Balancer
- Azure Firewall or a custom firewall
- A network address translation (NAT) gateway
- A proxy server
- Network security group (NSG)
- Network policy
The flow could also differ based on the destination. For example, internal traffic (that is, within the cluster) doesn't go through the external network resources and only uses the cluster networking. For public outbound traffic, determine which network resources are implemented for your cluster.
Check outbound connectivity path and blockers with Azure Virtual Network Verifier (Preview)
To check where traffic is blocked within your network resources on its way to specific endpoints (for example, mcr.microsoft.com), you can use the Azure Virtual Network Verifier (Preview) tool. By running a connectivity analysis, you can visualize the hops within the traffic flow and any misconfigurations within Azure networking resources that block traffic. We recommend using the Virtual Network Verifier tool as a first step in troubleshooting outbound connectivity issues to isolate the issue and detect problematic network configurations. For instructions, see Check if Azure network resources are blocking traffic to the endpoint using Azure Virtual Network Verifier (Preview).
Manual troubleshooting
For manual troubleshooting, we recommend that you check the following items:
- The source and the destination for the request.
- The hops between the source and the destination.
- The request-response flow.
- The hops that are enhanced by extra security layers, such as:
  - Firewall
  - Network security group (NSG)
  - Network policy
To identify a problematic hop, check the HTTP response codes before and after it. These codes are useful to identify the nature of the issue. The codes are especially helpful in scenarios in which the application responds to HTTP requests. To check whether the packets arrive properly in a specific hop, you can proceed with packet captures.
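As a rough guide, the HTTP response code that you observe at a hop hints at where to look next. The following is a minimal sketch of such a check; the helper names and the code-to-cause mapping are illustrative, not an official tool:

```shell
# Hypothetical helpers for probing a hop; names and mappings are illustrative.

# Fetch only the HTTP status code from an endpoint.
# curl flags: -s silent, -o /dev/null discard body, -w print the code, -m 5 timeout.
check_http() {
  curl -s -o /dev/null -w '%{http_code}' -m 5 "$1"
}

# Suggest where to look next based on the status code.
# curl reports 000 when no HTTP response was received at all.
classify_code() {
  case "$1" in
    000)      echo "no response: check DNS, routing, or firewall drops" ;;
    2??|3??)  echo "reachable: issue is likely not this hop" ;;
    403|407)  echo "blocked: check firewall or proxy rules" ;;
    5??)      echo "server-side: check the destination service" ;;
    *)        echo "inspect response $1 at this hop" ;;
  esac
}

# Usage: classify_code "$(check_http https://mcr.microsoft.com)"
```

Running the same check before and after a suspect hop, then comparing the two classifications, often narrows the problem to a single component.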
Take packet captures from the client and server
If other troubleshooting steps don't provide a conclusive outcome, take packet captures from the client and server. Packet captures are also useful when non-HTTP traffic is involved between the client and server. For more information about how to collect packet captures for an AKS environment, see the articles in the data collection guide.
Troubleshooting checklists
For basic troubleshooting of egress traffic from an AKS cluster, follow these steps:
- Make sure that the Domain Name System (DNS) resolution for the endpoint works correctly.
- Make sure that you can reach the endpoint through its IP address.
- Make sure that you can reach the endpoint from another source.
- Check whether the cluster can reach any other external endpoint.
- Check whether a firewall or proxy is blocking the traffic.
- Check whether the AKS service principal or managed identity has the permissions that are required to make network changes to Azure resources.
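The first few checks in this list can be sketched as a small script that you run from a debugging pod. The endpoint and the exact checks are assumptions; substitute the host that your workload actually needs to reach:

```shell
# A minimal sketch of the checklist above; ENDPOINT is an assumption --
# replace it with the host that your workload needs to reach.
ENDPOINT="${ENDPOINT:-mcr.microsoft.com}"

check_dns()   { nslookup "$ENDPOINT" >/dev/null 2>&1; }              # DNS resolution works
check_https() { curl -s -o /dev/null -m 5 "https://$ENDPOINT"; }     # endpoint is reachable
check_other() { curl -s -o /dev/null -m 5 "https://kubernetes.io"; } # any other endpoint works

run_checks() {
  check_dns   || { echo "DNS resolution failed for $ENDPOINT"; return 1; }
  check_https || { echo "HTTPS connection failed for $ENDPOINT"; return 1; }
  check_other || { echo "all egress appears blocked: check firewall or proxy"; return 1; }
  echo "basic egress checks passed"
}

# Run from the debugging pod: run_checks
```

Because the checks run in order, the first failure message tells you which layer to investigate next.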
Note
These basic troubleshooting steps assume that no service mesh is deployed. If you use a service mesh such as Istio, it can produce unexpected outcomes for TCP-based traffic.
Check if Azure network resources are blocking traffic to the endpoint
To determine if traffic is blocked to the endpoint due to Azure network resources, run a connectivity analysis from your AKS cluster nodes to the endpoint using the Azure Virtual Network Verifier (Preview) tool. The connectivity analysis covers the following resources:
- Azure Load Balancer
- Azure Firewall
- A network address translation (NAT) gateway
- Network security group (NSG)
- Network policy
- User defined routes (route tables)
- Virtual network peering
Note
Azure Virtual Network Verifier (Preview) can't access any external or third-party networking resources, such as a custom firewall. If the connectivity analysis doesn't detect any blocked traffic, we recommend that you perform a manual check of any external networking to cover all hops in the traffic flow.
Currently, clusters using Azure CNI Overlay aren't supported for this feature. Support for CNI Overlay is planned for August 2025.
- Navigate to your cluster in the Azure portal. In the sidebar, go to Settings > Node pools.
- Identify the node pool that you want to run a connectivity analysis from, and then select it to set the scope.
- Select Connectivity analysis (Preview) on the toolbar at the top of the page. If you don't see it, select the ellipsis (...) on the toolbar to open the expanded menu.
- Select a Virtual Machine Scale Set (VMSS) instance as the source. The source IP addresses are populated automatically.
- Select a public domain name or endpoint, such as mcr.microsoft.com, as the destination for the analysis. The destination IP addresses are also populated automatically.
- Run the analysis, and wait up to two minutes for the results. In the resulting diagram, identify the associated Azure network resources and where traffic is blocked. To view the detailed analysis output, select the JSON output tab, or select the arrows in the diagram.
Check that the Domain Name System (DNS) resolution for the endpoint works correctly
You can run a DNS lookup to the endpoint by running a debugging pod on one of your AKS nodes. If the issue is isolated to a specific problematic pod or namespace, run the DNS lookup from within the same namespace where you notice the problem.
If you can't run the kubectl exec command to connect to an existing pod, you can start a test pod in the same namespace as the problematic pod to run the tests.
Note
If the DNS resolution or egress traffic doesn't let you install the necessary network packages, you can use the rishasi/ubuntu-netutil:1.0 Docker image. In this image, the required packages are already installed.
Example procedure for checking DNS resolution
Start a test pod in the problematic namespace:
kubectl run -it --rm aks-ssh --namespace <namespace> --image=debian:stable --overrides='{"spec": { "nodeSelector": {"kubernetes.io/os": "linux"}}}'
After the test pod starts running, you have interactive access to the pod.
Run the following apt-get commands to install other tool packages:

# Update, and install the tool packages
apt-get update && apt-get install -y dnsutils curl
After the packages are installed, run the nslookup command to test the DNS resolution to the endpoint:
$ nslookup microsoft.com   # microsoft.com is used as an example
Server:         <server>
Address:        <server IP address>#53
...
...
Name:   microsoft.com
Address: 20.53.203.50
Try the DNS resolution from the upstream DNS server directly. This example uses Azure DNS:
$ nslookup microsoft.com 168.63.129.16
Server:         168.63.129.16
Address:        168.63.129.16#53
...
...
Address: 20.81.111.85
Sometimes, there's a problem with the endpoint itself rather than the cluster DNS. In such cases, consider the following checks:
Check whether the desired port is open on the remote host:
curl -Ivm5 telnet://microsoft.com:443
Check the HTTP response code:
curl -Ivm5 https://microsoft.com
Check whether you can connect to any other endpoint:
curl -Ivm5 https://kubernetes.io
To verify that the endpoint is reachable and DNS is functioning from the node hosting the problematic pod, follow these steps:
Use a debug pod to connect to the node that hosts the problematic pod. For more information, see Connect to Azure Kubernetes Service (AKS) cluster nodes for maintenance or troubleshooting.
Test the DNS resolution to the endpoint:
$ nslookup microsoft.com
Server:         168.63.129.16
Address:        168.63.129.16#53

Non-authoritative answer:
Name:   microsoft.com
Address: 20.112.52.29
Name:   microsoft.com
Address: 20.81.111.85
Name:   microsoft.com
Address: 20.84.181.62
Name:   microsoft.com
Address: 20.103.85.33
Name:   microsoft.com
Address: 20.53.203.50
Check the resolv.conf file to determine whether the expected name servers are added:
cat /etc/resolv.conf
cat /run/systemd/resolve/resolv.conf
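On a node that uses Azure-provided DNS, the systemd-resolved upstream configuration typically lists the Azure virtual DNS IP. The following is a sketch of what to expect in /run/systemd/resolve/resolv.conf; the name servers and search domain differ if your virtual network uses custom DNS servers:

```
nameserver 168.63.129.16
search <dns-suffix>.internal.cloudapp.net
```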
Example procedure for checking DNS resolution of a Windows pod
Run a test pod in the Windows node pool:
# For a Windows environment, use the Resolve-DnsName cmdlet.
kubectl run dnsutil-win --image='mcr.microsoft.com/windows/servercore:ltsc2022' --overrides='{"spec": { "nodeSelector": {"kubernetes.io/os": "windows"}}}' -- powershell "Start-Sleep -s 3600"
Run the kubectl exec command to connect to the pod by using PowerShell:
kubectl exec -it dnsutil-win -- powershell
Run the Resolve-DnsName cmdlet in PowerShell to check whether the DNS resolution is working for the endpoint:
PS C:\> Resolve-DnsName www.microsoft.com

Name                           Type   TTL   Section    NameHost
----                           ----   ---   -------    --------
www.microsoft.com              CNAME  20    Answer     www.microsoft.com-c-3.edgekey.net
www.microsoft.com-c-3.edgekey. CNAME  20    Answer     www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net
net
www.microsoft.com-c-3.edgekey. CNAME  20    Answer     e13678.dscb.akamaiedge.net
net.globalredir.akadns.net

Name       : e13678.dscb.akamaiedge.net
QueryType  : AAAA
TTL        : 20
Section    : Answer
IP6Address : 2600:1408:c400:484::356e

Name       : e13678.dscb.akamaiedge.net
QueryType  : AAAA
TTL        : 20
Section    : Answer
IP6Address : 2600:1408:c400:496::356e

Name       : e13678.dscb.akamaiedge.net
QueryType  : A
TTL        : 12
Section    : Answer
IP4Address : 23.200.197.152
In one unusual scenario, DNS queries get a correct response from the node but fail from the pod. For this scenario, check for DNS resolution failures that occur inside the pod but not on the worker node. If you want to inspect DNS resolution for an endpoint across the cluster, check the DNS resolution status across the cluster.
If the DNS resolution is successful, continue to the network tests. Otherwise, verify the DNS configuration for the cluster.
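As a first step in verifying the cluster DNS configuration, you can confirm that the CoreDNS pods themselves are healthy. The following is a minimal sketch, assuming the k8s-app=kube-dns label that AKS applies to CoreDNS pods by default:

```shell
# Sketch: verify that every CoreDNS pod is in the Running state.
# Assumes the default label k8s-app=kube-dns on CoreDNS pods.
check_cluster_dns() {
  kubectl -n kube-system get pods -l k8s-app=kube-dns --no-headers 2>/dev/null |
    awk '$3 != "Running" { bad = 1 } END { exit bad }'
}

# Usage: check_cluster_dns && echo "CoreDNS pods are Running"
```

If any CoreDNS pod isn't Running, inspect its logs with kubectl logs before you examine the DNS configuration itself.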
Third-party contact disclaimer
Microsoft provides third-party contact information to help you find additional information about this topic. This contact information may change without notice. Microsoft does not guarantee the accuracy of third-party contact information.
Contact us for help
If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to Azure feedback community.