Significant drop in RPS after migrating to Azure Application Gateway V2 – Potential MS backbone issue?

Niket Kumar Singh 785

Hello Team,

We need assistance with an issue that appeared after migrating our workloads from GCP to Azure.

Scenario

We migrated from GCP where we consistently handled ~150K requests per second (RPS) via GCP load balancers.
After migration to Azure, using Azure Application Gateway Standard V2 (manual scale, 125 instances), the observed RPS dropped to around 40K-50K RPS.
Backend pool consists of direct VM IPs (no intermediate firewall or appliance).
Frontend: Public IP 74.xx.xx.xx

What we observed

Backend health: All backend hosts report healthy consistently in Azure metrics.
Healthy Host Count: Avg = 6, Unhealthy Host Count = 0
Current connections: Peaked at ~1.2M
Capacity units: Averaged ~600, with 125 max instances

Failed Requests: Minimal — mainly HTTP 499 (client aborts)

Response Status: No large-scale 4xx/5xx error patterns

SSL validation: No issues (validated via sslshopper.com)
No significant throttling or errors in AGW logs (checked via KQL queries)

Investigations performed

We checked Azure Monitor metrics: Current connections, total requests, failed requests, capacity units, healthy/unhealthy host count.
We ran KQL queries in Log Analytics to confirm no throttling, backend connection issues, or internal 4xx/5xx patterns.
We confirmed DNS resolution and SSL health.
Verified Application Gateway is scaled to 125 instances (max for Standard V2).
Backend VMs have adequate capacity; no CPU/memory bottlenecks.

Our concern

The requests simply don’t seem to be reaching the Application Gateway at the expected volumes. The client side reports no visible hits on the old GCP LB, so DNS caching is unlikely.
We suspect a Microsoft backbone network or Azure front-door/routing issue may be limiting traffic before it reaches our Application Gateway.

Anonymous

2025-06-23T03:12:35.1866667+00:00
Hi @Niket Kumar Singh

It sounds like you’re facing quite a challenge after migrating your workloads from GCP to Azure using the Application Gateway Standard V2. Dropping from ~150K RPS to around 40K-50K RPS is definitely concerning. Here are some steps and checks you can take or consider:

SNAT Port Limitations: The SNAT port limits can significantly affect the number of concurrent connections to your backend. Make sure you're not hitting this limit:

If using public IPs for the backend, each requires a separate SNAT port.

To mitigate this, you can increase the number of Application Gateway instances, scale out your backends (more IPs), or consider moving backends into the same virtual network using private IPs.

If Application Gateway reaches the SNAT port limit, it affects the requests per second (RPS). For example, Application Gateway can't open a new connection to the back end, and the request fails.

Reference: Architecture best practices for Azure Application Gateway v2

How many backend VMs are actually configured in your Application Gateway backend pool? If you have, say, 50 VMs, but the healthy host count consistently shows 6, then only 6 VMs are truly being utilized or perceived as healthy for routing, which would drastically limit your RPS even if the AGW itself has capacity.

Is your backend pool configured correctly to include all intended VMs? Double-check the backend pool settings in the Azure portal or via PowerShell/CLI.

Are there any NSGs or UDRs on the backend VMs themselves that might be preventing the Application Gateway from establishing connections to all of them, even if the health probes are succeeding?

Backend Pool HTTP Settings:

Keep-Alive: Ensure HTTP Keep-Alive is enabled and configured appropriately in your backend application and Application Gateway HTTP settings. This reduces the overhead of establishing new connections for every request.

Request Timeout: Check the "Request timeout (seconds)" setting in your HTTP settings. If it's too low, it could lead to 499 errors if the backend is slow. If it's too high, it could keep connections open unnecessarily.

Host Header: Is the host header being correctly passed to the backend? Misconfigurations here can lead to issues.

Application Gateway Capacity: Make sure that the capacity is optimized. You mentioned 125 instances, but it’s worth scaling beyond this if you observe high loads or consider using larger instance sizes if applicable.

Traffic Distribution: Check if the traffic distribution is properly configured. Make sure that your DNS records are updated and not still pointing to the GCP load balancer. While you think DNS caching isn't likely, make sure that any clients using static DNS data refresh appropriately.

Investigate Front Door and Routing: Since you suspect an Azure network or routing issue, validate:

Azure's service health to see if there are ongoing issues.

If you are using Azure Front Door or have any specific routing rules that might be inhibiting traffic flow to your Application Gateway.

Monitor Logging and Analytics: Continue to leverage Azure Monitor and KQL queries to gain insights. Pay particular attention to network-related logs that might provide hints of packet drops or connectivity issues.

Azure Network Watcher:

Connection Troubleshoot: Use Azure Network Watcher's Connection Troubleshoot to test connectivity from a source VM (e.g., a test VM in a different VNet, or even from one of your backend VMs to the Application Gateway's frontend IP, if it allows it for diagnostic purposes).

IP Flow Verify: Verify if traffic is allowed to and from the Application Gateway's public IP.

Next Hop: Use Next Hop to see the next hop for traffic destined to your AGW's public IP from a test VM.

Action: Set up Network Watcher in your region and use its diagnostic tools.

Kindly let us know if the above helps or if you need any further assistance on the issue.
Niket Kumar Singh 785 Reputation points

2025-06-23T05:31:04.39+00:00

Hi Sai Prasanna Sinde

I’d like to provide additional information based on our setup and investigations so far:

Backend Configuration: We have verified that all backend pool targets are using private IP addresses in the 10.x.x.x range, so SNAT port limitations should not be a factor in our case.

We are using manual scaling, and our Application Gateway Standard V2 is configured at the maximum instance count of 125.

Despite having 125 instances and a large backend pool, we are seeing a drastic drop in RPS — from ~150K RPS on GCP to ~40K-50K RPS on Azure.

We have checked and confirmed DNS resolution, backend health probes, request/connection metrics, and no hits are being seen on the old GCP load balancer, so DNS caching or client misrouting is unlikely.

Could you please guide us on what additional checks we can perform? Does this scenario point towards any known limitations or potential issues on the Azure network side that could explain this behavior?
ChaitanyaNaykodi-MSFT 27,496 Reputation points Microsoft Employee Moderator

2025-06-23T17:21:20.3566667+00:00

@Niket Kumar Singh

Thank you for reaching out.

Since the RPS dropped to 40K-50K, I wonder whether you are hitting the limit documented here for Application Gateway V2: Application Gateway V2 only allows a maximum of 62,500 connections per second (estimated using an RSA 2048-bit key TLS certificate).

Do the 150K requests represent new connections to the Application Gateway?
If your traffic requirement needs more than 125 instances, you can use Azure Traffic Manager or Azure Front Door in front of your Application Gateway. For more information, please see Connect Azure Front Door Premium to an Azure Application Gateway with Private Link and Use Azure App Gateway with Azure Traffic Manager.
You can also refer to this Load-balancing options guide for best practices in Azure: https://learn.microsoft.com/en-us/azure/architecture/guide/technology-choices/load-balancing-overview
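
Whether 150K RPS translates into 150K new connections per second depends on how many requests each client connection carries. The sketch below is a rough steady-state model, not a description of how Application Gateway accounts for connections: `requests_per_connection` is a hypothetical average reuse factor, and the 62,500 figure is the documented estimate quoted above.

```python
def new_connections_per_second(rps: float, requests_per_connection: float) -> float:
    """Estimate new connections/sec, assuming each connection carries a fixed
    average number of requests before it is closed (a simplifying assumption)."""
    if requests_per_connection < 1:
        raise ValueError("a connection carries at least one request")
    return rps / requests_per_connection


TARGET_RPS = 150_000          # RPS handled on GCP, from the question
DOCUMENTED_MAX_CPS = 62_500   # connections/sec estimate quoted in this thread

# Hypothetical reuse factors: 1 means no keep-alive reuse at all.
for reuse in (1, 2, 3, 10):
    cps = new_connections_per_second(TARGET_RPS, reuse)
    verdict = "within" if cps <= DOCUMENTED_MAX_CPS else "above"
    print(f"{reuse:>2} requests/connection -> {cps:>9,.0f} new conn/s ({verdict} the estimate)")
```

Under this model, clients that open a fresh connection for almost every request would exceed the estimate well before reaching 150K RPS, while heavier connection reuse would stay under it.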

Please let me know if you have any questions or concerns. Thank you!
Niket Kumar Singh 785 Reputation points

2025-06-23T17:37:51.1433333+00:00
Hi ChaitanyaNaykodi-MSFT

Thank you for your detailed response and the guidance provided.

We would like to seek further clarification on the Application Gateway V2 limitation you mentioned regarding 62500 max connections per second (based on RSA 2048-bit TLS certificate).

Our specific question: Does this limit of ~62.5K connections per second apply per Application Gateway deployment regardless of how many instances we provision (e.g., 125 instances)? In other words, even though we have scaled our Application Gateway Standard V2 to the maximum instance count (125 instances in our case), does the total connection handling capacity remain capped at 62.5K new connections per second for the entire deployment?

If so, could you confirm if the only way to handle beyond this limit would be to:

Deploy multiple Application Gateways behind Azure Front Door or Traffic Manager

Or consider alternative architectures suitable for this scale

We would like to confirm the maximum concurrent connections that can be handled:

Per instance

Per deployment

As per our understanding (based on documentation and Capacity Unit description):

A single Capacity Unit consists of the following parameters:

2500 Persistent connections, 2.22-Mbps throughput, 1 Compute Unit

Since we have manually scaled our Application Gateway Standard V2 to 125 instances, we assume:

Max concurrent connections = 2,500 × 125 = 312,500 persistent connections

Could you please validate whether this interpretation is correct?

We appreciate your assistance in helping us understand this limit clearly so we can plan our next steps accordingly.
ChaitanyaNaykodi-MSFT 27,496 Reputation points Microsoft Employee Moderator

2025-06-27T04:22:41.9133333+00:00
@Niket Kumar Singh

Thank you for getting back and apologies for the delay.

does the total connection handling capacity remain capped at 62.5K new connections per second for the entire deployment?

Yes, this is correct.
Connections per second per compute unit is 50 for Standard V2. One application gateway instance can handle a minimum of 10 compute units (currently documented here). For 125 instances:
125 × 10 × 50 = 62,500 max connections per second.
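
The arithmetic above can be reproduced with a short sketch. It uses only the figures quoted in this reply (50 connections/sec per compute unit, 10 compute units per instance, 125 instances per gateway); the function names are illustrative, and the gateway count assumes the worst case in which every request arrives on a new connection.

```python
import math

CPS_PER_COMPUTE_UNIT = 50            # Standard V2 figure quoted in this reply
MIN_COMPUTE_UNITS_PER_INSTANCE = 10  # "a minimum of 10 compute units" per instance
MAX_INSTANCES_PER_GATEWAY = 125      # maximum instance count stated in the thread


def max_new_connections_per_second(instances: int) -> int:
    """Documented estimate of new connections/sec for a given instance count."""
    return instances * MIN_COMPUTE_UNITS_PER_INSTANCE * CPS_PER_COMPUTE_UNIT


def gateways_needed(target_cps: int) -> int:
    """Number of fully scaled gateways needed if target_cps are all new connections."""
    per_gateway = max_new_connections_per_second(MAX_INSTANCES_PER_GATEWAY)
    return math.ceil(target_cps / per_gateway)


print(max_new_connections_per_second(125))  # 62500
print(gateways_needed(150_000))             # 3
```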

If so, could you confirm if the only way to handle beyond this limit would be to:

Deploy multiple Application Gateways behind Azure Front Door or Traffic Manager

Or consider alternative architectures suitable for this scale

Yes, I think using Front Door or Traffic Manager would be recommended. For Azure Front Door, the maximum requests per second per profile is 100,000, although you can increase this limit by creating a quota support request. You can find more details here (There's currently a 5,000 requests per second per POP limit for each Front Door profile. Beyond this limit, the POP location will drop connections. If requests are concentrated in one or more regions and exceed this limit, you can request a higher POP limit by submitting an Azure support request.)

Max concurrent connections = 2,500 × 125 = 312,500 persistent connections Could you please validate if this interpretation is correct

This will be higher, as one application gateway instance can handle a minimum of 10 compute units.
This value will be 2,500 × 10 × 125 = 3,125,000 persistent connections for 125 instances.
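
As a quick check of this figure against what was observed, the sketch below multiplies out 2,500 × 10 × 125 and compares the result with the ~1.2M peak of current connections reported in the question. It treats the per-capacity-unit number as a sizing estimate rather than an enforced cap, which is an assumption on my part.

```python
PERSISTENT_CONNECTIONS_PER_CU = 2_500  # per capacity unit, from the thread
MIN_COMPUTE_UNITS_PER_INSTANCE = 10    # "a minimum of 10 compute units" per instance


def persistent_connection_estimate(instances: int) -> int:
    """Estimated persistent connections for a given instance count."""
    return instances * MIN_COMPUTE_UNITS_PER_INSTANCE * PERSISTENT_CONNECTIONS_PER_CU


estimate = persistent_connection_estimate(125)
observed_peak = 1_200_000  # "Current connections: Peaked at ~1.2M" in the question
utilization = observed_peak / estimate

print(f"estimate: {estimate:,} persistent connections")
print(f"observed peak uses about {utilization:.1%} of that estimate")
```

By this estimate the observed connection peak sits well below the persistent-connection figure, so concurrent connections alone would not explain the RPS drop.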

Apologies for the delay once again and hope this helps. Please let me know if you have any questions. Thank you!
Niket Kumar Singh 785 Reputation points

2025-06-27T06:16:31.6+00:00
Hi ChaitanyaNaykodi-MSFT

In response to our support ticket [Tracking ID: 2506200030007582], Microsoft support confirmed the following:

62,500 new TLS connections/sec is a per-instance benchmark, not a hard limit.

Thus, for a 125-instance deployment, the theoretical maximum new TLS connections/sec = 125 × 62,500 = 7,812,500.

This implies that the TLS connection handling capacity scales linearly with instance count.

Reference: Application Gateway High Traffic Support – Microsoft Learn

We would greatly appreciate your help in reconciling this contradiction:

Does the 62,500 new TLS connections/sec limit apply per instance, or is it an overall cap for the full Application Gateway V2 deployment (irrespective of instance count)?

If the limit is per deployment, what is the technical constraint that prevents scaling new TLS connections/sec beyond 62.5K, even with 125 instances?

If the limit is per instance, can we confidently assume linear scaling up to 125 instances = 7.8 million connections/sec (theoretically)?

Lastly, does the "50 new connections/sec per CU" serve as a benchmark for sizing, or is it an enforced platform-level cap?
Niket Kumar Singh 785 Reputation points

2025-06-27T06:29:28.5733333+00:00

Hi ChaitanyaNaykodi-MSFT
Support case id: 2506200030007582 || 250626003000380
We are experiencing persistent HTTP 499 errors (Client Closed Request) across multiple Azure Application Gateway Standard v2 instances, even though our backends and TCP connectivity seem healthy. We request clarification and deep insights into how App Gateway handles TCP reuse, timeouts, and client-to-backend behavior, especially in comparison with GCP’s Layer 7 Load Balancers.

Setup Overview:

We are using 3 App Gateway deployments with manual scaling set to 125 instances each: two in East US and one in Central India.

No WAF is enabled.

Our domain is routed via Akamai GTM, directing requests geographically to these gateways.

The backend hosts are Linux VMs running Nginx.

App Gateway logs show many 499 errors, especially from client IPs like 20.xx.xx.xx

The same client IP sends both:

Successful requests (HTTP 204 with 600 ms+ response)

Failed requests (HTTP 499 in ~500 ms)

Internal testing showed:

Low TCP latency from backend VMs to the AppGW IP (via tcpping)

Inconsistent latency when hitting the NGINX endpoint via curl

Latency spikes of ~500 ms only on prod backends

Non-prod VMs show consistent 3–5 ms response times

PCAP analysis from Microsoft AppGW backend team confirms: Client closes connection (via TCP FIN) before backend responds.

Backend slowness potentially causing early termination from client perspective.

Script output: The script performs repeated HTTP requests (via curl) to the backend URL, collecting metrics like:

DNS lookup time

Connection establishment time

Time to start receiving data

Total time to complete the request

Key observations from output:

The HTTP status is consistently 204 (No Content), confirming successful reachability.

The latency values vary, but there are frequent increases in Start Transfer and Total times, indicating backend processing delays.

[2025-06-26 11:53:03] Response 3: HTTP_STATUS:204 LATENCY_MS:0.563388

time_starttransfer and time_total are often above 0.4–0.6 seconds, which is considerably high for health check endpoints.

DNS and TCP connection times remain very low (< 0.005s), ruling out DNS or initial connection setup issues.
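
For anyone wanting to quantify how often the slow responses occur, here is a minimal sketch that parses log lines in the format shown above and reports the share of slow samples. Only the first sample line comes from our actual output; the other three are made-up values for illustration. The field is named LATENCY_MS, but the sketch treats the value as seconds, since 0.563388 matches the 0.4–0.6 second range described above; that reading is an assumption.

```python
import re
from statistics import median

# Matches lines like:
# [2025-06-26 11:53:03] Response 3: HTTP_STATUS:204 LATENCY_MS:0.563388
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] Response (?P<n>\d+): "
    r"HTTP_STATUS:(?P<status>\d{3}) LATENCY_MS:(?P<latency>\d+(?:\.\d+)?)"
)


def parse_samples(text: str) -> list[tuple[int, float]]:
    """Return (http_status, latency) pairs for every matching log line."""
    return [(int(m["status"]), float(m["latency"])) for m in LINE_RE.finditer(text)]


def summarize(samples: list[tuple[int, float]], slow_threshold_s: float = 0.4) -> dict:
    """Summarize latencies; values are assumed to be seconds despite the field name."""
    if not samples:
        raise ValueError("no samples to summarize")
    latencies = [latency for _, latency in samples]
    slow = sum(1 for latency in latencies if latency >= slow_threshold_s)
    return {
        "count": len(latencies),
        "median_s": median(latencies),
        "max_s": max(latencies),
        "slow_fraction": slow / len(latencies),
    }


# First line is from the thread; the remaining lines are hypothetical samples.
SAMPLE_LOG = """\
[2025-06-26 11:53:03] Response 3: HTTP_STATUS:204 LATENCY_MS:0.563388
[2025-06-26 11:53:04] Response 4: HTTP_STATUS:204 LATENCY_MS:0.004100
[2025-06-26 11:53:05] Response 5: HTTP_STATUS:204 LATENCY_MS:0.512000
[2025-06-26 11:53:06] Response 6: HTTP_STATUS:204 LATENCY_MS:0.003900
"""

summary = summarize(parse_samples(SAMPLE_LOG))
print(summary)
```

Running the same summary against prod and non-prod backends separately would show whether the ~500 ms spikes are confined to the prod VMs.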

In GCP, the same architecture worked flawlessly: Akamai → GCP L7 Load Balancer → NGINX VMs.

No 499 issues were observed, presumably due to connection reuse, less aggressive idle timeouts, or better backend affinity management in GCP’s L7 LB.

Kindly advise if:

There’s any known limitation or behavior mismatch between AppGW and GCP L7 LB that may impact long-running or slightly latent backend responses.

You can provide insights from the AppGW product group on these persistent 499 errors.
ChaitanyaNaykodi-MSFT 27,496 Reputation points Microsoft Employee Moderator

2025-07-01T21:41:13.23+00:00

@Niket Kumar Singh
Thank you for sharing the details here. I have reached out to the support engineer to get some additional context and will work with them on the required clarification. Meanwhile, I am also monitoring the support ticket resolution on my end.
