Significant drop in RPS after migrating to Azure Application Gateway V2 – Potential MS backbone issue?
Hello Team,
We need assistance with an issue we are seeing after migrating our workloads from GCP to Azure.
Scenario
- We migrated from GCP where we consistently handled ~150K requests per second (RPS) via GCP load balancers.
- After migrating to Azure and fronting the workload with Azure Application Gateway Standard V2 (manual scale, 125 instances), the observed rate dropped to around 40K-50K RPS.
- Backend pool consists of direct VM IPs (no intermediate firewall or appliance).
- Frontend: Public IP (74.xx.xx.xx)
What we observed
- Backend health: All backend hosts consistently report healthy in Azure metrics.
- Healthy Host Count: Avg = 6, Unhealthy Host Count = 0
- Current connections: Peaked at ~1.2M
- Capacity units: Averaged ~600, with the gateway at its 125-instance maximum
- Failed Requests: Minimal, mainly HTTP 499 (client aborts)
- Response Status: No large-scale 4xx/5xx error patterns
- SSL validation: No issues (validated via sslshopper.com)
- No significant throttling or errors in AGW logs (checked via KQL queries)
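To put the observed numbers in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the 45K figure is an assumed midpoint of the 40K-50K range we observed, and the function names are ours, purely for illustration.

```python
# Figures taken from the observations above.
# OBSERVED_RPS is an assumed midpoint of the 40K-50K range, not a measured value.
INSTANCES = 125
EXPECTED_RPS = 150_000       # what we handled on GCP
OBSERVED_RPS = 45_000        # assumption: midpoint of 40K-50K
PEAK_CONNECTIONS = 1_200_000


def per_instance_rps(rps: float, instances: int = INSTANCES) -> float:
    """Average request rate each gateway instance would see."""
    return rps / instances


def requests_per_connection(rps: float, connections: int) -> float:
    """Average requests per second carried by each open connection."""
    return rps / connections


if __name__ == "__main__":
    print("observed RPS per instance:", per_instance_rps(OBSERVED_RPS))
    print("expected RPS per instance:", per_instance_rps(EXPECTED_RPS))
    print("observed / expected:", OBSERVED_RPS / EXPECTED_RPS)
    print("req/s per connection:",
          requests_per_connection(OBSERVED_RPS, PEAK_CONNECTIONS))
```

With these inputs, each instance averages about 360 RPS against roughly 1,200 RPS at the expected load, and each open connection carries well under one request per second on average.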
Investigations performed
- We checked Azure Monitor metrics: Current connections, total requests, failed requests, capacity units, healthy/unhealthy host count.
- We ran KQL queries in Log Analytics to confirm no throttling, backend connection issues, or internal 4xx/5xx patterns.
- We confirmed DNS resolution and SSL health.
- We verified that the Application Gateway is scaled to 125 instances (the maximum for Standard V2).
- Backend VMs have adequate capacity; no CPU/memory bottlenecks.
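As a sanity check on gateway capacity, we also estimated the headroom from the metrics above. This is a sketch under two sizing assumptions taken from Azure guidance as we understand it (about 10 capacity units per instance, and 2,500 persistent connections per capacity unit); please correct us if these figures are off.

```python
# Sizing assumptions (our reading of Azure guidance, not confirmed for our gateway):
CAPACITY_UNITS_PER_INSTANCE = 10       # assumption: ~10 capacity units per instance
CONNECTIONS_PER_CAPACITY_UNIT = 2_500  # assumption: persistent connections per unit

# Values from our Azure Monitor metrics.
INSTANCES = 125
AVG_CAPACITY_UNITS_USED = 600
PEAK_CONNECTIONS = 1_200_000


def capacity_summary(instances: int, used_cu: float, connections: int) -> dict:
    """Estimate provisioned capacity units and how much of them is in use."""
    available_cu = instances * CAPACITY_UNITS_PER_INSTANCE
    return {
        "available_cu": available_cu,
        "utilization": used_cu / available_cu,
        # Capacity units that the connection count alone would account for.
        "cu_from_connections": connections / CONNECTIONS_PER_CAPACITY_UNIT,
    }


summary = capacity_summary(INSTANCES, AVG_CAPACITY_UNITS_USED, PEAK_CONNECTIONS)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value}")
```

If those assumptions hold, the gateway is running at roughly half of its provisioned capacity units, and the peak connection count alone would account for about 480 of the ~600 units in use.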
Our concern
The requests simply do not seem to be reaching the Application Gateway at the expected volume. The client side reports no visible hits on the old GCP LB, so DNS caching is unlikely to be the cause.
We suspect a Microsoft backbone network or Azure front-door/routing issue may be limiting traffic before it reaches our Application Gateway.