APIM/Express Route Latency Spike

Matthew Leon 0 Reputation points
2025-07-21T17:22:44.9+00:00

Good Afternoon,

Yesterday we experienced an issue where APIM/ExpressRoute had a latency spike lasting about an hour, which caused requests to time out. I want to identify whether this was an Azure issue or an issue with our partner.

Summary of the Incident

  • Between 8:55 PM and 9:56 PM EST, the system experienced timeouts while forwarding payloads to multiple Telus WSP endpoints.
  • Failures appeared as asyncio.TimeoutError and HTTP 408 errors.
  • Affected traffic was routed primarily to https://cls-b01.telus.com/collection/v1/els (131 failures).
  • Other endpoints (e.g., cls-h03, cls-h01, cls-l03, lrf1freedomobile, etc.) saw 1–6 failures each.
  • Spike in APIM backend request duration aligned with error window.
  • No anomalies found in pod health, CPU usage, or internal infra.
  • Incident created (SEV3) 9:59 PM -> "Identified" 9:59 PM -> "Observing" 10:12 PM (last observed failure 9:56 PM) -> marked as "Resolved" 10:32 PM
Azure Application Gateway
An Azure service that provides a platform-managed, scalable, and highly available application delivery controller as a service.

1 answer

  1. G Sree Vidya 4,005 Reputation points Microsoft External Staff Moderator
    2025-07-21T19:16:45.66+00:00

    Hello Matthew Leon,

    We understand that you encountered significant latency with Azure API Management (APIM) and ExpressRoute, which impacted your requests to Telus WSP endpoints.

    Please check the following details on the Azure side:

    1. Please confirm whether you have configured monitoring logs for the ExpressRoute (ER) metrics:

    Check the following metrics for the ExpressRoute circuit during the incident window:

    • Ingress/Egress Throughput: Look for spikes or drops in data transfer.
    • BGP Availability: Ensure BGP sessions were stable and not flapping.
    • Circuit Utilization: High utilization may indicate congestion.
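
Once the metric samples for the circuit have been exported, the checks above can be approximated offline. The following is a minimal Python sketch, assuming the samples are already available as (timestamp, value) pairs. The function names, the 1 Gbps circuit size, the 80% threshold, the incident date of 2025-07-20, and the literal reading of "EST" as UTC-5 are all assumptions for illustration, not values confirmed in this thread or part of any Azure API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "EST" is taken literally as UTC-5. If the reported times were
# actually EDT (UTC-4, usual in July), change the offset accordingly.
EST = timezone(timedelta(hours=-5))

# Incident window from the question, assumed to be 2025-07-20
# (the day before the post).
WINDOW_START = datetime(2025, 7, 20, 20, 55, tzinfo=EST)
WINDOW_END = datetime(2025, 7, 20, 21, 56, tzinfo=EST)


def in_window(samples, start=WINDOW_START, end=WINDOW_END):
    """Keep (timestamp, value) samples whose timestamp falls inside the window."""
    return [(ts, v) for ts, v in samples if start <= ts <= end]


def utilization_pct(bits_per_second, circuit_bps):
    """Circuit utilization as a percentage of provisioned bandwidth."""
    return 100.0 * bits_per_second / circuit_bps


def flag_samples(samples, predicate):
    """Return the in-window samples for which predicate(value) is true."""
    return [(ts, v) for ts, v in in_window(samples) if predicate(v)]


if __name__ == "__main__":
    # Hypothetical exported throughput samples, not real incident data.
    throughput = [
        (datetime(2025, 7, 21, 1, 50, tzinfo=timezone.utc), 300e6),
        (datetime(2025, 7, 21, 2, 10, tzinfo=timezone.utc), 900e6),
    ]
    circuit_bps = 1e9  # hypothetical 1 Gbps circuit
    hot = flag_samples(throughput, lambda bps: utilization_pct(bps, circuit_bps) > 80)
    for ts, bps in hot:
        print(ts.isoformat(), f"{utilization_pct(bps, circuit_bps):.0f}% utilized")
```

The same `flag_samples` helper could be applied to a BGP availability series with a predicate such as `lambda pct: pct < 100` to list any samples in the window where the session was not fully available.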

    2. Use Connection Monitor to test and log connectivity between APIM and Telus endpoints.

    Refer: https://learn.microsoft.com/en-us/azure/expressroute/how-to-configure-connection-monitor

    https://learn.microsoft.com/en-us/azure/expressroute/monitor-expressroute

    3. Contact Telus to confirm whether they had any service degradation during that time.

    The failures were concentrated on Telus endpoints, which suggests the issue may have been on the partner’s side (e.g., Telus WSP).

    The fact that other endpoints had minimal failures supports the idea that the issue was not systemic within Azure.
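
The concentration argument above can be checked against the request logs. The following is a minimal Python sketch, assuming the URLs of the failed requests can be exported as a list. The helper names are hypothetical; the 131 failures for cls-b01 come from the question, while the second URL and its count are placeholders.

```python
from collections import Counter
from urllib.parse import urlsplit


def failures_by_host(failed_urls):
    """Count failed requests per backend hostname."""
    return Counter(urlsplit(url).hostname for url in failed_urls)


def top_share(counts):
    """Return (host, fraction of all failures) for the most affected host."""
    host, n = counts.most_common(1)[0]
    return host, n / sum(counts.values())


if __name__ == "__main__":
    # 131 failures for cls-b01 is taken from the question; the second URL
    # and its count are hypothetical placeholders.
    failed = ["https://cls-b01.telus.com/collection/v1/els"] * 131
    failed += ["https://other-endpoint.example/collection"] * 6
    counts = failures_by_host(failed)
    host, share = top_share(counts)
    print(f"{host}: {share:.0%} of {sum(counts.values())} failures")
```

A single host carrying most of the failures points toward that backend or the path to it, whereas failures spread evenly across hosts would point toward something shared, such as APIM or the circuit.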


    I hope this helps! If this answers your query, please click "Upvote" and "Accept the answer", which may benefit other community members reading this thread.

    If the above is unclear or you are unsure about something, please add a comment below.

