This is likely due to health probes still marking the primary as "Healthy", even though it was not serving traffic properly. Note that Azure Traffic Manager does not use application metrics or smart service-level checks — it only uses basic health probes (HTTP, HTTPS, or TCP). If those health probes receive a valid response (e.g., HTTP 200), then Traffic Manager will continue sending traffic to that endpoint, even if the app is malfunctioning internally.
So unless the service is completely unresponsive to the probe, Traffic Manager assumes it's healthy.
To address this, you need a realistic health check: configure a dedicated /health or /status endpoint that accurately reflects the true health of the application. This endpoint should return:
- HTTP 200 only if the application is fully operational.
- HTTP 500 or timeout if backend dependencies fail or the app is not functioning.
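A minimal sketch of such an endpoint, using only the Python standard library. The `check_database()` function is a hypothetical placeholder for whatever dependency checks your app actually needs (database ping, downstream API call, etc.):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def check_database():
    # Placeholder for a real dependency check (DB ping, downstream API, ...).
    return True

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Return 200 only when all dependencies respond, 500 otherwise,
            # so the Traffic Manager probe sees the app's true state.
            status = 200 if check_database() else 500
            body = b"OK" if status == 200 else b"UNHEALTHY"
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

The key point is that the probe path exercises real dependencies rather than just proving the web server process is alive.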
To configure this, go to your Traffic Manager Profile → Endpoints → select the primary endpoint → update the Custom Path under Monitoring settings (e.g., /healthz).
To validate, simulate real failures, not just app crashes. Stopping or breaking the App Service may not make the probe fail (e.g., if the web server still returns 200). Instead of just stopping app logic, configure your app to return 503 or other error codes on the health check endpoint during failures.
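One way to drive that kind of drill is a failure toggle on the health endpoint itself. This is a standalone sketch with a hypothetical `SIMULATE_FAILURE` environment variable: flipping it makes the probe report 503 while the web server keeps running, which is exactly the scenario where Traffic Manager should fail over:

```python
import os

def health_status():
    # During a failover drill, set the (hypothetical) SIMULATE_FAILURE
    # env var: the process stays up, but the probe endpoint reports 503,
    # which is what actually makes Traffic Manager mark it degraded.
    if os.environ.get("SIMULATE_FAILURE") == "1":
        return 503
    return 200
```

Wire this return value into whatever serves your probe path, then confirm in the Traffic Manager portal that the endpoint status changes and traffic shifts to the secondary.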
If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.
hth
Marcin