Notifications Hub Completely Down

Jason Brooks 10 Reputation points
2025-06-03T12:33:03.5433333+00:00

Starting early this morning, we can't seem to make any requests to an Azure Notifications Hub. Any attempt to hit an endpoint results in a timeout.

I went to check the access policies tab in the Notifications Hub namespace to make sure there wasn't an issue with the connection string, but that entire tab returns "No results" under policies. Thinking someone may have accidentally deleted our connection policy, I went to create a new policy, but after creating it and refreshing, the tab still says "No results".

None of our customers are able to register new push devices or send notifications. Is this entire service currently down? The status page isn't indicating any issues.

Screenshot 2025-06-03 at 8.34.24 AM

Azure Notification Hubs
An Azure service that is used to send push notifications to all major platforms from the cloud or on-premises environments.

1 answer

  1. TP 131.6K Reputation points Volunteer Moderator
    2025-06-04T05:47:32.21+00:00

    Hi @Jason Brooks ,

    Please confirm that your notification hubs are working okay now. Yesterday I tested mine in East US and it seems to be working fine. Below is the latest update as of this writing:

    What happened?

    Between 06:52 UTC and 21:45 UTC on 03 June 2025, a platform issue affected the underlying service instances for Notification Hubs in the East US region. Customers experienced errors when sending notifications to recipients hosted in this region. Notifications or registrations attempted during the impact window will not be recovered, as the compute layer responsible for storing them was unavailable.

    What do we know so far?

    We identified that the service instances responsible for processing requests had become unhealthy. This unavailability of the compute layer prevented customer notifications from being processed correctly, resulting in the service disruption described above.

    How did we respond?

    • 06:00 UTC on 03 June 2025 – We received an alert via internal service telemetry indicating Notification Hub availability degradation in the East US region.
    • 06:52 UTC on 03 June 2025 – Customer impact began.
    • 07:00 UTC on 03 June 2025 – We identified that an active cluster was unable to serve traffic in the East US region.
    • 08:30 UTC on 03 June 2025 – Our team's investigation found that the issue was with service instances on unhealthy cluster nodes. We engaged other internal teams to work on mitigation efforts to get the cluster running again.
    • 13:00 UTC on 03 June 2025 – After further investigation, a parallel mitigation workstream was begun to build a new cluster. Work also continued to recover the previous cluster.
    • 20:41 UTC on 03 June 2025 – The parallel cluster was deployed and transitioned to active status to handle notifications and traffic in East US. Customers should have experienced increasing success from this point. Teams monitored closely to validate complete mitigation.
    • 21:45 UTC on 03 June 2025 – After a period of observation and confirmation from customers, we confirmed that Notification Hubs service requests had returned to pre-incident levels and customer impact was mitigated.

    What happens next?

    • Our team will be completing an internal retrospective to understand the incident in more detail. We will publish a Preliminary Post Incident Review (PIR) within approximately 72 hours, to share more details on what happened and how we responded. After our internal retrospective is completed, generally within 14 days, we will publish a Final Post Incident Review with any additional details and learnings.
    • To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts – these can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts
    • For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs
    • The impact times above represent the full incident duration, so are not specific to any individual customer. Actual impact to service availability may vary between customers and resources – for guidance on implementing monitoring to understand granular impact: https://aka.ms/AzPIR/Monitoring
    • Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness

    Please click Accept Answer if the above was helpful.

    Thanks.

    -TP

    1 person found this answer helpful.
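
For anyone trying to gauge their own exposure: the update quoted in the answer gives an impact window of 06:52 to 21:45 UTC on 03 June 2025 and says that notifications or registrations attempted in that window will not be recovered. Below is a minimal Python sketch, assuming you keep your own client-side log of send attempts with timezone-aware UTC timestamps. The `attempts` list, its `(timestamp, attempt_id)` layout, and the helper names are hypothetical and not part of any Azure SDK. It picks out the attempts that may need to be reviewed or re-sent:

```python
from datetime import datetime, timezone

# Impact window quoted in the incident update (East US, 03 June 2025, UTC).
IMPACT_START = datetime(2025, 6, 3, 6, 52, tzinfo=timezone.utc)
IMPACT_END = datetime(2025, 6, 3, 21, 45, tzinfo=timezone.utc)


def in_impact_window(ts: datetime) -> bool:
    """Return True if a timezone-aware timestamp falls inside the window."""
    return IMPACT_START <= ts <= IMPACT_END


def attempts_to_review(attempts):
    """Keep only the (timestamp, attempt_id) pairs inside the window."""
    return [(ts, attempt_id) for ts, attempt_id in attempts if in_impact_window(ts)]


# Hypothetical log entries; replace these with your own records.
attempts = [
    (datetime(2025, 6, 3, 6, 30, tzinfo=timezone.utc), "send-001"),
    (datetime(2025, 6, 3, 12, 33, tzinfo=timezone.utc), "send-002"),
    (datetime(2025, 6, 3, 22, 0, tzinfo=timezone.utc), "send-003"),
]

affected = attempts_to_review(attempts)
print([attempt_id for _, attempt_id in affected])  # IDs inside the window
print(IMPACT_END - IMPACT_START)                   # length of the window
```

The update also notes that these times cover the full incident and that impact varied between customers, so this filter is best treated as an upper bound on what was affected. Timestamps must be timezone-aware; comparing a naive `datetime` against the window raises a `TypeError`.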
