Reliability in Azure App Service

Azure App Service is an HTTP-based service for hosting web applications, REST APIs, and mobile back ends. App Service integrates with Microsoft Azure to provide security, load balancing, autoscaling, and automated management for applications. This article describes reliability support in App Service. It covers intra-regional resiliency via availability zones and multiple-region deployments.

For more information about reliability support in App Service Environment, see Reliability in App Service Environment.

Reliability is a shared responsibility between you and Microsoft. You can use this guide to determine which reliability options fulfill your specific business objectives and uptime goals.

Production deployment recommendations

To learn about how to deploy App Service to support your solution's reliability requirements, and how reliability affects other aspects of your architecture, see Architecture best practices for App Service (Web Apps) in the Azure Well-Architected Framework.

Reliability architecture overview

When you create an App Service web app, you specify the App Service plan that runs the app.

An App Service plan defines a set of compute resources that run your web apps. All web apps must run inside a plan. You can scale a plan to run on multiple VM instances, also called workers. These instances provide the compute resources that run your app code. A single App Service plan can host multiple apps. All apps run on the same shared set of VM instances.

App Service provides the following redundancy features:

  • Distribution across fault domains: At the platform level, Azure automatically distributes your App Service plan's VM instances across fault domains within the Azure region. A fault domain is a group of VMs that share a common power source and network switch, so this distribution minimizes the risk that a localized hardware failure affects every instance.

  • Distribution across availability zones: If you enable zone redundancy on a supported App Service plan, Azure distributes your instances across availability zones within the region. This configuration provides higher resiliency if a zone outage occurs. For more information about zone redundancy, see Availability zone support.

  • App scaling: When you configure your App Service plan to run multiple VM instances, all apps in the plan run on all instances by default. If you configure your plan for autoscaling, all apps scale out together based on the autoscale settings. However, you can customize how many plan instances run a specific app by using per-app scaling.

  • Scale units: Internally, App Service runs on a platform infrastructure called scale units, also known as stamps. A scale unit includes all components needed to host and run App Service, including compute, storage, networking, and load balancing. Azure manages scale units to ensure balanced workload distribution, perform routine maintenance, and maintain overall platform reliability.

    Some capabilities might only be applied to specific scale units. For example, some App Service scale units might support zone redundancy, while other scale units in the same region don't.

Transient faults

Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. Transient faults correct themselves after a short period of time. It's important that your applications can handle transient faults, usually by retrying affected requests.

All cloud-hosted applications should follow the Azure transient fault handling guidance when they communicate with any cloud-hosted APIs, databases, and other components. For more information, see Recommendations for handling transient faults.
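As a minimal sketch of the retry approach described above, the following Python helper retries an operation with exponential backoff and jitter. The names `TransientError` and `call_with_retries`, along with the default attempt count and delays, are assumptions made for illustration and aren't part of App Service or any Azure SDK.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(operation, max_attempts=4, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Run operation(), retrying on TransientError with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the failure to the caller.
            # Double the delay ceiling on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
```

In a real application, the `except` clause would catch whichever exceptions your HTTP or database client raises for timeouts and throttling, and only idempotent operations should be retried blindly.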

Microsoft-provided SDKs usually handle transient faults. Because you host your own applications on App Service, take steps to reduce the chance of transient faults:

  • Deploy multiple instances in your plan. App Service performs automated updates and other forms of maintenance on instances in your plan. If an instance becomes unhealthy, the service can automatically replace that instance with a new healthy instance. During the replacement process, there can be a short period when the previous instance is unavailable and a new instance isn't ready to serve traffic. To mitigate these effects, deploy multiple instances of your App Service plan.

  • Use deployment slots. App Service deployment slots enable zero-downtime deployments of your applications. Use deployment slots to minimize the effect of deployments and configuration changes for your users. Deployment slots also reduce the likelihood that your application restarts, which causes a transient fault.

  • Avoid scaling up or scaling down. These operations change the CPU, memory, and other resources assigned to each instance, and they can trigger an application restart. Instead, select a tier and instance size that meet your performance requirements under typical load, and then scale out and scale in by dynamically adding and removing instances to handle changes in traffic volume.

Availability zone support

Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.

For Premium v2 to v4 tiers, you can configure App Service as zone redundant, which means that your resources are distributed across multiple availability zones. Distribution across multiple zones helps your production workloads achieve resiliency and reliability. When you configure zone redundancy on App Service plans, all apps that use the plan become zone redundant.

Region support

You can deploy zone-redundant App Service Premium v2 to v4 plans in any region that supports availability zones.

Requirements

To enable zone redundancy, you must meet the following requirements:

  • Use Premium v2 to v4 plan types.

  • Deploy a minimum of two instances in your plan.

  • Use a scale unit that supports availability zones. When you create an App Service plan, the plan is assigned to a scale unit based on the resource group where the plan resides. If your scale unit doesn't support availability zones, you need to create a new plan in a new resource group.

    To determine whether the scale unit for your App Service plan supports zone redundancy, see Check for zone redundancy support for an App Service plan.
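The requirements above can be expressed as a simple pre-deployment check. This is a sketch only: `PlanSettings`, `zone_redundancy_blockers`, and the tier labels are hypothetical names chosen for illustration, not official SKU identifiers or an Azure API. You would obtain the scale unit information from the check described in the linked article.

```python
from dataclasses import dataclass

# Assumption: illustrative tier labels, not official SKU identifiers.
ZONE_REDUNDANT_TIERS = {"PremiumV2", "PremiumV3", "PremiumV4"}


@dataclass
class PlanSettings:
    tier: str
    instance_count: int
    scale_unit_supports_zones: bool


def zone_redundancy_blockers(plan):
    """Return the unmet requirements; an empty list means the plan is eligible."""
    blockers = []
    if plan.tier not in ZONE_REDUNDANT_TIERS:
        blockers.append("tier must be Premium v2 to v4")
    if plan.instance_count < 2:
        blockers.append("plan needs at least two instances")
    if not plan.scale_unit_supports_zones:
        blockers.append("scale unit doesn't support availability zones")
    return blockers
```

Returning every unmet requirement, rather than stopping at the first, lets a deployment script report all problems in one pass.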

Instance distribution across zones

When you create a zone-redundant App Service plan, Azure distributes the plan's instances across availability zones in the region. This distribution ensures that your apps remain available even if one zone experiences an outage.

Instance distribution in a zone-redundant deployment follows specific rules. These rules also apply as the app scales in and out:

  • Minimum instances: Your App Service plan must have a minimum of two instances for zone redundancy.

  • Maximum availability zones supported by your plan: Azure determines the number of availability zones that your plan can use, which is referred to as maximumNumberOfZones. To view the number of availability zones that your specific plan can use, see Check zone redundancy support for an App Service plan.

  • Instance distribution: When zone redundancy is enabled, Azure distributes plan instances across multiple availability zones automatically. The distribution is based on the following rules:

    • If the number of instances exceeds maximumNumberOfZones and divides evenly, Azure distributes the instances evenly across zones.

    • If the number of instances doesn't divide evenly by maximumNumberOfZones, Azure distributes the instances evenly first and then spreads the leftover instances across the zones.

    • When the App Service platform allocates instances for a zone-redundant App Service plan, it uses best-effort zone balancing that the underlying Azure virtual machine scale sets provide. A plan is considered balanced if the number of VMs in each zone is within one of the number in every other zone. For more information, see Zone balancing.

  • Physical zone placement: You can view the physical availability zone used for each of your App Service plan instances. For more information, see View physical zones for an App Service plan.
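To illustrate the balancing rule above, the following sketch computes an ideal balanced spread of instances across zones. The function name `balanced_distribution` is hypothetical. The platform balances on a best-effort basis, so actual placement can differ, and the zone that receives an extra instance isn't specified by the rules.

```python
def balanced_distribution(instance_count, zone_count):
    """Spread instances so that per-zone counts differ by at most one."""
    if zone_count < 1 or instance_count < 0:
        raise ValueError("zone_count must be >= 1 and instance_count >= 0")
    base, remainder = divmod(instance_count, zone_count)
    # Assumption: the first `remainder` zones each take one extra instance.
    return [base + 1 if zone < remainder else base for zone in range(zone_count)]
```

For example, `balanced_distribution(6, 3)` returns `[2, 2, 2]`, and `balanced_distribution(7, 3)` returns `[3, 2, 2]`. Here `zone_count` plays the role of maximumNumberOfZones.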

Considerations

For Premium v2 to v4 plans, an availability zone outage might affect some aspects of Azure App Service, even though the application continues to serve traffic. These behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

When you enable zone redundancy on your App Service Premium v2 to v4 plan, you also improve resiliency during platform updates. For more information, see Reliability during service maintenance.

For App Service plans that aren't configured as zone redundant, the underlying virtual machine (VM) instances aren't resilient to availability zone failures. They can experience downtime during an outage in any zone in that region.

Cost

When you use App Service Premium v2 to v4 plans, enabling availability zones doesn't add cost if you have two or more instances. Charges are based on your App Service plan SKU, the capacity that you specify, and any instances that you scale to based on your autoscale criteria.

If you enable availability zones but specify a capacity of less than two, the platform enforces a minimum instance count of two. The platform charges you for those two instances.
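The minimum-instance rule can be sketched as a small calculation. The function name `billed_instance_count` is an assumption for illustration, and the sketch covers only the instance count, not pricing.

```python
MIN_ZONE_REDUNDANT_INSTANCES = 2


def billed_instance_count(requested_capacity, zone_redundant):
    """Return the instance count that the plan is charged for."""
    if requested_capacity < 1:
        raise ValueError("requested_capacity must be at least 1")
    if zone_redundant:
        # The platform enforces a floor of two instances for zone redundancy.
        return max(requested_capacity, MIN_ZONE_REDUNDANT_INSTANCES)
    return requested_capacity
```

With zone redundancy enabled, a requested capacity of 1 is billed as 2 instances, while a requested capacity of 3 is billed as 3.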

Configure availability zone support

Capacity planning and management

To prepare for availability zone failure, consider over-provisioning the capacity of your App Service plan. This approach allows the solution to tolerate some capacity loss and continue to function without degraded performance. For more information, see Manage capacity by using over-provisioning.
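One way to size the over-provisioning is to find the smallest plan that still has enough instances after its most heavily loaded zone is lost. The sketch below assumes that instances are balanced across zones (per-zone counts differ by at most one) and that a single zone fails. The function name `instances_to_survive_zone_loss` is hypothetical.

```python
import math


def instances_to_survive_zone_loss(required_instances, zone_count):
    """Smallest balanced plan size that keeps required_instances after one zone fails."""
    if zone_count < 2 or required_instances < 1:
        raise ValueError("need at least two zones and one required instance")
    total = required_instances
    # Worst case: the failed zone is the one that holds the most instances.
    while total - math.ceil(total / zone_count) < required_instances:
        total += 1
    return total
```

For example, if peak load needs 4 instances and the plan spans 3 zones, the function returns 6: losing any one zone of a 2-2-2 layout still leaves 4 instances.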

Normal operations

The following list describes what to expect when App Service plans are configured for zone redundancy and all availability zones are operational:

  • Traffic routing between zones: During normal operations, traffic is routed between all available App Service plan instances across all availability zones.

  • Data replication between zones: During normal operations, any state stored in your application's file system is stored in zone-redundant storage and synchronously replicated between availability zones.

Zone-down experience

An availability zone outage might affect some aspects of App Service, even though the application continues to serve traffic. These behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

The following list describes what to expect when App Service plans are configured for zone redundancy and one or more availability zones are unavailable:

  • Detection and response: The App Service platform automatically detects failures in an availability zone and initiates a response. No manual intervention is required to initiate a zone failover.

  • Notification: You can monitor zone failure events through Azure Service Health and Azure Resource Health. Set up alerts on these services to receive notifications about zone-level problems.

  • Active requests: Any in-progress requests that connect to an App Service plan instance in the faulty availability zone are terminated. Retry those requests.

  • Traffic rerouting: App Service detects the lost instances from that zone and attempts to find new replacement instances. After App Service finds replacements, it distributes traffic across the new instances as needed.

    If autoscale is configured and determines that more instances are needed, it requests instances from App Service. Autoscale behavior operates independently of App Service platform behavior, so your instance count specification doesn't need to be a multiple of two. For more information, see Scale up an app in App Service and Autoscale overview.

    Important

    Azure doesn't guarantee that requests for more instances succeed in a zone-down scenario. The platform attempts to backfill lost instances on a best-effort basis. If you need guaranteed capacity during an availability zone failure, create and configure your App Service plans to account for zone loss by over-provisioning the capacity.

  • Nonruntime behaviors: Applications in a zone-redundant App Service plan continue to run and serve traffic even if an availability zone experiences an outage. However, nonruntime behaviors might be affected during an availability zone outage. These behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

Failback

When the availability zone recovers, App Service automatically creates instances in the recovered availability zone, removes any temporary instances created in the other availability zones, and routes traffic between your instances as usual.

Testing for zone failures

The App Service platform manages traffic routing, failover, and failback for zone-redundant App Service plans. This feature is fully managed, so you don't need to initiate or validate availability zone failure processes.

Multiple-region support

App Service is a single-region service. If the region becomes unavailable, your application is also unavailable.

Alternative multiple-region approaches

To reduce the risk of a single-region failure affecting your application, you can deploy plans across multiple regions. The following steps help strengthen resilience:

  • Deploy your application to the plans in each region.
  • Configure load balancing and failover policies.
  • Replicate your data across regions so that you can recover your last application state.
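The failover policy in the steps above can be sketched as priority-based routing: send traffic to the first healthy region in a preferred order. In practice, a global load balancer typically makes this decision. The function `pick_region` and the region names used with it are assumptions for illustration, and the health check is injected so that the sketch doesn't depend on any particular probe mechanism.

```python
def pick_region(regions_by_priority, is_healthy):
    """Return the first healthy region in priority order, or None if all are down."""
    for region in regions_by_priority:
        if is_healthy(region):
            return region
    return None
```

For example, with the priority list `["primary", "secondary"]` and a health check that reports only `"secondary"` as healthy, `pick_region` returns `"secondary"`. When the primary region recovers, the same call returns `"primary"` again, which gives you failback without extra logic.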

Backups

When you use the Basic tier or higher, you can back up your App Service apps to a file by using the App Service backup and restore capabilities.

These capabilities help when it's difficult to redeploy code or when you store state on disk. Most solutions shouldn't rely exclusively on backups. Instead, use the other capabilities in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't. For more information, see Back up and restore your app in App Service.

Reliability during service maintenance

App Service performs regular service upgrades and other maintenance tasks. To maintain your expected capacity during an upgrade, the platform automatically adds extra instances of the App Service plan during the upgrade process.

To further improve resiliency during platform updates, enable zone redundancy on your App Service plan. Update domains, which are collections of VMs that go offline together during an update, map to availability zones. Deploying multiple instances in your App Service plan and enabling zone redundancy for your plan adds an extra layer of resiliency if an instance or zone becomes unhealthy during an upgrade.

For more information, see Routine planned maintenance for App Service and Routine maintenance for App Service, restarts, and downtime.

Service-level agreement

The service-level agreement (SLA) for Azure services describes the expected availability of each service and the conditions that your solution must meet to achieve that availability expectation. For more information, see SLAs for online services.

When you deploy a zone-redundant App Service plan, the uptime percentage defined in the SLA increases.