
Reliability in App Service Environment

App Service Environment is an Azure App Service feature that provides a fully isolated and dedicated environment to run App Service apps securely at high scale. Unlike the App Service public multitenant offering that shares supporting infrastructure, an App Service Environment provides dedicated compute for a single customer.

An environment provides the following key reliability benefits:

  • Dedicated compute resources that aren't shared with other customers
  • Enhanced network isolation for improved security and stability
  • The ability to deploy in your own virtual network for greater control over traffic routing and security policies

This article describes reliability support in App Service Environment. It covers intra-regional resiliency via availability zones and multiple-region deployments.

For more information about reliability support in App Service, see Reliability in App Service.

Reliability is a shared responsibility between you and Microsoft. You can use this guide to determine which reliability options fulfill your specific business objectives and uptime goals.

Production deployment recommendations

Enable zone redundancy on your environment and its App Service plans. Zone redundancy requires that each plan uses a minimum of two instances.

Reliability architecture overview

When you implement an App Service Environment, you deploy the environment as the container for your App Service plans and web apps. During setup, configure core networking settings and optional hardware isolation. Choose whether to support zone redundancy on the environment if the region supports availability zones.

After you create your environment, you can create one or more App Service plans.

An App Service plan defines a set of compute resources that run your web apps. All web apps must run inside a plan. You can scale a plan to run on multiple VM instances, also called workers. These instances provide the compute resources that run your app code. A single App Service plan can host multiple apps. All apps run on the same shared set of VM instances.

To use an App Service Environment, your plans must use the Isolated v2 pricing tier. This tier supports zone redundancy and high-scale, mission-critical applications.

App Service provides the following redundancy features:

  • Distribution across fault domains: At the platform level, Azure automatically distributes your App Service plan's VM instances across fault domains within the Azure region. A fault domain is a group of VMs that share a common power source and network switch, so spreading instances across fault domains minimizes the effect of localized hardware failures.

  • Distribution across availability zones: If you enable zone redundancy on a supported App Service plan, Azure distributes your instances across availability zones within the region. This configuration provides higher resiliency if a zone outage occurs. For more information about zone redundancy, see Availability zone support.

  • App scaling: When you configure your App Service plan to run multiple VM instances, all apps in the plan run on all instances by default. If you configure your plan for autoscaling, all apps scale out together based on the autoscale settings. However, you can customize how many plan instances run a specific app by using per-app scaling.

  • Scale units: Internally, App Service runs on a platform infrastructure called scale units, also known as stamps. A scale unit includes all components needed to host and run App Service, including compute, storage, networking, and load balancing. Azure manages scale units to ensure balanced workload distribution, perform routine maintenance, and maintain overall platform reliability.

    Some capabilities are available only on specific scale units. For example, some App Service scale units support zone redundancy, while other scale units in the same region don't.

Transient faults

Transient faults are short, intermittent failures in components. They occur frequently in a distributed environment like the cloud, and they're a normal part of operations. Transient faults correct themselves after a short period of time. It's important that your applications can handle transient faults, usually by retrying affected requests.

All cloud-hosted applications should follow the Azure transient fault handling guidance when they communicate with any cloud-hosted APIs, databases, and other components. For more information, see Recommendations for handling transient faults.
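As an illustration of retrying affected requests, the following Python sketch retries an operation with capped exponential backoff and random jitter. It's a minimal example under stated assumptions, not an Azure SDK API: the TransientError class, the retry_with_backoff function, and the default attempt count and delays are hypothetical choices made for illustration. Where your SDK or HTTP client already provides a retry policy, prefer that.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry, such as a dropped connection."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last transient failure.
            # Backoff ceiling doubles each attempt: 0.5 s, 1 s, 2 s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time up to the ceiling so clients don't retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Retry only operations that are safe to repeat (idempotent), because a request that was terminated mid-flight might already have been partly processed. The injectable sleep parameter exists so that the backoff can be tested without waiting.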

Microsoft-provided SDKs usually handle transient faults for you. Because you host your own application code on App Service, also take steps to reduce the transient faults that platform operations can cause:

  • Deploy multiple instances in your plan. App Service performs automated updates and other forms of maintenance on instances in your plan. If an instance becomes unhealthy, the service can automatically replace that instance with a new healthy instance. During the replacement process, there can be a short period when the previous instance is unavailable and a new instance isn't ready to serve traffic. To mitigate these effects, deploy multiple instances of your App Service plan.

  • Use deployment slots. App Service deployment slots enable zero-downtime deployments of your applications. Use deployment slots to minimize the effect of deployments and configuration changes for your users. Deployment slots also reduce the likelihood that your application restarts. Restarting the application causes a transient fault.

  • Avoid scaling up or scaling down. These operations change the CPU, memory, and other resources assigned to each instance, and they can trigger an application restart. Instead, select a tier and instance size that meet your performance requirements under typical load. To scale out and scale in, dynamically add and remove instances to handle changes in traffic volume.

Availability zone support

Availability zones are physically separate groups of datacenters within each Azure region. When one zone fails, services can fail over to one of the remaining zones.

You can configure your App Service Environment as zone redundant. Within a zone-redundant environment, you can also configure individual App Service plans as zone redundant, which distributes their instances across multiple availability zones.

Zone redundancy is enabled or disabled separately on each plan, so an environment can contain some plans that are zone redundant and others that aren't.

When you create a zone-redundant App Service plan in your environment, the instances of your App Service plan are distributed across the availability zones in the region. For more information, see Instance distribution across zones.

Region support

To see which regions support availability zones for App Service Environment v3, see Regions.

Requirements

To enable zone redundancy for your App Service Environment, you must meet the following requirements:

  • Use Isolated v2 plan types.

  • Deploy a minimum of two instances in your plan.

  • Use a scale unit that supports availability zones. When you create an App Service Environment, the environment is assigned to a scale unit based on the resource group where the environment resides. If your scale unit doesn't support availability zones, you need to create a new environment in a new resource group.

  • Configure your App Service Environment and your plans to support zone redundancy. You can enable zone redundancy during the creation of the environment or by updating an existing environment.

    To check whether your App Service Environment is configured for zone redundancy, see Check for zone redundancy support for an App Service Environment.
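The requirements above can be expressed as a simple checklist. The following Python sketch is hypothetical: the ZoneRedundancySettings class and its fields are a simplified stand-in for settings that you would read from your actual environment and plan, and the "IsolatedV2" tier label is an assumption. It isn't an Azure API and doesn't query Azure.

```python
from dataclasses import dataclass


@dataclass
class ZoneRedundancySettings:
    """Hypothetical, simplified snapshot of the settings named in the requirements list."""

    plan_tier: str  # Assumed tier label, for example "IsolatedV2".
    instance_count: int
    scale_unit_supports_zones: bool
    environment_zone_redundant: bool
    plan_zone_redundant: bool


def zone_redundancy_problems(s: ZoneRedundancySettings) -> list[str]:
    """Return one message per unmet requirement; an empty list means all checks pass."""
    problems: list[str] = []
    if s.plan_tier != "IsolatedV2":
        problems.append("Use an Isolated v2 plan.")
    if s.instance_count < 2:
        problems.append("Deploy at least two instances in the plan.")
    if not s.scale_unit_supports_zones:
        problems.append(
            "The scale unit doesn't support availability zones; "
            "create a new environment in a new resource group."
        )
    if not s.environment_zone_redundant:
        problems.append("Enable zone redundancy on the App Service Environment.")
    if not s.plan_zone_redundant:
        problems.append("Enable zone redundancy on the App Service plan.")
    return problems
```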

Instance distribution across zones

When you create a zone-redundant App Service plan, Azure distributes the plan's instances across availability zones in the region. This distribution ensures that your apps remain available even if one zone experiences an outage.

Instance distribution in a zone-redundant deployment follows specific rules. These rules also apply as the app scales in and out:

  • Minimum instances: Your App Service plan must have a minimum of two instances for zone redundancy.

  • Maximum availability zones supported by your plan: Azure determines the number of availability zones that your plan can use, which is referred to as maximumNumberOfZones. To view the number of availability zones that your specific plan can use, see Check zone redundancy support for an App Service plan.

  • Instance distribution: When zone redundancy is enabled, Azure distributes plan instances across multiple availability zones automatically. The distribution is based on the following rules:

    • If the number of instances is greater than maximumNumberOfZones and is evenly divisible by it, Azure distributes the instances evenly across zones.

    • If the number of instances isn't evenly divisible by maximumNumberOfZones, Azure distributes the remaining instances across the remaining zones.

    • When the App Service platform allocates instances for a zone-redundant App Service plan, it uses the best-effort zone balancing that the underlying Azure virtual machine scale sets provide. A plan is considered balanced when the number of VMs in each zone is within one of the number in every other zone. For more information, see Zone balancing.

  • Physical zone placement: You can view the physical availability zone used for each of your App Service plan instances. For more information, see View physical zones for an App Service plan.
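To make the distribution rules concrete, the following Python sketch computes an idealized per-zone instance count and checks the "within one" balancing rule. It's an approximation under stated assumptions: max_zones stands in for maximumNumberOfZones, and which zones receive the extra instances is arbitrary here. The real platform places instances on a best-effort basis, so actual placement can differ.

```python
def distribute_instances(instance_count: int, max_zones: int) -> list[int]:
    """Idealized per-zone instance counts for a zone-redundant plan.

    max_zones plays the role of maximumNumberOfZones. Which zones receive the
    extra instances is arbitrary here; the real platform decides placement.
    """
    if instance_count < 2:
        raise ValueError("zone redundancy requires at least two instances")
    if max_zones < 1:
        raise ValueError("max_zones must be at least 1")
    base, remainder = divmod(instance_count, max_zones)
    # Each zone gets `base` instances; the first `remainder` zones get one more.
    return [base + 1 if zone < remainder else base for zone in range(max_zones)]


def is_balanced(per_zone: list[int]) -> bool:
    """Balanced means every zone is within one instance of every other zone."""
    return max(per_zone) - min(per_zone) <= 1


# Example: 5 instances across 3 zones.
print(distribute_instances(5, 3))  # [2, 2, 1]
```

Six instances across three zones divide evenly (two per zone), while five instances leave a remainder of two that lands in two of the zones, which still satisfies the balancing rule.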

Considerations

An availability zone outage might affect some aspects of App Service, even though the application continues to serve traffic. These behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

When you enable zone redundancy on your App Service plan, you also improve resiliency during platform updates. For more information, see Reliability during service maintenance.

For App Service plans that aren't zone redundant, the underlying VM instances aren't resilient to availability zone failures. They can experience downtime during an outage in any zone in that region.

Cost

You can enable zone redundancy on an App Service Environment or its plans at no extra cost. However, zone redundancy for a plan requires that it has two or more instances. You're charged based on your App Service plan SKU, the capacity that you specify, and any instances that you scale to based on your autoscale criteria.

If you enable availability zones but specify a capacity of fewer than two instances, the platform enforces a minimum instance count of two and charges you for those two instances.
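The minimum-instance rule affects what you're billed for. The following Python sketch shows that arithmetic under simplifying assumptions: it ignores autoscale changes, and hourly_rate is a placeholder you would replace with the price for your SKU and region from Azure pricing, not a published figure.

```python
MIN_ZONE_REDUNDANT_INSTANCES = 2  # Platform-enforced minimum for zone-redundant plans.


def billed_instances(requested_capacity: int, zone_redundant: bool) -> int:
    """Instances billed for the configured capacity, before any autoscale changes."""
    if requested_capacity < 1:
        raise ValueError("capacity must be at least 1")
    if zone_redundant:
        # A zone-redundant plan is raised to the two-instance minimum.
        return max(requested_capacity, MIN_ZONE_REDUNDANT_INSTANCES)
    return requested_capacity


def estimated_cost(instances: int, hourly_rate: float, hours: float) -> float:
    """Rough cost estimate; hourly_rate is a placeholder, not a published price."""
    return instances * hourly_rate * hours
```

For example, billed_instances(1, zone_redundant=True) returns 2, so a plan configured with one instance is charged as two once zone redundancy is enabled.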

Configure availability zone support

To learn how to create a zone-redundant App Service Environment and zone-redundant App Service plans, or how to enable or disable zone redundancy on existing ones, see Configure App Service Environments and Isolated v2 App Service plans for zone redundancy.

Note

A change in the zone redundancy status of an App Service Environment takes 12 to 24 hours to complete. During this process, no downtime or performance degradation occurs.

Capacity planning and management

To prepare for availability zone failure, consider over-provisioning the capacity of your App Service plan. This approach allows the solution to tolerate some capacity loss and continue to function without degraded performance. For more information, see Manage capacity by using over-provisioning.
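One way to size over-provisioning is to work backward from the capacity you need after losing a zone. The following Python sketch assumes instances are balanced across zones, that every instance handles the same load, and that the platform doesn't backfill lost instances (the worst case). With total instances spread across zones, the busiest zone holds ceil(total / zones) of them, so you need total minus that number to be at least the required count. The smallest such total is ceil(required × zones / (zones − 1)).

```python
def instances_to_survive_zone_loss(required: int, zones: int) -> int:
    """Smallest balanced instance count that keeps `required` instances after losing one zone.

    With `total` instances balanced across `zones` zones, the busiest zone holds
    ceil(total / zones) instances, so we need total - ceil(total / zones) >= required.
    The smallest such total is ceil(required * zones / (zones - 1)).
    """
    if required < 1:
        raise ValueError("required must be at least 1")
    if zones < 2:
        raise ValueError("surviving a zone loss needs at least two zones")
    total = -(-required * zones // (zones - 1))  # Integer ceiling division.
    return max(total, 2)  # Zone redundancy also needs at least two instances.
```

For example, if your workload needs four instances at peak load and your plan uses three zones, instances_to_survive_zone_loss(4, 3) returns 6: losing the busiest zone removes two instances and leaves the four you need.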

Normal operations

The following list describes what to expect when App Service plans are configured for zone redundancy and all availability zones are operational:

  • Traffic routing between zones: During normal operations, traffic is routed between all available App Service plan instances across all availability zones.

  • Data replication between zones: During normal operations, any state stored in your application's file system is stored in zone-redundant storage and synchronously replicated between availability zones.

Zone-down experience

The following list describes what to expect when App Service plans are configured for zone redundancy and one or more availability zones are unavailable:

  • Detection and response: The App Service platform automatically detects failures in an availability zone and initiates a response. No manual intervention is required to initiate a zone failover.

  • Notification: You can monitor zone failure events through Azure Service Health and Azure Resource Health. Set up alerts on these services to receive notifications about zone-level problems.

  • Active requests: Any in-progress requests that connect to an App Service plan instance in the faulty availability zone are terminated. Retry those requests.

  • Traffic rerouting: App Service detects the lost instances from that zone and attempts to find new replacement instances. After App Service finds replacements, it distributes traffic across the new instances as needed.

    If autoscale is configured and determines that more instances are needed, it requests them from App Service. Autoscale operates independently of the App Service platform's zone distribution, so your instance count doesn't need to be a multiple of the number of availability zones. For more information, see Scale up an app in App Service and Autoscale overview.

    Important

    Azure doesn't guarantee that requests for more instances succeed in a zone-down scenario. The platform attempts to backfill lost instances on a best-effort basis. If you need guaranteed capacity during an availability zone failure, create and configure your App Service plans to account for zone loss by over-provisioning the capacity.

  • Nonruntime behaviors: Applications in a zone-redundant App Service plan continue to run and serve traffic even if an availability zone experiences an outage. However, nonruntime behaviors might be affected during an availability zone outage. These behaviors include App Service plan scaling, application creation, application configuration, and application publishing.

Failback

When the availability zone recovers, App Service automatically creates instances in the recovered availability zone, removes any temporary instances created in the other availability zones, and routes traffic between your instances as usual.

Testing for zone failures

The App Service platform manages traffic routing, failover, and failback for zone-redundant App Service plans. This feature is fully managed, so you don't need to initiate or validate availability zone failure processes.

Multiple-region support

App Service is a single-region service. If the region becomes unavailable, your environment and its plans and apps also become unavailable.

Alternative multiple-region approaches

To reduce the risk of a single-region failure affecting your application, deploy multiple App Service Environments across multiple regions. The following steps help strengthen resilience:

  • Deploy your application to the App Service Environments in each region.
  • Configure load balancing and failover policies.
  • Replicate your data across regions so that you can recover your last application state.
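A global load balancer such as Azure Front Door or Azure Traffic Manager typically applies the failover policy. To illustrate only the priority-based failover logic, the following Python sketch returns the first regional endpoint whose health probe succeeds. The endpoint URLs and health paths you pass in are hypothetical, and the simple HTTP probe is an assumption, not how those services implement health checks.

```python
import http.client
import urllib.request
from typing import Callable, Sequence


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Treat a 2xx response from a (hypothetical) health endpoint as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts, and connection resets derive from OSError;
        # malformed responses raise HTTPException.
        return False


def pick_active_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool] = http_probe,
) -> str:
    """Priority-based failover: return the first healthy endpoint in the list."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy regional endpoint is available")
```

You might list a primary-region endpoint such as https://primary.example.com/health ahead of a secondary-region endpoint. Traffic goes to the secondary only when the primary's probe fails. Failing over this way is useful only if the secondary region already has the replicated data that it needs.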

For an example approach that illustrates this architecture, see High availability enterprise deployment by using App Service Environment.

Backups

To back up your App Service apps to a file, use App Service backup and restore capabilities.

These capabilities help when it's difficult to redeploy code or when you store state on disk. Most solutions shouldn't rely exclusively on backups. Instead, use the other capabilities in this guide to support your resiliency requirements. However, backups protect against some risks that other approaches don't. For more information, see Back up and restore your app in App Service.

Reliability during service maintenance

App Service performs regular service upgrades and other maintenance tasks. To maintain your expected capacity during an upgrade, the platform automatically adds extra instances of the App Service plan during the upgrade process.

Enable zone redundancy. When you enable zone redundancy on your App Service plan, you also improve resiliency during platform updates. An update domain is a group of VMs that go offline together during an update, and update domains map to availability zones. Deploying multiple instances in your App Service plan and enabling zone redundancy for your plan adds an extra layer of resiliency if an instance or zone becomes unhealthy during an upgrade.

Customize the upgrade cycle. You can customize the upgrade cycle for an App Service Environment. If you need to validate the effect of upgrades on your workload, enable manual upgrades. This approach lets you validate and test upgrades on a nonproduction environment before you apply them to your production environment.

For more information about maintenance preferences, see Upgrade preferences for App Service Environment planned maintenance.

Service-level agreement

The service-level agreement (SLA) for Azure services describes the expected availability of each service and the conditions that your solution must meet to achieve that availability expectation. For more information, see SLAs for online services.

When you deploy a zone-redundant App Service plan, the uptime percentage defined in the SLA increases.