When you deploy workloads onto AKS, you need to decide on the node pool configuration and the Virtual Machine (VM) size it requires. As your workloads become more complex and need different CPU, memory, and capabilities to run, the overhead of designing a VM configuration for every set of resource requests becomes difficult.
Node auto provisioning (NAP) uses pending pod resource requirements to decide the optimal virtual machine configuration to run those workloads in the most efficient and cost-effective manner.
Node auto provisioning is based on the open source Karpenter project, and the AKS Karpenter provider, which is also open source. Node auto provisioning automatically deploys, configures, and manages Karpenter on your AKS clusters.
How node autoprovisioning works
Node auto provisioning provisions, scales, and manages virtual machines (nodes) in a cluster in response to pending pod pressure. Node auto provisioning uses these key components:
- NodePool and AKSNodeClass: Custom Resource Definitions that you create and manage to define node provisioning policies, VM specifications, and constraints for your workloads.
- NodeClaims: Managed by node autoprovisioning to represent the current state of provisioned nodes that you can monitor.
- Workload resource requirements: CPU, memory, and other specifications from your Pods, Deployments, Jobs, and other Kubernetes resources that drive provisioning decisions.
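For example, the CPU and memory requests on pending pods are the signal node autoprovisioning sizes nodes against. The following sketch is illustrative: the `inflate` name and the pause image are placeholders, and the resource requests are what drive the VM selection.

```yaml
# Illustrative workload: node autoprovisioning reads the resource requests
# of pending pods like these to choose an appropriately sized VM.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: mcr.microsoft.com/oss/kubernetes/pause:3.6
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```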
Prerequisites
| Prerequisite | Notes |
|---|---|
| Azure subscription | If you don't have an Azure subscription, you can create a free account. |
| Azure CLI | Version 2.76.0 or later. To find the version, run `az --version`. For more information about installing or upgrading the Azure CLI, see Install Azure CLI. |
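As a quick check (assuming the Azure CLI is already installed), you can confirm the version and upgrade it in place if needed:

```azurecli
# Show the installed Azure CLI version
az --version

# Upgrade the Azure CLI in place if it's older than 2.76.0
az upgrade
```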
Limitations
- You can't enable node autoprovisioning in a cluster where node pools have cluster autoscaler enabled
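Before enabling node autoprovisioning, you can check whether any existing node pools have the cluster autoscaler enabled and disable it first. The commands below are a sketch; the node pool name is a placeholder.

```azurecli
# List node pools and whether the cluster autoscaler is enabled on each
az aks nodepool list \
    --resource-group $RESOURCE_GROUP_NAME \
    --cluster-name $CLUSTER_NAME \
    --query "[].{name:name, autoscaler:enableAutoScaling}" -o table

# Disable the cluster autoscaler on a node pool if needed
az aks nodepool update \
    --resource-group $RESOURCE_GROUP_NAME \
    --cluster-name $CLUSTER_NAME \
    --name <nodepool-name> \
    --disable-cluster-autoscaler
```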
Unsupported features
- Windows node pools
- IPv6 clusters
- Service principals

  Note: You can use either a system-assigned or user-assigned managed identity.
- Disk encryption sets
- CustomCATrustCertificates
- Clusters with node autoprovisioning can't be stopped
- HTTP proxy
- All cluster egress outbound types are supported; however, the outbound type can't be changed after the cluster is created
Networking configuration
The following network configurations are recommended for clusters enabled with node autoprovisioning:
- Azure Container Network Interface (CNI) Overlay with Cilium
- Azure CNI Overlay
- Azure CNI with Cilium
- Azure CNI
For detailed networking configuration requirements and recommendations, see Node autoprovisioning networking configuration.
Key networking considerations:
- Azure CNI Overlay with Cilium is recommended
- Standard Load Balancer is required
- Private clusters aren't currently supported
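If you're enabling node autoprovisioning on an existing cluster, you can verify its current networking configuration first. The query below is a sketch that reads the ManagedCluster's networkProfile properties:

```azurecli
az aks show \
    --resource-group $RESOURCE_GROUP_NAME \
    --name $CLUSTER_NAME \
    --query "networkProfile.{plugin:networkPlugin, pluginMode:networkPluginMode, dataplane:networkDataplane, loadBalancerSku:loadBalancerSku}" \
    -o table
```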
Enable node autoprovisioning
Enable node autoprovisioning on a new cluster
Node autoprovisioning is enabled by setting the `--node-provisioning-mode` field to `Auto`, which sets the node provisioning profile to `Auto`. The default setting for this field is `Manual`.

Enable node autoprovisioning on a new cluster using the `az aks create` command and set `--node-provisioning-mode` to `Auto`. You can also set `--network-plugin` to `azure`, `--network-plugin-mode` to `overlay` (optional), and `--network-dataplane` to `cilium` (optional).

```azurecli
az aks create \
    --name $CLUSTER_NAME \
    --resource-group $RESOURCE_GROUP_NAME \
    --node-provisioning-mode Auto \
    --network-plugin azure \
    --network-plugin-mode overlay \
    --network-dataplane cilium \
    --generate-ssh-keys
```
Enable node autoprovisioning on an existing cluster
Enable node autoprovisioning on an existing cluster using the `az aks update` command and set `--node-provisioning-mode` to `Auto`.

```azurecli
az aks update --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP_NAME --node-provisioning-mode Auto
```
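To confirm the mode change, you can query the cluster afterwards. This sketch assumes the setting is surfaced in the ManagedCluster as nodeProvisioningProfile.mode:

```azurecli
az aks show \
    --resource-group $RESOURCE_GROUP_NAME \
    --name $CLUSTER_NAME \
    --query nodeProvisioningProfile.mode -o tsv
```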
Basic NodePool and AKSNodeClass example
After enabling node autoprovisioning on your cluster, you can create a basic NodePool and AKSNodeClass to start provisioning nodes. Here's a simple example:
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        intent: apps
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot, on-demand]
        - key: karpenter.azure.com/sku-family
          operator: In
          values: [D]
      expireAfter: Never
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default
  limits:
    cpu: 100
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 0s
---
apiVersion: karpenter.azure.com/v1beta1
kind: AKSNodeClass
metadata:
  name: default
  annotations:
    kubernetes.io/description: "General purpose AKSNodeClass for running Ubuntu2204 nodes"
spec:
  imageFamily: Ubuntu2204
```
This example creates a basic NodePool that:
- Supports both spot and on-demand instances
- Uses D-series VMs
- Sets a CPU limit of 100
- Enables consolidation when nodes are empty or underutilized
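Assuming you save the manifests above to a file such as nodepool.yaml (the file name is arbitrary), you can apply them and watch node autoprovisioning react through its NodeClaims:

```bash
# Apply the NodePool and AKSNodeClass
kubectl apply -f nodepool.yaml

# Confirm the custom resources exist
kubectl get nodepools,aksnodeclasses

# Watch NodeClaims as nodes are provisioned and removed
kubectl get nodeclaims -w
```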
Custom Virtual Networks and node autoprovisioning
AKS allows you to add a cluster with node autoprovisioning enabled in a custom virtual network via the `--vnet-subnet-id` parameter. The following sections detail how to:
- Create a virtual network
- Create a managed identity with permissions over the virtual network
- Create a node autoprovisioning-enabled cluster in a custom virtual network
Create a virtual network
Create a virtual network using the `az network vnet create` command. Create a cluster subnet using the `az network vnet subnet create` command.

When using a custom virtual network with node autoprovisioning, you must create and delegate an API server subnet to `Microsoft.ContainerService/managedClusters`, which grants the AKS service permissions to inject the API server pods and internal load balancer into that subnet. You can't use the subnet for any other workloads, but you can use it for multiple AKS clusters located in the same virtual network. The minimum supported API server subnet size is a /28.
```azurecli
az network vnet create --name ${VNET_NAME} \
    --resource-group ${RG_NAME} \
    --location ${LOCATION} \
    --address-prefixes 172.19.0.0/16

az network vnet subnet create --resource-group ${RG_NAME} \
    --vnet-name ${VNET_NAME} \
    --name clusterSubnet \
    --delegations Microsoft.ContainerService/managedClusters \
    --address-prefixes 172.19.0.0/28
```
All traffic within the virtual network is allowed by default. However, if you added Network Security Group (NSG) rules to restrict traffic between different subnets, see our Network Security Group documentation for the proper permissions.
Create a managed identity and give it permissions on the virtual network
Create a managed identity using the `az identity create` command and retrieve the principal ID. Assign the Network Contributor role on the virtual network to the managed identity using the `az role assignment create` command.
```azurecli
az identity create --resource-group ${RG_NAME} \
    --name ${IDENTITY_NAME} \
    --location ${LOCATION}

IDENTITY_PRINCIPAL_ID=$(az identity show --resource-group ${RG_NAME} --name ${IDENTITY_NAME} \
    --query principalId -o tsv)

az role assignment create --scope "/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${RG_NAME}/providers/Microsoft.Network/virtualNetworks/${VNET_NAME}" \
    --role "Network Contributor" \
    --assignee ${IDENTITY_PRINCIPAL_ID}
```
Create an AKS cluster in a custom virtual network with node autoprovisioning enabled
In the following command, an Azure Kubernetes Service (AKS) cluster is created in a custom virtual network using the `az aks create` command.
```azurecli
az aks create --resource-group ${RG_NAME} \
    --name ${CLUSTER_NAME} \
    --location ${LOCATION} \
    --vnet-subnet-id "/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${RG_NAME}/providers/Microsoft.Network/virtualNetworks/${VNET_NAME}/subnets/clusterSubnet" \
    --assign-identity "/subscriptions/${SUBSCRIPTION_ID}/resourcegroups/${RG_NAME}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${IDENTITY_NAME}" \
    --node-provisioning-mode Auto
```
After a few minutes, the command completes and returns JSON-formatted information about the cluster.
Configure `kubectl` to connect to your Kubernetes cluster using the `az aks get-credentials` command. This command downloads credentials and configures the Kubernetes CLI to use them.

```azurecli
az aks get-credentials --resource-group ${RG_NAME} --name ${CLUSTER_NAME}
```
Verify the connection to your cluster using the `kubectl get` command. This command returns a list of the cluster nodes.

```bash
kubectl get nodes
```
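Optionally, you can trigger a provisioning event with a small test workload and watch node autoprovisioning respond. The deployment name, image, and replica count below are illustrative:

```bash
# Create a test deployment with explicit resource requests
kubectl create deployment inflate --image=mcr.microsoft.com/oss/kubernetes/pause:3.6
kubectl set resources deployment inflate --requests=cpu=1,memory=1Gi

# Scale beyond current capacity so some pods stay Pending
kubectl scale deployment inflate --replicas=10

# Watch node autoprovisioning create NodeClaims for the pending pods
kubectl get nodeclaims -w
```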
Node pools
For detailed node pool configuration including SKU selectors, limits, and weights, see Node autoprovisioning node pools configuration.
Node autoprovisioning uses VM SKU requirements to decide the best virtual machine for pending workloads. You can configure:
- SKU families and specific instance types
- Resource limits and priorities
- Spot vs on-demand instances
- Architecture and capabilities requirements
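As a hedged sketch of those options (the label keys are the Karpenter and Azure provider well-known labels; the values, weight, and limits are illustrative), a NodePool might prefer spot capacity, restrict SKU families, and carry a weight so it's evaluated before other pools:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-general
spec:
  # Higher weight means this NodePool is preferred when multiple NodePools match
  weight: 10
  # Caps the total resources this NodePool can provision
  limits:
    cpu: 200
    memory: 400Gi
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot]
        - key: karpenter.azure.com/sku-family
          operator: In
          values: [D, E]
        - key: kubernetes.io/arch
          operator: In
          values: [amd64]
      nodeClassRef:
        group: karpenter.azure.com
        kind: AKSNodeClass
        name: default
```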
Node disruption
Disruption Controls
Node Disruption, including Consolidation or Drift, can be controlled using different methods.
Consolidation
When workloads on your nodes scale down, node autoprovisioning uses disruption rules. These rules decide when and how to remove nodes and reschedule workloads for better efficiency. Node autoprovisioning primarily uses consolidation to delete or replace nodes for optimal pod placement. The state-based condition uses `consolidationPolicy`, such as `WhenEmpty` or `WhenEmptyOrUnderutilized`, to trigger consolidation. `consolidateAfter` is a time-based condition that can be set to allow buffer time between actions.
You can remove a node manually using `kubectl delete node`, but node autoprovisioning can also control when it should optimize your nodes based on your specifications.
```yaml
disruption:
  # Describes which types of nodes node autoprovisioning should consider for consolidation.
  # 'WhenEmptyOrUnderutilized': node autoprovisioning considers all nodes for consolidation and attempts to remove
  # or replace a node when it discovers that the node is empty or underutilized and could be changed to reduce cost.
  # 'WhenEmpty': node autoprovisioning only considers nodes for consolidation that contain no workload pods.
  consolidationPolicy: WhenEmptyOrUnderutilized
  # The amount of time node autoprovisioning should wait after discovering a consolidation decision.
  # This value can currently only be set when the consolidationPolicy is 'WhenEmpty'.
  # You can choose to disable consolidation entirely by setting the string value 'Never'.
  consolidateAfter: 30s
```
Node autoprovisioning optimizes your cluster by:
- Removing or replacing underutilized nodes
- Consolidating workloads to reduce costs
- Respecting disruption budgets and maintenance windows
- Providing manual control when needed
For detailed information about node disruption policies, upgrade mechanisms through drift, consolidation, and disruption budgets, see Node autoprovisioning disruption policies.
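One of those controls is disruption budgets on the NodePool, which cap how many nodes can be disrupted at once. The fragment below is a sketch; the percentages and schedule are illustrative:

```yaml
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
  consolidateAfter: 5m
  budgets:
    # Allow at most 10% of this NodePool's nodes to be disrupted at a time
    - nodes: "10%"
    # Block all disruption during an 8-hour daily window (cron schedule, UTC)
    - nodes: "0"
      schedule: "0 9 * * *"
      duration: 8h
```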
Kubernetes upgrades
Kubernetes upgrades for node autoprovisioning nodes follow the control plane Kubernetes version. If you perform a cluster upgrade, your nodes are automatically updated to follow the same versioning.
AKS recommends coupling node autoprovisioning with a Kubernetes auto upgrade channel for the cluster, which automatically handles all your cluster's Kubernetes upgrades. By pairing the auto upgrade channel with an `aksManagedAutoUpgradeSchedule` planned maintenance window, you can schedule your cluster upgrades during optimal times for your workloads. For more information on planning cluster upgrades, see our documentation on planned maintenance.
Node image updates
By default, node autoprovisioning node pool virtual machines are automatically updated when a new image is available. There are multiple methods to regulate when node image updates take place, including Karpenter node disruption budgets and Pod Disruption Budgets.
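For example, a Pod Disruption Budget keeps a minimum number of replicas available while node autoprovisioning drains a node for an image update. The name and label selector below are illustrative:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: inflate-pdb
spec:
  # Keep at least two replicas of the matched workload running during node drains
  minAvailable: 2
  selector:
    matchLabels:
      app: inflate
```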
Node auto provisioning metrics

Note

You can enable control plane metrics (Preview) to see the logs and operations from node auto provisioning with the Azure Monitor managed service for Prometheus add-on.
Monitoring selection events
Node auto provisioning produces cluster events that can be used to monitor deployment and scheduling decisions being made. You can view events through the Kubernetes events stream.
```bash
kubectl get events -A --field-selector source=karpenter -w
```
Disabling node autoprovisioning
Node auto provisioning can only be disabled when:

- There are no existing node autoprovisioning-managed nodes. Use `kubectl get nodes -l karpenter.sh/nodepool` to view node autoprovisioning-managed nodes.
- All existing karpenter.sh/NodePools have their `spec.limits.cpu` field set to 0.
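As a sketch, you can check both conditions and zero out a NodePool's CPU limit (which step 1 below also describes); the NodePool name default is illustrative:

```bash
# List any nodes still managed by node autoprovisioning
kubectl get nodes -l karpenter.sh/nodepool

# Show the CPU limit configured on each NodePool
kubectl get nodepools -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.limits.cpu}{"\n"}{end}'

# Set a NodePool's CPU limit to 0 so no new nodes are provisioned
kubectl patch nodepool default --type merge -p '{"spec":{"limits":{"cpu":"0"}}}'
```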
Steps to disable node autoprovisioning
1. Set all karpenter.sh/NodePools' `spec.limits.cpu` field to 0. This action prevents new nodes from being created, but doesn't disrupt currently running nodes.

   Note

   If you don't care about ensuring that every pod that was running on a node autoprovisioning node is migrated safely to a non-node autoprovisioning node, you can skip steps 2 and 3 and instead use the `kubectl delete node` command for each node autoprovisioning-managed node.

   Skipping steps 2 and 3 isn't recommended, because it might leave some pods pending and doesn't honor Pod Disruption Budgets (PDBs).

   Don't run `kubectl delete node` on any nodes that aren't managed by node autoprovisioning.

2. Add the `karpenter.azure.com/disable:NoSchedule` taint to every karpenter.sh/NodePool.

   ```yaml
   apiVersion: karpenter.sh/v1
   kind: NodePool
   metadata:
     name: default
   spec:
     template:
       spec:
         ...
         taints:
           - key: karpenter.azure.com/disable
             effect: NoSchedule
   ```

   This action starts the process of migrating the workloads on the node autoprovisioning-managed nodes to non-NAP nodes, honoring Pod Disruption Budgets (PDBs) and disruption limits. Pods migrate to non-NAP nodes if they can fit. If there isn't enough fixed-size capacity, some node autoprovisioning-managed nodes remain.

3. Scale up existing fixed-size ManagedCluster AgentPools, or create new fixed-size AgentPools, to take the load from the node autoprovisioning-managed nodes. As these nodes are added to the cluster, the node autoprovisioning-managed nodes are drained and work is migrated to the fixed-scale nodes.

4. Confirm that all node autoprovisioning-managed nodes are deleted, using `kubectl get nodes -l karpenter.sh/nodepool`. If node autoprovisioning-managed nodes still exist, the cluster likely lacks fixed-scale capacity. Add more nodes so the remaining workloads can be migrated.

5. Update the node provisioning mode parameter of the ManagedCluster to `Manual`.

   ```azurecli
   az aks update \
       --name $CLUSTER_NAME \
       --resource-group $RESOURCE_GROUP_NAME \
       --node-provisioning-mode Manual
   ```