Create and manage instance types for efficient utilization of compute resources

2025-07-23

Instance types are an Azure Machine Learning concept that allows targeting certain types of compute nodes for training and inference workloads. For example, in an Azure virtual machine, an instance type is STANDARD_D2_V3. This article shows you how to create and manage instance types for your computation requirements.

In Kubernetes clusters, instance types are represented as a custom resource definition (CRD) installed with the Azure Machine Learning extension. Two elements in the Azure Machine Learning extension represent instance types:

nodeSelector: Use nodeSelector to specify which node a pod should run on. The node must have a corresponding label.
resources: In the resources section, you can set the compute resources (CPU, memory, and NVIDIA GPU) for the pod.

If you specify a nodeSelector field when deploying the Azure Machine Learning extension, the nodeSelector field applies to all instance types. This means:

For each instance type that you create, the specified nodeSelector field should be a subset of the extension-specified nodeSelector field.
If you use an instance type with nodeSelector, the workload runs on any node that matches both the extension-specified nodeSelector field and the instance-type-specified nodeSelector field.
If you use an instance type without a nodeSelector field, the workload runs on any node that matches the extension-specified nodeSelector field.
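
The matching behavior in the last two points can be sketched in a few lines of Python. This is only an illustration of the rule, not how the Kubernetes scheduler or the extension is implemented; the function names, node names, and labels are hypothetical.

```python
def matches(node_labels, selector):
    """True if the node carries every label in the selector (an absent selector matches all nodes)."""
    return all(node_labels.get(key) == value for key, value in (selector or {}).items())


def node_is_eligible(node_labels, extension_selector=None, instance_type_selector=None):
    """A node must satisfy the extension-level selector and the instance-type selector."""
    return matches(node_labels, extension_selector) and matches(node_labels, instance_type_selector)


# Hypothetical cluster: node name -> node labels
nodes = {
    "node-a": {"pool": "ml", "mylabel": "mylabelvalue"},
    "node-b": {"pool": "ml"},
    "node-c": {"pool": "web", "mylabel": "mylabelvalue"},
}
extension_selector = {"pool": "ml"}
instance_type_selector = {"mylabel": "mylabelvalue"}

with_instance_selector = [
    name for name, labels in nodes.items()
    if node_is_eligible(labels, extension_selector, instance_type_selector)
]
without_instance_selector = [
    name for name, labels in nodes.items()
    if node_is_eligible(labels, extension_selector)
]
```

With both selectors, only node-a qualifies. Without an instance-type selector, node-a and node-b both qualify because they match the extension-level selector.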

Create a default instance type

By default, an instance type called defaultinstancetype is created when you attach a Kubernetes cluster to an Azure Machine Learning workspace. Here's the definition:

resources:
  requests:
    cpu: "100m"
    memory: "2Gi"
  limits:
    cpu: "2"
    memory: "2Gi"
    nvidia.com/gpu: null

If you don't apply a nodeSelector field, the pod can be scheduled on any node. The workload's pods are assigned default resources with 0.1 CPU cores, 2 GB of memory, and 0 GPUs for the request. The resources that the workload's pods use are limited to 2 CPU cores and 2 GB of memory.
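
To relate quantity strings such as "100m" and "2Gi" to the core and memory figures above, the following sketch converts them to plain numbers. It assumes the standard Kubernetes quantity suffixes and covers only the common ones; the helper names are hypothetical and aren't part of any Azure Machine Learning API.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal suffixes used by Kubernetes memory quantities
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity):
    """Convert a CPU quantity to cores: '100m' is 0.1 cores, '2' is 2 cores."""
    text = str(quantity)
    if text.endswith("m"):
        return Decimal(text[:-1]) / 1000
    return Decimal(text)


def parse_memory(quantity):
    """Convert a memory quantity to bytes: '2Gi' is 2 * 2**30 bytes."""
    text = str(quantity)
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)  # no suffix: the value is already in bytes


default_request_cores = parse_cpu("100m")
default_limit_cores = parse_cpu("2")
default_memory_bytes = parse_memory("2Gi")
```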

The default instance type purposefully uses minimal resources. To ensure that all machine learning workloads run with appropriate resources (for example, GPU resources), we highly recommend that you create custom instance types.

Keep in mind the following points about the default instance type:

defaultinstancetype doesn't appear as an InstanceType custom resource in the cluster when you run the command kubectl get instancetype, but it does appear in all clients (UI, Azure CLI, SDK).
defaultinstancetype can be overridden with the definition of a custom instance type that has the same name.

Create a custom instance type

To create a new instance type, create a new custom resource for the instance type CRD. For example:

kubectl apply -f my_instance_type.yaml

Here are the contents of my_instance_type.yaml:

apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceType
metadata:
  name: myinstancetypename
spec:
  nodeSelector:
    mylabel: mylabelvalue
  resources:
    limits:
      cpu: "1"
      nvidia.com/gpu: 1
      memory: "2Gi"
    requests:
      cpu: "700m"
      memory: "1500Mi"

The preceding code creates an instance type with the following behavior:

Pods are scheduled only on nodes that have the label mylabel: mylabelvalue.
Pods are assigned resource requests of 700m for CPU and 1500Mi for memory.
Pods are assigned resource limits of 1 for CPU, 2Gi for memory, and 1 for NVIDIA GPU.

Custom instance type creation must meet the following parameters and definition rules, or it fails:

Parameter	Required or optional	Description
`name`	Required	String values that must be unique in a cluster.
`CPU request`	Required	String values that can't be zero or empty. You can specify the CPU in millicores; for example, `100m`. You can also specify it as full numbers. For example, `"1"` is equivalent to `1000m`.
`Memory request`	Required	String values that can't be zero or empty. You can specify the memory as a full number + suffix; for example, `1024Mi` for 1,024 mebibytes (MiB).
`CPU limit`	Required	String values that can't be zero or empty. You can specify the CPU in millicores; for example, `100m`. You can also specify it as full numbers. For example, `"1"` is equivalent to `1000m`.
`Memory limit`	Required	String values that can't be zero or empty. You can specify the memory as a full number + suffix; for example, `1024Mi` for 1024 MiB.
`GPU`	Optional	Integer values that can be specified only in the `limits` section. For more information, see the Kubernetes documentation.
`nodeSelector`	Optional	Map of string keys and values.
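
Checking a definition against these rules before you apply it can catch mistakes early. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the zero check is crude (it recognizes only the literal forms "0" and "0m"), and name uniqueness isn't checked because that requires querying the cluster. The extension's own validation might differ.

```python
def validate_instance_type(manifest):
    """Return a list of rule violations for an InstanceType manifest given as a dict."""
    errors = []
    if not manifest.get("metadata", {}).get("name"):
        errors.append("metadata.name is required")

    spec = manifest.get("spec", {})
    resources = spec.get("resources", {})

    # CPU and memory are required strings in both requests and limits
    for section in ("requests", "limits"):
        values = resources.get(section) or {}
        for key in ("cpu", "memory"):
            value = values.get(key)
            if not isinstance(value, str) or value.strip() in ("", "0", "0m"):
                errors.append(f"resources.{section}.{key} must be a non-empty, non-zero string")

    # GPU is optional, must be an integer, and belongs only under limits
    if "nvidia.com/gpu" in (resources.get("requests") or {}):
        errors.append("nvidia.com/gpu can be specified only in the limits section")
    gpu = (resources.get("limits") or {}).get("nvidia.com/gpu")
    if gpu is not None and (isinstance(gpu, bool) or not isinstance(gpu, int)):
        errors.append("nvidia.com/gpu must be an integer")

    # nodeSelector is optional and maps string keys to string values
    selector = spec.get("nodeSelector")
    if selector is not None:
        is_string_map = isinstance(selector, dict) and all(
            isinstance(k, str) and isinstance(v, str) for k, v in selector.items()
        )
        if not is_string_map:
            errors.append("nodeSelector must be a map of string keys and values")
    return errors


# The custom instance type from the earlier example, expressed as a dict
manifest = {
    "apiVersion": "amlarc.azureml.com/v1alpha1",
    "kind": "InstanceType",
    "metadata": {"name": "myinstancetypename"},
    "spec": {
        "nodeSelector": {"mylabel": "mylabelvalue"},
        "resources": {
            "limits": {"cpu": "1", "nvidia.com/gpu": 1, "memory": "2Gi"},
            "requests": {"cpu": "700m", "memory": "1500Mi"},
        },
    },
}
errors = validate_instance_type(manifest)
```

For the example manifest, the returned list is empty.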

You can also create multiple instance types at once:

kubectl apply -f my_instance_type_list.yaml

Here are the contents of my_instance_type_list.yaml:

apiVersion: amlarc.azureml.com/v1alpha1
kind: InstanceTypeList
items:
  - metadata:
      name: cpusmall
    spec:
      resources:
        requests:
          cpu: "100m"
          memory: "100Mi"
        limits:
          cpu: "1"
          nvidia.com/gpu: 0
          memory: "1Gi"

  - metadata:
      name: defaultinstancetype
    spec:
      resources:
        requests:
          cpu: "1"
          memory: "1Gi" 
        limits:
          cpu: "1"
          nvidia.com/gpu: 0
          memory: "1Gi"

The preceding example creates two instance types: cpusmall and defaultinstancetype. This defaultinstancetype definition overrides the defaultinstancetype definition that was created when you attached the Kubernetes cluster to the Azure Machine Learning workspace.
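
If you maintain many instance types, you might prefer to generate the list instead of editing it by hand. The following sketch builds the same two definitions as a Python dictionary and serializes them to JSON, which kubectl apply -f accepts as well as YAML. The helper function and the variable names are hypothetical.

```python
import json


def instance_type(name, requests, limits, node_selector=None):
    """Build one item of an InstanceTypeList."""
    spec = {"resources": {"requests": requests, "limits": limits}}
    if node_selector:
        spec["nodeSelector"] = node_selector
    return {"metadata": {"name": name}, "spec": spec}


manifest = {
    "apiVersion": "amlarc.azureml.com/v1alpha1",
    "kind": "InstanceTypeList",
    "items": [
        instance_type(
            "cpusmall",
            requests={"cpu": "100m", "memory": "100Mi"},
            limits={"cpu": "1", "nvidia.com/gpu": 0, "memory": "1Gi"},
        ),
        instance_type(
            "defaultinstancetype",
            requests={"cpu": "1", "memory": "1Gi"},
            limits={"cpu": "1", "nvidia.com/gpu": 0, "memory": "1Gi"},
        ),
    ],
}

# Write this string to a .json file and pass the file to kubectl apply -f
manifest_json = json.dumps(manifest, indent=2)
```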

If you submit a training or inference workload without an instance type, it uses defaultinstancetype. To specify a default instance type for a Kubernetes cluster, create an instance type with the name defaultinstancetype. It's automatically recognized as the default.

Select an instance type to submit a training job

To select an instance type for a training job using the Azure CLI (v2), specify its name as part of the resources properties section in the job YAML. For example:

command: python -c "print('Hello world!')"
environment:
  image: library/python:latest
compute: azureml:<Kubernetes-compute_target_name>
resources:
  instance_type: <instance type name>

To select an instance type for a training job using the SDK (v2), specify its name for the instance_type property in the command class. For example:

from azure.ai.ml import command

# define the command
command_job = command(
    command="python -c \"print('Hello world!')\"",
    environment="AzureML-lightgbm-3.2-ubuntu18.04-py37-cpu@latest",
    compute="<Kubernetes-compute_target_name>",
    instance_type="<instance type name>"
)

In the preceding example, replace <Kubernetes-compute_target_name> with the name of your Kubernetes compute target. Replace <instance type name> with the name of the instance type that you want to select. If you don't specify an instance_type property, the system uses defaultinstancetype to submit the job.

Select an instance type to deploy a model

To select an instance type for a model deployment using the Azure CLI (v2), specify its name for the instance_type property in the deployment YAML. For example:

name: blue
app_insights_enabled: true
endpoint_name: <endpoint name>
model: 
  path: ./model/sklearn_mnist_model.pkl
code_configuration:
  code: ./script/
  scoring_script: score.py
instance_type: <instance type name>
environment: 
  conda_file: file:./model/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest

To select an instance type for a model deployment using the SDK (v2), specify its name for the instance_type property in the KubernetesOnlineDeployment class. For example:

from azure.ai.ml.entities import KubernetesOnlineDeployment, Model, Environment, CodeConfiguration

model = Model(path="./model/sklearn_mnist_model.pkl")
env = Environment(
    conda_file="./model/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
)

# define the deployment
blue_deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name="<endpoint name>",
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./script/", scoring_script="score.py"
    ),
    instance_count=1,
    instance_type="<instance type name>",
)

In the preceding example, replace <instance type name> with the name of the instance type that you want to select. If you don't specify an instance_type property, the system uses defaultinstancetype to deploy the model.
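
The instance_type property sits in different places in the two kinds of definitions: under resources for a training job, and at the top level for a deployment. The following sketch resolves the effective instance type for either, including the fallback to defaultinstancetype. It operates on plain dictionaries loaded from the YAML files; the function name and the kind argument are hypothetical.

```python
DEFAULT_INSTANCE_TYPE = "defaultinstancetype"


def effective_instance_type(spec, kind):
    """Return the instance type that a job or deployment definition selects."""
    if kind == "job":
        # Job YAML: resources.instance_type
        chosen = (spec.get("resources") or {}).get("instance_type")
    elif kind == "deployment":
        # Deployment YAML: top-level instance_type
        chosen = spec.get("instance_type")
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return chosen or DEFAULT_INSTANCE_TYPE


job = {"command": "python train.py", "resources": {"instance_type": "cpusmall"}}
deployment = {"name": "blue", "instance_type": "myinstancetypename"}
deployment_without_type = {"name": "green"}

job_type = effective_instance_type(job, "job")
deployment_type = effective_instance_type(deployment, "deployment")
fallback_type = effective_instance_type(deployment_without_type, "deployment")
```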

Important

For MLflow model deployment, the resource request requires at least 2 CPU cores and 4 GB of memory. Otherwise, the deployment fails.
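
A quick pre-deployment check for this minimum might look like the following sketch. It assumes that "4 GB" can be read as 4 GiB, which is the stricter interpretation, and it handles only the m, Mi, and Gi suffixes. The function names are hypothetical.

```python
from decimal import Decimal

MIN_CPU_CORES = Decimal(2)
MIN_MEMORY_BYTES = 4 * 2**30  # assumption: "4 GB" interpreted as 4 GiB


def cpu_cores(quantity):
    """'2000m' and '2' both mean 2 cores."""
    text = str(quantity)
    return Decimal(text[:-1]) / 1000 if text.endswith("m") else Decimal(text)


def memory_bytes(quantity):
    """Handle the Mi and Gi suffixes; a bare number is taken as bytes."""
    text = str(quantity)
    for suffix, factor in (("Mi", 2**20), ("Gi", 2**30)):
        if text.endswith(suffix):
            return Decimal(text[:-2]) * factor
    return Decimal(text)


def meets_mlflow_minimum(requests):
    """Check a requests mapping against the MLflow deployment minimum."""
    return (
        cpu_cores(requests["cpu"]) >= MIN_CPU_CORES
        and memory_bytes(requests["memory"]) >= MIN_MEMORY_BYTES
    )


# The built-in default instance type requests only 0.1 cores and 2Gi
default_ok = meets_mlflow_minimum({"cpu": "100m", "memory": "2Gi"})
custom_ok = meets_mlflow_minimum({"cpu": "2", "memory": "4Gi"})
```

Under this reading, the built-in defaultinstancetype falls short of the minimum, so an MLflow deployment needs a custom instance type or an overridden default.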

Resource section validation

Use the resources section to define the resource request and limit for your model deployments. For example, with the Azure CLI (v2), add the section to the deployment YAML:

name: blue
app_insights_enabled: true
endpoint_name: <endpoint name>
model: 
  path: ./model/sklearn_mnist_model.pkl
code_configuration:
  code: ./script/
  scoring_script: score.py
environment: 
  conda_file: file:./model/conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest
resources:
  requests:
    cpu: "0.1"
    memory: "0.2Gi"
  limits:
    cpu: "0.2"
    #nvidia.com/gpu: 0
    memory: "0.5Gi"
instance_type: <instance type name>

The equivalent definition with the Python SDK (v2):

from azure.ai.ml.entities import (
    KubernetesOnlineDeployment,
    Model,
    Environment,
    CodeConfiguration,
    ResourceSettings,
    ResourceRequirementsSettings
)

model = Model(path="./model/sklearn_mnist_model.pkl")
env = Environment(
    conda_file="./model/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest",
)

requests = ResourceSettings(cpu="0.1", memory="0.2Gi")
limits = ResourceSettings(cpu="0.2", memory="0.5Gi", gpu="1")
resources = ResourceRequirementsSettings(requests=requests, limits=limits)

# define the deployment
blue_deployment = KubernetesOnlineDeployment(
    name="blue",
    endpoint_name="<endpoint name>",
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="./script/", scoring_script="score.py"
    ),
    resources=resources,
    instance_count=1,
    instance_type="<instance type name>",
)

When you use the resources section, a valid resource definition must meet the following rules. An invalid resource definition causes the model deployment to fail.

Parameter	Required or optional	Description
`requests:` `cpu:`	Required	String values that can't be zero or empty. You can specify the CPU in millicores; for example, `100m`. You can also specify it in full numbers. For example, `"1"` is equivalent to `1000m`.
`requests:` `memory:`	Required	String values that can't be zero or empty. You can specify the memory as a full number + suffix; for example, `1024Mi` for 1024 MiB. Memory can't be less than 1 MB.
`limits:` `cpu:`	Optional (required only when you need GPU)	String values that can't be zero or empty. You can specify the CPU in millicores; for example, `100m`. You can also specify it in full numbers. For example, `"1"` is equivalent to `1000m`.
`limits:` `memory:`	Optional (required only when you need GPU)	String values that can't be zero or empty. You can specify the memory as a full number + suffix; for example, `1024Mi` for 1,024 MiB.
`limits:` `nvidia.com/gpu:`	Optional (required only when you need GPU)	Integer values that can't be empty and can be specified only in the `limits` section. For more information, see the Kubernetes documentation. If you require CPU only, you can omit the entire `limits` section.

An instance type is required for model deployment. If you define the resources section, it's validated against the instance type according to the following rules:

With a valid resource section definition, the resource limits must be less than the instance type limits. Otherwise, deployment fails.
If you don't define an instance type, the system uses defaultinstancetype for validation with the resources section.
If you don't define the resources section, the system uses the instance type to create the deployment.
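
These three rules can be sketched as a single function. This is an illustration only, under stated assumptions: the sketch accepts a limit equal to the instance type limit, which the rules above don't confirm, and it skips keys for which the instance type defines no limit. The function names and the dictionary layout are hypothetical.

```python
from decimal import Decimal

SUFFIXES = {"m": Decimal("0.001"), "Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def quantity(value):
    """Convert a quantity such as '200m', '0.5Gi', or 1 to a comparable number."""
    text = str(value)
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def resolve_resources(deployment, instance_types):
    """Return the resources a deployment would run with, or raise ValueError."""
    # No instance type defined: validate against defaultinstancetype
    name = deployment.get("instance_type") or "defaultinstancetype"
    instance = instance_types[name]

    # No resources section: the instance type's resources are used
    resources = deployment.get("resources")
    if not resources:
        return instance

    # Resources section present: its limits must stay within the instance type limits
    for key, value in (resources.get("limits") or {}).items():
        ceiling = instance["limits"].get(key)
        if ceiling is not None and quantity(value) > quantity(ceiling):
            raise ValueError(f"limit {key}={value} exceeds {name} limit {ceiling}")
    return resources


instance_types = {
    "myinstancetypename": {
        "requests": {"cpu": "700m", "memory": "1500Mi"},
        "limits": {"cpu": "1", "memory": "2Gi", "nvidia.com/gpu": 1},
    }
}
deployment = {
    "instance_type": "myinstancetypename",
    "resources": {
        "requests": {"cpu": "0.1", "memory": "0.2Gi"},
        "limits": {"cpu": "0.2", "memory": "0.5Gi"},
    },
}
effective = resolve_resources(deployment, instance_types)
```

Here the deployment's limits of 0.2 CPU cores and 0.5Gi fit within the instance type limits of 1 CPU core and 2Gi, so the deployment's own resources section is returned.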
