Run ONNX models with Windows ML

Important

The Windows ML APIs are currently experimental and not supported for use in production environments. Apps trying out these APIs should not be published to the Microsoft Store.

Windows Machine Learning (ML) enables your apps to use the ONNX Runtime without distributing your own copy of the runtime or of the execution providers (EPs). Your app depends on Windows ML to dynamically download, update, and initialize EPs that are shared system-wide, and uses the shared copy of the ONNX Runtime that ships with Windows ML.

Prerequisites

  • A Windows 11 PC running version 24H2 (build 26100) or greater.

In addition to the above, there are language-specific prerequisites. For a .NET app, you'll need:

  • .NET 6 or greater
  • Targeting a Windows 10-specific TFM like net6.0-windows10.0.19041.0 or greater

Step 1: Install the Windows App SDK and Windows ML NuGet packages

Follow the steps below based on the programming language of your application.

In your .NET project, add the latest Microsoft.WindowsAppSDK experimental NuGet package, which includes Windows ML as a dependency. Make sure you install the latest experimental version, as the release versions don't contain Windows ML yet:

dotnet add package Microsoft.WindowsAppSDK --prerelease

Alternatively, you can reference the Windows ML and Windows App SDK Runtime packages directly:

dotnet add package Microsoft.WindowsAppSDK.ML --prerelease
dotnet add package Microsoft.WindowsAppSDK.Runtime --prerelease

And then import the namespaces in your code:

using Microsoft.ML.OnnxRuntime;
using Microsoft.Windows.AI.MachineLearning;

Step 2: Download and register the latest EPs

Next, use Windows ML to ensure that the latest execution providers (EPs) are available on the device and registered with the ONNX Runtime.

// First we create a new instance of EnvironmentCreationOptions
EnvironmentCreationOptions envOptions = new()
{
    logId = "WinMLDemo", // Use an ID of your own choice
    logLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_ERROR
};

// And then use that to create the ORT environment
using var ortEnv = OrtEnv.CreateInstanceWithOptions(ref envOptions);

// Get the default ExecutionProviderCatalog
var catalog = ExecutionProviderCatalog.GetDefault();

// Ensure and register all compatible execution providers with ONNX Runtime
// This downloads any necessary components and registers them
await catalog.EnsureAndRegisterAllAsync();
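
EnsureAndRegisterAllAsync may download EP packages the first time it runs, so it can fail on devices without network access. The following is a minimal sketch of one way to handle that; the try/catch and the decision to continue with the providers that are already available are illustrative assumptions, not part of the Windows ML API:

try
{
    // Ensure and register all compatible EPs; this may download packages on first run
    await catalog.EnsureAndRegisterAllAsync();
}
catch (Exception ex)
{
    // Illustrative assumption: if EP acquisition fails (for example, no network),
    // log the error and continue with the providers that are already available
    Console.WriteLine($"EP registration failed, continuing with default providers: {ex.Message}");
}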

Step 3: Configure the execution providers

The ONNX Runtime allows apps to configure execution providers (EPs) either through Device Policies or explicitly. Explicit configuration gives you more control over provider options and over which devices are used.

We recommend starting with explicit selection of EPs so that you have more predictability in the results. After you have this working, you can experiment with using Device Policies to select execution providers in a natural, outcome-oriented way.

To explicitly select one or more EPs, use the GetEpDevices function on OrtApi, which lets you enumerate all available devices. SessionOptionsAppendExecutionProvider_V2 (exposed in C# as SessionOptions.AppendExecutionProvider) can then be used to explicitly append specific devices and pass custom provider options to the desired EP.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;

// Get all available EP devices from the environment
var epDevices = ortEnv.GetEpDevices();

// Accumulate devices by EpName
// Passing all devices for a given EP in a single call allows the execution provider
// to select the best configuration or combination of devices, rather than being limited
// to a single device. This enables optimal use of available hardware if supported by the EP.
var epDeviceMap = epDevices
    .GroupBy(device => device.EpName)
    .ToDictionary(g => g.Key, g => g.ToList());

// For demonstration, list all found EPs, vendors, and device types
foreach (var epGroup in epDeviceMap)
{
    var epName = epGroup.Key;
    var devices = epGroup.Value;

    Console.WriteLine($"Execution Provider: {epName}");
    foreach (var device in devices)
    {
        // GetDeviceTypeString is a small app-defined helper (not shown here) that
        // converts the hardware device type enum to a readable string
        string deviceType = GetDeviceTypeString(device.HardwareDevice.Type);
        Console.WriteLine($" | Vendor: {device.EpVendor,-16} | Device Type: {deviceType,-8}");
    }
}

// Configure and append each EP type only once, with all its devices
var sessionOptions = new SessionOptions();
foreach ((var epName, var devices) in epDeviceMap)
{
    Dictionary<string, string> epOptions = new();
    switch (epName)
    {
        case "VitisAIExecutionProvider":
            // Demonstrating passing no options for VitisAI
            sessionOptions.AppendExecutionProvider(ortEnv, devices, epOptions);
            Console.WriteLine($"Successfully added {epName} EP");
            break;

        case "OpenVINOExecutionProvider":
            // Configure threading for OpenVINO EP, pick the first device found
            epOptions["num_of_threads"] = "4";
            sessionOptions.AppendExecutionProvider(ortEnv, [devices.First()], epOptions);
            Console.WriteLine($"Successfully added {epName} EP (first device only)");
            break;

        case "QNNExecutionProvider":
            // Configure performance mode for QNN EP
            epOptions["htp_performance_mode"] = "high_performance";
            sessionOptions.AppendExecutionProvider(ortEnv, devices, epOptions);
            Console.WriteLine($"Successfully added {epName} EP");
            break;

        default:
            Console.WriteLine($"Skipping EP: {epName}");
            break;
    }
}

For more details see the ONNX Runtime OrtApi documentation. To learn about the versioning strategy around EPs, see the versioning of execution providers documentation.

Step 4: Compile the model

ONNX models must be compiled into an optimized representation that can be executed efficiently on the device's underlying hardware. The execution provider you configured in step 3 helps perform this transformation.

The ONNX Runtime 1.22 release introduced new APIs to better encapsulate the compilation steps. More details are available in the ONNX Runtime compile documentation (see the OrtCompileApi struct).

Note

Compilation can take several minutes to complete. To keep your UI responsive, consider performing compilation as a background operation in your application.

// Prepare compilation options using the session options we configured in step 3.
// modelPath points to your source .onnx file; compiledModelPath is the location
// where the compiled model will be written.
OrtModelCompilationOptions compileOptions = new(sessionOptions);
compileOptions.SetInputModelPath(modelPath);
compileOptions.SetOutputModelPath(compiledModelPath);

// Compile the model
compileOptions.CompileModel();
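
Because compilation can take minutes, a common pattern is to compile once, cache the compiled model on disk, and reuse it on later runs while keeping the work off the UI thread. The following is a minimal sketch of that pattern, reusing modelPath, compiledModelPath, and sessionOptions from above; the caching and background-thread logic is an illustration rather than a Windows ML requirement, and it assumes the usual System.IO and System.Threading.Tasks usings:

// Only compile if a compiled model isn't already cached on disk, and run the
// work on a background thread so the UI stays responsive.
if (!File.Exists(compiledModelPath))
{
    await Task.Run(() =>
    {
        OrtModelCompilationOptions options = new(sessionOptions);
        options.SetInputModelPath(modelPath);
        options.SetOutputModelPath(compiledModelPath);
        options.CompileModel();
    });
}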

Step 5: Run model inference

Now that the model is compiled for the device's local hardware, we can create an inference session and run inference with the model.

// Create inference session using compiled model
using InferenceSession session = new(compiledModelPath, sessionOptions);
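
With the session created, inference works the same as with any ONNX Runtime session. Below is a minimal sketch that assumes a hypothetical model with a single float32 tensor input named "input" of shape [1, 3, 224, 224]; your model's actual input and output names, shapes, and types will differ:

// Illustrative input only; fill inputData with your real preprocessed data
float[] inputData = new float[1 * 3 * 224 * 224];
long[] inputShape = { 1, 3, 224, 224 };

using var inputOrtValue = OrtValue.CreateTensorValueFromMemory(inputData, inputShape);
var inputs = new Dictionary<string, OrtValue> { { "input", inputOrtValue } };

using var runOptions = new RunOptions();
using var results = session.Run(runOptions, inputs, session.OutputMetadata.Keys.ToArray());

// Copy the first output tensor back as a float array
float[] outputData = results[0].GetTensorDataAsSpan<float>().ToArray();
Console.WriteLine($"Output length: {outputData.Length}");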

Step 6: Distributing your app

Before distributing your app, C# and C++ developers need to take additional steps to ensure that the Windows ML Runtime is installed on users' devices when the app is installed. See the distributing your app page to learn more.

Model compilation

An ONNX model is represented as a graph, where nodes correspond to operators (such as matrix multiplications, convolutions, and other mathematical processes), and edges define the flow of data between them.

This graph-based structure enables efficient execution and optimization through transformations such as operator fusion (combining multiple related operations into a single optimized operation) and graph pruning (removing redundant nodes from the graph).

Model compilation refers to the process of transforming an ONNX model with the aid of an execution provider (EP) into an optimized representation that can be executed efficiently on the device's underlying hardware.

Designing for compilation

Here are some ideas for handling compilation in your application.

  • Compilation performance. Compilation can take several minutes to complete. To keep your UI responsive, consider performing compilation as a background operation in your application.
  • User interface updates. Consider letting your users know whether your application is doing any compilation work, and notifying them when it's complete.
  • Graceful fallback mechanisms. If there is an issue with loading a compiled model, try to capture diagnostic data for the failure, and have your application fall back to the original model if possible so that its related AI functionality can still be used (see the sketch after this list).
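
The following is a minimal sketch of such a fallback, reusing modelPath, compiledModelPath, and sessionOptions from the earlier steps. The exact exceptions thrown when a compiled model fails to load depend on the ONNX Runtime and the EP, so this catches broadly for illustration:

InferenceSession session;
try
{
    // Prefer the compiled model when it loads successfully
    session = new InferenceSession(compiledModelPath, sessionOptions);
}
catch (Exception ex)
{
    // Illustration only: capture diagnostics, then fall back to the original ONNX model
    Console.WriteLine($"Failed to load compiled model, falling back to original: {ex.Message}");
    session = new InferenceSession(modelPath, sessionOptions);
}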

Using Device Policies for execution provider selection

In addition to explicitly selecting EPs, you can also use Device Policies, which are a natural, outcome-oriented way to specify how you want your AI workload to be run. To do this, use the SessionOptions.SetEpSelectionPolicy function on the OrtApi, passing in one of the OrtExecutionProviderDevicePolicy values, such as MAX_PERFORMANCE, PREFER_NPU, or MAX_EFFICIENCY. See the ONNX OrtExecutionProviderDevicePolicy docs for the full list of values.

// Configure the session to select an EP and device for MAX_EFFICIENCY which typically
// will choose an NPU if available with a CPU fallback.
var sessionOptions = new SessionOptions();
sessionOptions.SetEpSelectionPolicy(ExecutionProviderDevicePolicy.MAX_EFFICIENCY);

Providing feedback about Windows ML

We would love to hear your feedback about using Windows ML! If you run into any issues, please use the Feedback Hub app on Windows to report your issue.

Feedback should be submitted under the Developer Platform -> Windows Machine Learning category.

See also