Unpacking the Windows Copilot Runtime for Local AI Development

The promise of local AI development on Windows, particularly development that leverages Neural Processing Units (NPUs), has attracted plenty of discussion. Microsoft’s GitHub Copilot has already demonstrated some of AI’s benefits for developers, but the specifics of the Windows Copilot Runtime remain unclear, especially its scope and practical applications.

:scroll: A Look Back

Microsoft officially unveiled the Windows Copilot Runtime at Build 2024 on May 21, 2024. The introduction of this runtime followed CEO Satya Nadella’s earlier remarks during the Copilot+ PC launch.

Nadella stated, in reference to the advancements tied to Copilot+ PCs and their NPUs, “Every developer can take advantage of it… What Win32 meant for the graphical user interface, the new Windows Copilot Runtime that we’re announcing today will be to AI.”

This comparison requires closer examination.

Win32 provided a standard set of APIs for developers creating native 32-bit Windows applications, starting with Windows NT in 1993. It built upon the original Windows APIs (later known as Win16), which truly ushered in the GUI era. One could even argue that Visual Basic offered an even more accessible entry point to GUI programming than either Win16 or Win32.

Furthermore, the Windows Copilot Runtime’s role in AI differs significantly from Win32’s impact on GUIs. The Windows Copilot Runtime covers only AI workloads running locally on NPU-equipped Copilot+ devices, which is a fraction of the broader AI landscape. Most AI workloads still run in the cloud, a realm where the Windows Copilot Runtime currently offers developers no direct assistance.

:interrobang: Deciphering the Windows Copilot Runtime

Understanding the Windows Copilot Runtime (WCR) requires clarifying its purpose, and Microsoft’s wording does not make that easy. The term “runtime” is misleading in this context: applications don’t execute on top of the WCR the way .NET applications run on the .NET runtime.

Windows lead Pavan Davuluri stated at the Copilot+ PC launch: “A core element of our re-architecture of Windows… is the Windows Copilot Runtime… It is a powerful AI capability woven into eve…”

Using ONNX Runtime DirectML for NPU Acceleration

Today, the most practical way to harness NPUs for local AI development is ONNX Runtime with the DirectML execution provider. It executes machine learning models on the hardware directly, bypassing the Windows Copilot Runtime entirely and offering a shorter path to acceleration.

Step 1: Install the ONNX Runtime package with DirectML support.

pip install onnxruntime-directml
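
Before going further, it is worth confirming that the DirectML provider actually registered with your ONNX Runtime install. The check below is a minimal sketch; it assumes only the onnxruntime-directml package installed above.

import onnxruntime

# List the execution providers this ONNX Runtime build can use
available = onnxruntime.get_available_providers()
print(available)

# 'DmlExecutionProvider' should appear if onnxruntime-directml installed correctly;
# otherwise inference will silently fall back to the CPU provider
if 'DmlExecutionProvider' not in available:
    print("DirectML provider not found; inference will fall back to CPU")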

Step 2: Load your ONNX model using ONNX Runtime.

import onnxruntime

# Specify the path to your ONNX model
model_path = "your_model.onnx"

# Create an inference session using the DirectML execution provider
session = onnxruntime.InferenceSession(model_path, providers=['DmlExecutionProvider'])

Step 3: Prepare your input data.

# Example: Assuming your model expects a NumPy array as input
import numpy as np
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32) # Example input shape

# Get the input name
input_name = session.get_inputs()[0].name

# Create the input feed
input_feed = {input_name: input_data}

Step 4: Run the inference.

# Run the inference
output = session.run(None, input_feed)

# Print the output
print(output)
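
To see whether DirectML is actually buying you anything (and whether the work lands on the NPU or a GPU depends on your hardware and drivers), a rough comparison against the CPU provider can help. The snippet below is a simple sketch that reuses model_path and input_feed from the steps above and relies only on standard ONNX Runtime calls; real benchmarking would need more warm-up and many more iterations.

import time
import onnxruntime

# Build one session per execution provider for a rough speed comparison
for provider in ['CPUExecutionProvider', 'DmlExecutionProvider']:
    sess = onnxruntime.InferenceSession(model_path, providers=[provider])
    # Warm-up run so one-time initialization isn't counted
    sess.run(None, input_feed)
    start = time.perf_counter()
    for _ in range(10):
        sess.run(None, input_feed)
    elapsed = (time.perf_counter() - start) / 10
    print(f"{provider}: {elapsed * 1000:.1f} ms per inference")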

Exploring the Windows Copilot Runtime APIs (If Available)

If and when the Windows Copilot Runtime APIs become readily accessible and documented, developers can investigate their specific functionalities for local AI tasks.

Step 1: Obtain the necessary SDK and documentation for the Windows Copilot Runtime. This likely involves registering as a developer and downloading the appropriate packages from Microsoft.

Step 2: Import the relevant namespaces or libraries into your project. The specific import statements will depend on the programming language (e.g., C#, Python) and the structure of the WCR SDK.

Step 3: Discover the core APIs. Focus on functions related to model loading, execution on the NPU, and data handling. Consult the documentation for detailed explanations and examples.

Step 4: Implement your AI logic using the WCR APIs. This will likely involve loading a pre-trained AI model, feeding it input data, and processing the output.
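
Because the WCR SDK is not publicly documented at the time of writing, any code can only guess at its shape. The sketch below is purely illustrative: every name in it (copilot_runtime, load_model, run) is hypothetical and will almost certainly differ from whatever Microsoft ships; the point is only the expected flow of loading a model, targeting the NPU, and running inference.

# Hypothetical sketch only -- 'copilot_runtime', 'load_model', and 'run' are
# invented names; the real WCR SDK will define its own modules and calls.
import copilot_runtime  # hypothetical module name

# Load a pre-trained model and target the NPU for execution
model = copilot_runtime.load_model("your_model.onnx", device="npu")  # hypothetical API

# Feed input data and collect the output, mirroring the ONNX Runtime flow above
output = model.run(input_data)  # hypothetical API
print(output)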

Leveraging Existing Machine Learning Frameworks

Existing machine learning frameworks like TensorFlow and PyTorch can be configured to utilize NPUs through specific hardware acceleration backends.

Step 1: Install TensorFlow or PyTorch along with the appropriate NPU-compatible backend. This might involve installing a specific version of the framework or a separate driver or library. For example, for some NPUs, you might need to install a dedicated TensorFlow or PyTorch plugin.

Step 2: Configure your framework to use the NPU. This often involves setting environment variables or specifying the device to use within your code. Consult the documentation for your specific NPU and framework.

Step 3: Load your AI model and run inference. The framework should automatically utilize the NPU for accelerated computation.
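
As a concrete illustration of these steps, here is a minimal PyTorch sketch using the torch-directml package, one of the DirectML-backed options Microsoft publishes for PyTorch on Windows. It assumes torch and torch-directml are installed (pip install torch-directml) and uses a placeholder model in place of your own pre-trained torch.nn.Module; whether the work runs on an NPU or a GPU depends on your hardware and drivers.

import torch
import torch_directml

# Select the default DirectML device (GPU or NPU, depending on hardware and drivers)
device = torch_directml.device()

# Placeholder model; substitute your own pre-trained torch.nn.Module
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).to(device)
model.eval()

# Move the input tensor to the DirectML device and run inference
input_tensor = torch.rand(1, 3, 224, 224).to(device)
with torch.no_grad():
    output = model(input_tensor)
print(output.shape)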


While the Windows Copilot Runtime holds potential, using ONNX Runtime directly, or an existing ML framework with an NPU-capable acceleration backend, offers a more immediate and arguably more flexible route for local AI development.