How-to setup Speech SDK with MAS (AEC) in Unity

Question

How-to setup Speech SDK with MAS (AEC) in Unity

nk 5

Hello everyone,

I am trying to implement Acoustic Echo Cancellation (AEC) in a Unity project using the Azure Speech SDK and the Microsoft Audio Stack (MAS), but I cannot get it to work correctly. The speech recognizer continues to pick up and transcribe audio from my speakers.

Goal: My goal is to configure the Speech SDK to perform speech recognition on a microphone input while simultaneously ignoring audio being played out of the system's speakers. Essentially, if someone is speaking through the speakers, the speech recognizer should not transcribe that audio, but it should still be able to recognize and transcribe a user speaking into the microphone.

Setup:

Engine: Unity 2022.3.x
SDK: Azure Speech SDK for C#
Feature: Azure Speech SDK with Microsoft Audio Stack (MAS) (enabled via AudioProcessingOptions)
Hardware:
- Output: A 5.1 speaker system
- Input: A standard microphone placed in front of the user.

Problem Details: I followed the documentation (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/audio-processing-speech-sdk?tabs=csharp) to enable the Microsoft Audio Stack. My expectation was that MAS would use the speaker output as a reference signal to cancel it out from the microphone's input, thereby only recognizing the user's speech.

However, when voice is being played over the speakers, the SpeechRecognizer transcribes everything the speaker says. This indicates that the AEC is not functioning.

My hypothesis is that MAS does not have the correct reference audio signal for the echo cancellation. The documentation mentions that MAS uses the "last channel of the input device" as the reference channel, but I'm unsure how to configure my system to correctly route the speaker's voice to this channel or how to verify if this is the root cause.

What I've Tried:

Minimal Sample Project: I created a minimal Unity project to isolate the issue from our main, more complex project.
Dependencies: I set up the dependencies by manually including the DLLs for Microsoft.CognitiveServices.Speech and Microsoft.CognitiveServices.Speech.Extension.MAS, and using NuGetForUnity for the Azure.Core dependency.
Code Implementation: I am using SpeechRecognizer with AudioConfig.FromDefaultMicrophoneInput(audioProcessingOptions) to setup MAS

Code

The complete unity project is to big to paste here, but this is the only script that has any functionality:

using UnityEngine;
using UnityEngine.UI;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using TMPro;

#if PLATFORM_ANDROID
using UnityEngine.Android;
#endif

public class AzureSpeechRecognizer : MonoBehaviour
{
    
    private string subscriptionKey = "YOUR_SUBSCRIPTION_KEY_HERE"; // Replace with your Azure Speech subscription key

    private string region = "YOUR_REGION";

    // Public fields for the Unity Inspector
    [Header("UI Elements")]
    [Tooltip("The button to start speech recognition.")]
    public Button startRecognitionButton;

    [Tooltip("The UI Text element to display the recognized text.")]
    public TextMeshProUGUI outputText;

    public AudioSource speaker;
    // Internal objects for speech recognition
    private SpeechRecognizer recognizer;
    private SpeechConfig speechConfig;
    private AudioConfig audioConfig;
    private bool isRecognizing = false;
    private object threadLocker = new object();
    private string message;

    void Start()
    {
        // --- Initialization ---
        if (outputText == null)
        {
            Debug.LogError("Output Text field is not assigned in the inspector.");
            return;
        }

        if (startRecognitionButton == null)
        {
            Debug.LogError("Start Recognition Button is not assigned in the inspector.");
            return;
        }

        // Add a listener to the button to call the StartRecognition method when clicked
        startRecognitionButton.onClick.AddListener(StartRecognition);

        // --- Permission Handling for Android ---
#if PLATFORM_ANDROID
        if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
        {
            Permission.RequestUserPermission(Permission.Microphone);
        }
#endif

        // --- Speech SDK Configuration ---
        // Creates an instance of a speech config with specified subscription key and service region.
        speechConfig = SpeechConfig.FromSubscription(subscriptionKey, region);

        speaker.PlayDelayed(5);
    }

    /// <summary>
    /// Called when the start recognition button is clicked.
    /// </summary>
    public async void StartRecognition()
    {
        if (isRecognizing)
        {
            // If already recognizing, stop the recognition
            await recognizer.StopContinuousRecognitionAsync();
            isRecognizing = false;
            UpdateUI("Recognition stopped.");
            return;
        }

        // --- Audio Configuration ---
        // Creates an audio configuration that will use the default microphone.
        AudioProcessingOptions audioProcessingOptions = AudioProcessingOptions.Create(
            AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT,
            PresetMicrophoneArrayGeometry.Linear2,
            SpeakerReferenceChannel.LastChannel);


        foreach (var device in Microphone.devices)
        {
            Debug.Log("Name " + device);
        }
        audioConfig = AudioConfig.FromDefaultMicrophoneInput(audioProcessingOptions);

        // --- Speech Recognizer Creation ---
        // Creates a speech recognizer from the speech and audio configurations.
        recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // --- Event Subscriptions ---
        // Subscribes to events.
        recognizer.Recognizing += (s, e) =>
        {
            lock (threadLocker)
            {
                message = $"RECOGNIZING: Text={e.Result.Text}";
            }
        };

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                lock (threadLocker)
                {
                    message = $"RECOGNIZED: Text={e.Result.Text}";
                }
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                lock (threadLocker)
                {
                    message = "NOMATCH: Speech could not be recognized.";
                }
            }
        };

        recognizer.Canceled += (s, e) =>
        {
            lock (threadLocker)
            {
                message = $"CANCELED: Reason={e.Reason}";
            }

            if (e.Reason == CancellationReason.Error)
            {
                Debug.LogError($"CANCELED: ErrorDetails={e.ErrorDetails}");
                Debug.LogError("CANCELED: Did you set the speech resource key and region values?");
            }
        };

        recognizer.SessionStarted += (s, e) =>
        {
            Debug.Log("Session started event.");
        };

        recognizer.SessionStopped += (s, e) =>
        {
            Debug.Log("Session stopped event.");
            isRecognizing = false;
        };

        recognizer.SpeechStartDetected += (s, e) =>
        {
            Debug.Log("Speech Started");
        };

        recognizer.SpeechEndDetected += (s, e) =>
        {
            Debug.Log("Speech Ended");
        };

        // --- Start Recognition ---
        // Starts continuous recognition.
        // Uses StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
        isRecognizing = true;
        UpdateUI("Say something...");


    }

    void Update()
    {
        lock (threadLocker)
        {
            if (outputText != null)
            {
                outputText.text = message;
            }
        }
    }

    /// <summary>
    /// Updates the UI text on the main thread.
    /// </summary>
    /// <param name="text">The text to display.</param>
    private void UpdateUI(string text)
    {
        lock(threadLocker)
        {
            message = text;
        }
    }

    void OnDestroy()
    {
        // --- Cleanup ---
        if (recognizer != null)
        {
            recognizer.Dispose();
        }
    }
}

There is a AudioSource speaker in the scene which plays the speakers voice over the speakers. Also, a button startRecognitionButton for starting the recording and a TextMeshPro outputText for showing the transcribed result.

Question: Has anyone successfully implemented AEC with the Azure Speech SDK and MAS in Unity? Is there a specific configuration required for the audio output or the system's microphone channels to provide the correct reference signal for echo cancellation?

Any guidance or examples would be greatly appreciated. Thank you!

Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-07-24T07:43:17.6766667+00:00

Hi nk

Sorry for the late response.

It seems you are trying to achieve" Acoustic Echo Cancellation (AEC) in a Unity project using the Azure Speech SDK and the Microsoft Audio Stack (MAS)" and facing some issue.

I shall update you once I have gone in details on using AudioProcessingOptions and provided code/MAS document and reviewed with team internally.

Thank you for providing the investigation time.
Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-07-29T02:12:09.6466667+00:00
Hi nk

Thank you for providing the investigation time. Went through code today.

Analysis

You seem to be taking inputs from android, while it is tested and supported for windows and Linux.

Microsoft Audio Stack requires the reference channel (also known as loopback channel) to perform echo cancellation. The source of the reference channel varies by platform:

Windows - The reference channel is automatically gathered by the Speech SDK if the SpeakerReferenceChannel::LastChannel option is provided when creating AudioProcessingOptions.

Linux - ALSA (Advanced Linux Sound Architecture) must be configured to provide the reference audio stream as the last channel for the audio input device used. ALSA is configured in addition to providing the SpeakerReferenceChannel::LastChannel option when creating AudioProcessingOptions.

Please refer the minimum requirement of AEC here

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/audio-processing-overview#minimum-requirements-to-use-microsoft-audio-stack

Optimization

You might need to use native AEC manager Unity’s AndroidJavaObject bridge.

AudioManager audioManager = (AudioManager)context.getSystemService(Context.AUDIO_SERVICE); audioManager.setParameters("noise_suppression=on"); audioManager.setParameters("echo_cancellation=on"); AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.VOICE_COMMUNICATION, ...);

Other optimization can drawn from this reference doc - https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.audio?view=azure-dotnet

Hope it helps.

Thank you
nk 5 Reputation points

2025-07-29T07:48:25.5266667+00:00

Hi @Manas Mohanty Thank you very much for investing your time.

I can see why you think I'm developing on android since I have these directives in my code, but these are just leftovers from an older project. I am testing this on a windows platform. And as you correctly mentioned, I think the issue is with the correct routing of the reference channel. How can I assure that my last channel of my input microphone is actually the audio data sent to my speakers?

Thanks again!
Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-07-30T01:57:06.6133333+00:00

Hi nk

Thank you for confirming that you are working from windows only.

I will forward the observation to product group anyway while I am doing a full replication with above setup.
Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-08-01T03:05:09.3666667+00:00

Hi nk

We got confirmation from product group, the shared C# code work if tested without unity in windows.

Ask

Could you share details Speech SDK version and Microphone configuration.

Could you also test the code in latest SDK version without unity and let us know your observation.

Thank you.
nk 5 Reputation points

2025-08-01T07:26:00.2233333+00:00

Hi @Manas Mohanty

Thats interesting...we basically followed this tutorial to setup Speech SDK in Unity: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/csharp/unity/speechrecognizer So we are using the newest Speech SDK (1.44) with the necessary dependencies (Azure.Core and Extensions.MAS dependency). Microphone configuration is just an external microphone (from Samson). We set this via Windows Sound Settings to default input mic and then, as shown in the code above, use the "FromDefaultInputMic" function to set this microphone as the input for the Speech Recognizer. Unity Version is 2022.3.x. We're you able to replicate our issue?
Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-08-06T12:15:51.9033333+00:00

Hey nk

Could you share your experience on this unity package - https://github.com/GlitchEnzo/NuGetForUnity

Thank you
nk 5 Reputation points

2025-08-06T12:21:52.5033333+00:00

Hey @Manas Mohanty

Thanks for your help! The work around suggested by the product group sounds good - I am already using NuGetForUnity in order to install the "Azure.Core" dependency, because in the newer "Speech SDK" unity packages it is not included anymore (see README here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/csharp/unity/speechrecognizer#import-the-speech-sdk-plugin-for-unity-in-the-sample). I also downloaded the MAS extension as suggested, but I think I forgot to correctly copy the "model" folder. I will try that and comeback to you.

NuGetForUnity should work for the "Azure.Core" dependency, but to download the "Extension.MAS" I had to manually extract the .dll files and import it into unity.

Thanks!

1 answer

Your answer

Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-07-24T07:43:17.6766667+00:00

Hi nk

Sorry for the late response.

It seems you are trying to achieve" Acoustic Echo Cancellation (AEC) in a Unity project using the Azure Speech SDK and the Microsoft Audio Stack (MAS)" and facing some issue.

I shall update you once I have gone in details on using AudioProcessingOptions and provided code/MAS document and reviewed with team internally.

Thank you for providing the investigation time.
Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-07-29T02:12:09.6466667+00:00

Hi nk

Thank you for providing the investigation time. Went through code today.

Analysis

You seem to be taking inputs from android, while it is tested and supported for windows and Linux.

Microsoft Audio Stack requires the reference channel (also known as loopback channel) to perform echo cancellation. The source of the reference channel varies by platform:

Windows - The reference channel is automatically gathered by the Speech SDK if the SpeakerReferenceChannel::LastChannel option is provided when creating AudioProcessingOptions.

Linux - ALSA (Advanced Linux Sound Architecture) must be configured to provide the reference audio stream as the last channel for the audio input device used. ALSA is configured in addition to providing the SpeakerReferenceChannel::LastChannel option when creating AudioProcessingOptions.

Please refer the minimum requirement of AEC here

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/audio-processing-overview#minimum-requirements-to-use-microsoft-audio-stack

Optimization

You might need to use native AEC manager Unity’s AndroidJavaObject bridge.

AudioManager audioManager = (AudioManager)context.getSystemService(Context.AUDIO_SERVICE); audioManager.setParameters("noise_suppression=on"); audioManager.setParameters("echo_cancellation=on"); AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.VOICE_COMMUNICATION, ...);

Other optimization can drawn from this reference doc - https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.audio?view=azure-dotnet

Hope it helps.

Thank you
nk 5 Reputation points

2025-07-29T07:48:25.5266667+00:00

Hi @Manas Mohanty Thank you very much for investing your time.

I can see why you think I'm developing on android since I have these directives in my code, but these are just leftovers from an older project. I am testing this on a windows platform. And as you correctly mentioned, I think the issue is with the correct routing of the reference channel. How can I assure that my last channel of my input microphone is actually the audio data sent to my speakers?

Thanks again!
Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-07-30T01:57:06.6133333+00:00

Hi nk

Thank you for confirming that you are working from windows only.

I will forward the observation to product group anyway while I am doing a full replication with above setup.
Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-08-01T03:05:09.3666667+00:00

Hi nk

We got confirmation from product group, the shared C# code work if tested without unity in windows.

Ask

Could you share details Speech SDK version and Microphone configuration.

Could you also test the code in latest SDK version without unity and let us know your observation.

Thank you.
nk 5 Reputation points

2025-08-01T07:26:00.2233333+00:00

Hi @Manas Mohanty

Thats interesting...we basically followed this tutorial to setup Speech SDK in Unity: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/csharp/unity/speechrecognizer So we are using the newest Speech SDK (1.44) with the necessary dependencies (Azure.Core and Extensions.MAS dependency). Microphone configuration is just an external microphone (from Samson). We set this via Windows Sound Settings to default input mic and then, as shown in the code above, use the "FromDefaultInputMic" function to set this microphone as the input for the Speech Recognizer. Unity Version is 2022.3.x. We're you able to replicate our issue?
Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator

2025-08-06T12:15:51.9033333+00:00

Hey nk

Could you share your experience on this unity package - https://github.com/GlitchEnzo/NuGetForUnity

Thank you
nk 5 Reputation points

2025-08-06T12:21:52.5033333+00:00

Hey @Manas Mohanty

Thanks for your help! The work around suggested by the product group sounds good - I am already using NuGetForUnity in order to install the "Azure.Core" dependency, because in the newer "Speech SDK" unity packages it is not included anymore (see README here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/csharp/unity/speechrecognizer#import-the-speech-sdk-plugin-for-unity-in-the-sample). I also downloaded the MAS extension as suggested, but I think I forgot to correctly copy the "model" folder. I will try that and comeback to you.

NuGetForUnity should work for the "Azure.Core" dependency, but to download the "Extension.MAS" I had to manually extract the .dll files and import it into unity.

Thanks!

Answer 1

Hey nk

Good day. Sorry for the delay in reverting back to you.

I am not yet fortunate to replicate in unity studio yet (constraints on installation, pending approval from org side)

But got below update from product group on unity perspective.

"

MAS was not supported in unity package for now, Below is a possible walkaround from expert in the area

Work around suggested from product group

https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech.Extension.MAS (same version as the SDK Unity package) and manually copy MAS libraries (from runtimes\win-x64\native) and the AEC model (from contentFiles\any\any\models) to the same folder on disk where the other SDK (library) files are located in the Unity project.

Alternatively they could try using https://github.com/GlitchEnzo/NuGetForUnity to install the Speech SDK using nuget packages instead of the unitypackage (not tested officially)

"

Could you test above the unity packages and let us know your experience.

Thank you

Share via

How-to setup Speech SDK with MAS (AEC) in Unity

1 answer

Your answer