How to set up the Speech SDK with MAS (AEC) in Unity

nk 5 Reputation points
2025-07-15T08:22:15.72+00:00

Hello everyone,

I am trying to implement Acoustic Echo Cancellation (AEC) in a Unity project using the Azure Speech SDK and the Microsoft Audio Stack (MAS), but I cannot get it to work correctly. The speech recognizer continues to pick up and transcribe audio from my speakers.

Goal: My goal is to configure the Speech SDK to perform speech recognition on a microphone input while simultaneously ignoring audio being played out of the system's speakers. Essentially, if someone is speaking through the speakers, the speech recognizer should not transcribe that audio, but it should still be able to recognize and transcribe a user speaking into the microphone.

Setup:

  1. Engine: Unity 2022.3.x
  2. SDK: Azure Speech SDK for C#
  3. Feature: Azure Speech SDK with Microsoft Audio Stack (MAS) (enabled via AudioProcessingOptions)
  4. Hardware:
    • Output: A 5.1 speaker system
    • Input: A standard microphone placed in front of the user.

Problem Details: I followed the documentation (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/audio-processing-speech-sdk?tabs=csharp) to enable the Microsoft Audio Stack. My expectation was that MAS would use the speaker output as a reference signal to cancel it out from the microphone's input, thereby only recognizing the user's speech.

However, when a voice is played over the speakers, the SpeechRecognizer transcribes everything in that playback. This indicates that the AEC is not functioning.

My hypothesis is that MAS does not have the correct reference audio signal for the echo cancellation. The documentation mentions that MAS uses the "last channel of the input device" as the reference channel, but I'm unsure how to configure my system to correctly route the speaker's voice to this channel or how to verify if this is the root cause.
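For context, my working theory is that I need to deliver the reference signal myself as the last channel of a multi-channel input stream, instead of relying on the default microphone. Below is a rough, untested sketch of what I imagine that looks like. I'm assuming here that AudioConfig.FromStreamInput accepts AudioProcessingOptions and that PresetMicrophoneArrayGeometry.Mono is the right geometry for a single microphone; please correct me if either assumption is wrong.

using Microsoft.CognitiveServices.Speech.Audio;

// Untested sketch: feed MAS an explicit reference channel via a push stream.
// Channel 0 = microphone, channel 1 (the last channel) = loopback copy of the
// speaker output. Assumes 16 kHz / 16-bit PCM.
public static class MasReferenceChannelSketch
{
    public static PushAudioInputStream CreateTwoChannelInput(out AudioConfig audioConfig)
    {
        // Two channels: microphone + reference.
        var format = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 2);
        var pushStream = AudioInputStream.CreatePushStream(format);

        var options = AudioProcessingOptions.Create(
            AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT,
            PresetMicrophoneArrayGeometry.Mono,    // single physical microphone
            SpeakerReferenceChannel.LastChannel);  // channel 1 carries the AEC reference

        audioConfig = AudioConfig.FromStreamInput(pushStream, options);
        return pushStream;
    }

    // Interleaves one frame of microphone and loopback samples and pushes it to
    // the SDK. micSamples and loopbackSamples are buffers the caller fills
    // (same length, same sample rate).
    public static void PushFrame(PushAudioInputStream pushStream,
                                 short[] micSamples, short[] loopbackSamples)
    {
        var interleaved = new byte[micSamples.Length * 2 * sizeof(short)];
        for (int i = 0; i < micSamples.Length; i++)
        {
            System.BitConverter.GetBytes(micSamples[i]).CopyTo(interleaved, i * 4);
            System.BitConverter.GetBytes(loopbackSamples[i]).CopyTo(interleaved, i * 4 + 2);
        }
        pushStream.Write(interleaved);
    }
}

If this is the intended approach, I would still need to capture micSamples and loopbackSamples myself, which is part of what I'm unsure about (see the sketch after the code listing below).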

What I've Tried:

  1. Minimal Sample Project: I created a minimal Unity project to isolate the issue from our main, more complex project.
  2. Dependencies: I set up the dependencies by manually including the DLLs for Microsoft.CognitiveServices.Speech and Microsoft.CognitiveServices.Speech.Extension.MAS, and using NuGetForUnity for the Azure.Core dependency.
  3. Code Implementation: I am using SpeechRecognizer with AudioConfig.FromDefaultMicrophoneInput(audioProcessingOptions) to set up MAS.

Code

The complete Unity project is too big to paste here, but this is the only script with any functionality:

using UnityEngine;
using UnityEngine.UI;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using TMPro;

#if PLATFORM_ANDROID
using UnityEngine.Android;
#endif

public class AzureSpeechRecognizer : MonoBehaviour
{
    
    private string subscriptionKey = "YOUR_SUBSCRIPTION_KEY_HERE"; // Replace with your Azure Speech subscription key

    private string region = "YOUR_REGION";

    // Public fields for the Unity Inspector
    [Header("UI Elements")]
    [Tooltip("The button to start speech recognition.")]
    public Button startRecognitionButton;

    [Tooltip("The UI Text element to display the recognized text.")]
    public TextMeshProUGUI outputText;

    public AudioSource speaker;
    // Internal objects for speech recognition
    private SpeechRecognizer recognizer;
    private SpeechConfig speechConfig;
    private AudioConfig audioConfig;
    private bool isRecognizing = false;
    private object threadLocker = new object();
    private string message;

    void Start()
    {
        // --- Initialization ---
        if (outputText == null)
        {
            Debug.LogError("Output Text field is not assigned in the inspector.");
            return;
        }

        if (startRecognitionButton == null)
        {
            Debug.LogError("Start Recognition Button is not assigned in the inspector.");
            return;
        }

        // Add a listener to the button to call the StartRecognition method when clicked
        startRecognitionButton.onClick.AddListener(StartRecognition);

        // --- Permission Handling for Android ---
#if PLATFORM_ANDROID
        if (!Permission.HasUserAuthorizedPermission(Permission.Microphone))
        {
            Permission.RequestUserPermission(Permission.Microphone);
        }
#endif

        // --- Speech SDK Configuration ---
        // Creates an instance of a speech config with specified subscription key and service region.
        speechConfig = SpeechConfig.FromSubscription(subscriptionKey, region);

        speaker.PlayDelayed(5);
    }

    /// <summary>
    /// Called when the start recognition button is clicked.
    /// </summary>
    public async void StartRecognition()
    {
        if (isRecognizing)
        {
            // If already recognizing, stop the recognition
            await recognizer.StopContinuousRecognitionAsync();
            isRecognizing = false;
            UpdateUI("Recognition stopped.");
            return;
        }

        // Dispose of any recognizer and audio config left over from a previous
        // session so repeated start/stop cycles do not leak native resources.
        recognizer?.Dispose();
        audioConfig?.Dispose();

        // --- Audio Configuration ---
        // Creates an audio configuration that will use the default microphone.
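        // NOTE: SpeakerReferenceChannel.LastChannel tells MAS to treat the last
        // channel of the captured input as the loopback/reference signal for AEC,
        // so the capture device itself has to deliver that extra channel.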
        AudioProcessingOptions audioProcessingOptions = AudioProcessingOptions.Create(
            AudioProcessingConstants.AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT,
            PresetMicrophoneArrayGeometry.Linear2,
            SpeakerReferenceChannel.LastChannel);


        foreach (var device in Microphone.devices)
        {
            Debug.Log("Name " + device);
        }
        audioConfig = AudioConfig.FromDefaultMicrophoneInput(audioProcessingOptions);

        // --- Speech Recognizer Creation ---
        // Creates a speech recognizer from the speech and audio configurations.
        recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // --- Event Subscriptions ---
        // Subscribes to events.
        recognizer.Recognizing += (s, e) =>
        {
            lock (threadLocker)
            {
                message = $"RECOGNIZING: Text={e.Result.Text}";
            }
        };

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                lock (threadLocker)
                {
                    message = $"RECOGNIZED: Text={e.Result.Text}";
                }
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                lock (threadLocker)
                {
                    message = "NOMATCH: Speech could not be recognized.";
                }
            }
        };

        recognizer.Canceled += (s, e) =>
        {
            lock (threadLocker)
            {
                message = $"CANCELED: Reason={e.Reason}";
            }

            if (e.Reason == CancellationReason.Error)
            {
                Debug.LogError($"CANCELED: ErrorDetails={e.ErrorDetails}");
                Debug.LogError("CANCELED: Did you set the speech resource key and region values?");
            }
        };

        recognizer.SessionStarted += (s, e) =>
        {
            Debug.Log("Session started event.");
        };

        recognizer.SessionStopped += (s, e) =>
        {
            Debug.Log("Session stopped event.");
            isRecognizing = false;
        };

        recognizer.SpeechStartDetected += (s, e) =>
        {
            Debug.Log("Speech Started");
        };

        recognizer.SpeechEndDetected += (s, e) =>
        {
            Debug.Log("Speech Ended");
        };

        // --- Start Recognition ---
        // Starts continuous recognition.
        // Uses StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
        isRecognizing = true;
        UpdateUI("Say something...");


    }

    void Update()
    {
        lock (threadLocker)
        {
            if (outputText != null)
            {
                outputText.text = message;
            }
        }
    }

    /// <summary>
    /// Updates the UI text on the main thread.
    /// </summary>
    /// <param name="text">The text to display.</param>
    private void UpdateUI(string text)
    {
        lock(threadLocker)
        {
            message = text;
        }
    }

    void OnDestroy()
    {
        // --- Cleanup ---
        if (recognizer != null)
        {
            recognizer.Dispose();
        }
    }
}

There is an AudioSource (speaker) in the scene that plays the talker's voice over the speakers, a button (startRecognitionButton) to start the recording, and a TextMeshPro text (outputText) to show the transcribed result.
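If routing the reference channel manually is the way to go, my rough idea for capturing what the AudioSource actually plays is an OnAudioFilterRead hook (untested sketch). Note that OnAudioFilterRead runs on Unity's audio thread and delivers buffers at Unity's output sample rate, so the captured frames would still need resampling to match the stream format above.

using System.Collections.Concurrent;
using UnityEngine;

// Untested sketch: capture what the AudioSource actually plays so it can be
// used as the AEC reference channel. Attach to the same GameObject as the
// AudioSource; the filter passes the audio through unmodified.
public class LoopbackCapture : MonoBehaviour
{
    public readonly ConcurrentQueue<float[]> Frames = new ConcurrentQueue<float[]>();

    void OnAudioFilterRead(float[] data, int channels)
    {
        // Downmix to mono so it fits in a single reference channel.
        var mono = new float[data.Length / channels];
        for (int i = 0; i < mono.Length; i++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
                sum += data[i * channels + c];
            mono[i] = sum / channels;
        }
        Frames.Enqueue(mono); // a consumer converts to 16-bit PCM and pushes to the SDK
    }
}

With a 5.1 system this mono downmix is at best an approximation of what each physical speaker emits, so I'm not sure how well AEC could converge on it.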

Question: Has anyone successfully implemented AEC with the Azure Speech SDK and MAS in Unity? Is there a specific configuration required for the audio output or the system's microphone channels to provide the correct reference signal for echo cancellation?

Any guidance or examples would be greatly appreciated. Thank you!

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Manas Mohanty 8,150 Reputation points Microsoft External Staff Moderator
    2025-08-05T07:36:17.16+00:00

    Hey nk,

    Good day, and sorry for the delay in getting back to you.

    I have not been able to reproduce this in Unity yet (installation constraints; approval is still pending on our organization's side).

    However, I got the update below from the product group on the Unity side.

    "

    MAS is not supported in the Unity package for now. Below is a possible workaround suggested by an expert in the area:

    Install https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech.Extension.MAS (the same version as the SDK Unity package) and manually copy the MAS libraries (from runtimes\win-x64\native) and the AEC model (from contentFiles\any\any\models) to the same folder on disk where the other SDK (library) files are located in the Unity project.

     

    Alternatively, you could try using https://github.com/GlitchEnzo/NuGetForUnity to install the Speech SDK from NuGet packages instead of the unitypackage (not officially tested).

    "

    Could you test the above in your Unity project and let us know your experience?

    Thank you

