Unexpectedly high TTS character count in Azure Speech Service during live app test

Lauren Van Niekerk 20 Reputation points
2025-06-25T02:39:26.7733333+00:00

Hi team,

We are running a production-ready church translation app using Azure Translator and Azure Speech Services (STT & TTS, neural voice). During a 30-minute live test involving 8 user devices (all using Spanish), we observed the following:

* Translated character count (via Azure Translator): ~20,260 characters (correct and expected)

* Synthesized character count (via Azure Speech TTS): 529,080 characters (far higher than expected)

Each user received and played the same translated text via TTS, so we expected roughly 8 x 20,260 ≈ 162,000 characters in total (we budgeted ~160,000-170,000). The actual usage was more than 3x that amount, which has significant cost implications.
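A quick check using only the figures reported above (the variable names are illustrative) makes the size of the gap explicit: the 8-device multiplier accounts for about 162,080 characters, and the observed total is roughly 3.26x that, which means each device synthesized about 3.26x the translated text.

```python
# Figures reported in this post (23 June 2025 live test).
translated_chars = 20_260      # Azure Translator character count
devices = 8                    # devices playing Spanish TTS
observed_tts_chars = 529_080   # Azure Speech TTS character count

# If every device synthesizes every translated phrase exactly once,
# TTS usage should be about translated characters x devices.
expected_tts_chars = translated_chars * devices
excess_ratio = observed_tts_chars / expected_tts_chars
per_device = observed_tts_chars / devices

print(f"expected: {expected_tts_chars:,}")            # expected: 162,080
print(f"observed / expected: {excess_ratio:.2f}x")    # observed / expected: 3.26x
print(f"observed per device: {per_device:,.0f} chars")  # observed per device: 66,135 chars
```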

DETAILS:

Speech resource: FuturesChurchCitySpeech
Region: Australia East
Time of test: 23rd June 2025, approximately 7:45-8:15pm ACST
Voice used: Spanish (neural voice)
App behaviour: Each user played the same translated phrases using Azure TTS. No looping or repeated playback was expected.
Support Plan: We are on the Developer Support plan, but the Azure portal does not let us submit a technical support ticket; only documentation is shown.

I've tried calling support, live chat, Microsoft 365 support and emailing azcommunity@microsoft.com, but none of those channels have resulted in a human support interaction.

Questions:

  1. Why is the TTS character usage so high in this scenario?
  2. Is there a known issue with duplicate synthesis or metering in multi-user settings?
  3. Can someone from Microsoft escalate this internally or assist us in manually creating a support ticket please?

Any help or guidance is much appreciated!

Kind regards,
Lauren Van Niekerk

Azure AI Speech

Accepted answer
  Prashanth Veeragoni 5,745 Reputation points Microsoft External Staff Moderator
    2025-06-25T06:35:52.07+00:00

    Hi Lauren Van Niekerk,

    Thank you for reaching out to Microsoft Q&A Forum.

    Understanding Azure TTS Billing and Usage Behavior:

    Azure bills TTS based on characters synthesized, not characters played. So, every time your application sends text to Azure to generate speech, it incurs a cost based on the number of characters in that request, even if the text is identical across multiple users.

    For example:

    If a phrase of 2,000 characters is sent to TTS by 8 users, Azure charges for 16,000 characters (2,000 x 8) — because each device sends its own synthesis request.

    This is detailed in Microsoft’s documentation:

    Azure Speech Service Pricing Details – Text-to-Speech

    TTS Character Counting Explanation

    Likely Root Causes of High Usage

    Based on how Azure TTS works, the following are potential causes for inflated character billing:

    1. No Shared TTS Audio Across Devices

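    Each of the 8 user devices might be independently calling the TTS API to synthesize the same translated content, which duplicates character usage. Note, however, that this 8x multiplier is already reflected in your ~160,000-170,000 character estimate, so on its own it does not explain the additional more-than-3x excess; causes 2-4 below are more likely to account for that gap.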

    2. No Audio Caching

    If your app does not cache and reuse previously synthesized audio, every replay of the same text (for example, a user tapping replay) sends a new synthesis request that is billed again.

    3. Silent Retries or Looping

    There could be unintentional retries or background loops in the app logic that re-trigger TTS synthesis, consuming more characters without visible playback.

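    4. High-Frequency or Granular TTS Calls

    Instead of batching full phrases into single TTS requests, your app might be sending many small, fragmented requests. TTS is billed per character rather than per request, so fragmentation alone does not raise the count; it does when fragments overlap or are re-sent, for example if partial (interim) recognition results are synthesized in addition to the final result, so the same words are billed several times.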
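    To illustrate how overlapping fragments inflate the count, here is a minimal sketch assuming the app forwards streaming recognition/translation results to TTS. Interim results typically repeat and extend the previous hypothesis, so synthesizing each of them bills the shared prefix again. The `Segment` type and the sample strings are hypothetical; only the character counting matters here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical recognition/translation result that could be sent to TTS."""
    text: str
    is_final: bool


def billed_chars(segments, final_only):
    """Sum the characters that would be sent to TTS (and therefore billed)."""
    return sum(len(s.text) for s in segments if s.is_final or not final_only)


# Interim results grow cumulatively until the final result arrives.
stream = [
    Segment("Hola", False),
    Segment("Hola a", False),
    Segment("Hola a todos", False),
    Segment("Hola a todos.", True),
]

print(billed_chars(stream, final_only=False))  # 35  (4 + 6 + 12 + 13)
print(billed_chars(stream, final_only=True))   # 13
```

    Synthesizing only final results keeps billing proportional to the translated text rather than to the number of intermediate updates.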

    Recommended Solution and Architecture Changes:

    To avoid this excessive usage, the application should follow this TTS Optimization Strategy:

    1. Centralized TTS Audio Generation

    • Synthesize translated text once on a central backend server.

    • Save the resulting audio as a stream or blob (e.g., in Azure Blob Storage or in-memory cache).

    • Distribute that audio file to all 8 user devices for playback.

    This way, only one TTS request per phrase is billed — reducing cost by up to 87.5% in your case (1 request vs 8 requests).

    Refer: TTS Audio Output - REST API
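    The synthesize-once, distribute-to-all pattern above can be sketched as a small server-side hub. This is an illustration under stated assumptions, not a Speech SDK API: `synthesize` is a hypothetical callable `(text, voice) -> bytes` (in a real backend it might wrap the SDK's synthesizer), device delivery is modeled as in-memory queues, and the voice name is only an example since the post does not say which Spanish voice was used.

```python
from typing import Callable, Dict, List


class PhraseBroadcaster:
    """Synthesize each translated phrase once and fan the audio out to all devices.

    `synthesize` is an assumed (text, voice) -> bytes callable; the wiring to
    Azure Speech is not shown here.
    """

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize
        self._devices: Dict[str, List[bytes]] = {}  # device id -> audio queue

    def register(self, device_id: str) -> None:
        self._devices[device_id] = []

    def publish(self, text: str, voice: str) -> None:
        audio = self._synthesize(text, voice)  # one billed request per phrase
        for queue in self._devices.values():
            queue.append(audio)                # same bytes, no extra synthesis

    def queue_for(self, device_id: str) -> List[bytes]:
        return self._devices[device_id]


# Usage with a fake synthesizer that records billed characters.
billed: List[int] = []


def fake_synthesize(text: str, voice: str) -> bytes:
    billed.append(len(text))
    return text.encode("utf-8")  # stand-in for real audio bytes


hub = PhraseBroadcaster(fake_synthesize)
for i in range(8):
    hub.register(f"device-{i}")

hub.publish("Bienvenidos a la iglesia.", "es-ES-ElviraNeural")  # example voice
print(sum(billed))                     # 25 (billed once, not 8 x 25)
print(len(hub.queue_for("device-7")))  # 1
```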

    2. Enable Caching on Client or Server

    If server-side implementation is not feasible yet, consider:

    • Caching the TTS output locally on client devices.

    • Reusing it instead of re-synthesizing the same content.

    Refer: Azure Speech SDK – Audio Output Options
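    A minimal caching sketch, assuming the same hypothetical `(text, voice) -> bytes` synthesis callable: audio is stored under a hash of the voice and text, so replays and repeated phrases are served from the cache and only cache misses reach TTS. It keeps entries in memory; a real app might persist them locally or in Blob Storage. If your app sends SSML, the key should be built from the full SSML string instead of the plain text.

```python
import hashlib
from typing import Callable, Dict


class TtsCache:
    """In-memory synthesis cache keyed by (voice, text)."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize  # assumed (text, voice) -> bytes callable
        self._store: Dict[str, bytes] = {}
        self.billed_chars = 0

    @staticmethod
    def _key(text: str, voice: str) -> str:
        # Hash voice and text together so different voices don't share entries.
        return hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()

    def get_audio(self, text: str, voice: str) -> bytes:
        key = self._key(text, voice)
        if key not in self._store:
            self._store[key] = self._synthesize(text, voice)
            self.billed_chars += len(text)  # only cache misses reach TTS
        return self._store[key]


cache = TtsCache(lambda text, voice: text.encode("utf-8"))  # fake synthesizer
for _ in range(3):  # e.g. the user taps "replay" three times
    cache.get_audio("Gracias por venir.", "es-ES-ElviraNeural")
print(cache.billed_chars)  # 18
```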

    3. Enable SDK Logging to Debug Excessive Requests

    Use SDK or network logs to monitor how many times your app calls the synthesis API (for example, the Speech SDK's speakTextAsync/speakSsmlAsync methods, whose exact names vary by SDK language) and how many characters each request sends. This helps you detect hidden retries or looping behavior.

    Refer: Enable Azure Speech SDK Logging
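    Alongside the SDK's own log file (enabled through the `Speech_LogFilename` property), an app-level counter can make duplicate calls obvious and can be compared directly with the character count shown in the Azure portal. This is a minimal sketch that wraps the same hypothetical `(text, voice) -> bytes` callable; `metered` and its counters are illustrative names, not part of the Speech SDK.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("tts-meter")


def metered(synthesize: Callable[[str, str], bytes]) -> Callable[[str, str], bytes]:
    """Wrap an assumed (text, voice) -> bytes synthesis callable so every
    request, including retries, is logged with its character count."""
    calls = {"requests": 0, "chars": 0}

    def wrapper(text: str, voice: str) -> bytes:
        calls["requests"] += 1
        calls["chars"] += len(text)
        log.info("TTS request #%d: %d chars (running total %d) voice=%s",
                 calls["requests"], len(text), calls["chars"], voice)
        return synthesize(text, voice)

    wrapper.calls = calls  # expose counters for inspection
    return wrapper


tts = metered(lambda text, voice: b"")  # fake synthesizer for illustration
tts("Hola a todos.", "es-ES-ElviraNeural")
tts("Hola a todos.", "es-ES-ElviraNeural")  # a duplicate shows up in the log
print(tts.calls)  # {'requests': 2, 'chars': 26}
```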

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.


    Please do not forget to “Accept the answer” and “up-vote” wherever the information provided helps you; this can be beneficial to other community members.

    Thank you! 
