Thank you for reaching out to Microsoft Q&A Forum.
Understanding Azure TTS Billing and Usage Behavior:
Azure bills TTS based on characters synthesized, not characters played. So, every time your application sends text to Azure to generate speech, it incurs a cost based on the number of characters in that request, even if the text is identical across multiple users.
For example:
If 8 users each send the same 2,000-character phrase to TTS, Azure charges for 16,000 characters (2,000 x 8), because each device sends its own synthesis request.
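The arithmetic above can be sketched in a few lines of Python. This is only an illustration: `billed_characters` is a hypothetical helper, and it assumes the billable count equals the plain text length multiplied by the number of synthesis requests.

```python
def billed_characters(text_length: int, requests: int) -> int:
    """Characters billed when the same text is synthesized `requests` times."""
    return text_length * requests


per_device = billed_characters(2_000, 8)   # each of the 8 devices synthesizes
centralized = billed_characters(2_000, 1)  # synthesized once, audio is shared
savings = 1 - centralized / per_device

print(per_device, centralized, f"{savings:.1%}")  # 16000 2000 87.5%
```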
This is detailed in Microsoft’s documentation:
Azure Speech Service Pricing Details – Text-to-Speech
TTS Character Counting Explanation
Likely Root Causes of High Usage
Based on how Azure TTS works, the following are potential causes for inflated character billing:
1. No Shared TTS Audio Across Devices
Each of the 8 user devices might be independently calling the TTS API to synthesize the same translated content. This leads to duplicated character usage.
2. No Audio Caching
If your app does not cache and reuse previously synthesized audio, even repeat playback of the same text results in multiple billing instances.
3. Silent Retries or Looping
There could be unintentional retries or background loops in the app logic that re-trigger TTS synthesis, consuming more characters without visible playback.
4. High Frequency or Granular TTS Calls
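Instead of batching full phrases into single TTS requests, your app might be sending many small, fragmented requests. Billing is by character rather than by request, so fragmentation on its own should not raise the total, but every extra request is another opportunity for a retry or duplicate to be billed.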
Recommended Solution and Architecture Changes:
To avoid this excessive usage, the application should follow this TTS Optimization Strategy:
1. Centralized TTS Audio Generation
• Synthesize translated text once on a central backend server.
• Save the resulting audio as a stream or blob (e.g., in Azure Blob Storage or an in-memory cache).
• Distribute that audio file to all 8 user devices for playback.
This way, only one TTS request per phrase is billed, reducing billed characters by up to 87.5% in your case (1 request instead of 8).
Refer: TTS Audio Output - REST API
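A minimal sketch of the centralized approach, assuming a backend that synthesizes each phrase once and hands the same audio to every device. `CentralTtsService`, `broadcast`, and `fake_synthesize` are hypothetical names used for illustration; in a real backend the `synthesize` callable would wrap your actual Azure TTS call, and `len(text)` is only an approximation of the billable count.

```python
from typing import Callable, Dict, List


class CentralTtsService:
    """Synthesizes each phrase once and shares the audio with all devices."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize  # hypothetical: wraps the real TTS call
        self.billed_characters = 0     # approximate, based on text length

    def broadcast(self, text: str, device_ids: List[str]) -> Dict[str, bytes]:
        audio = self._synthesize(text)       # the only billable request
        self.billed_characters += len(text)
        # Every device receives the same audio bytes; nothing is re-synthesized.
        return {device_id: audio for device_id in device_ids}


def fake_synthesize(text: str) -> bytes:
    # Stand-in for the real synthesis call so the sketch runs offline.
    return text.encode("utf-8")


service = CentralTtsService(fake_synthesize)
devices = [f"device-{i}" for i in range(1, 9)]
delivered = service.broadcast("x" * 2000, devices)

print(len(delivered), service.billed_characters)  # 8 2000
```

In practice the returned bytes would be written to storage or streamed to the devices rather than held in a dictionary.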
2. Enable Caching on Client or Server
If server-side implementation is not feasible yet, consider:
• Caching the TTS output locally on client devices.
• Reusing it instead of re-synthesizing the same content.
Refer: Azure Speech SDK – Audio Output Options
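One way to sketch the caching idea is an in-memory cache keyed on a hash of the voice and text, so a repeated request returns stored audio instead of triggering synthesis. `TtsCache` and `fake_synthesize_with_voice` are hypothetical names, and the `synthesize` callable is an assumed wrapper around your real TTS call; a production cache would also need eviction and persistent storage.

```python
import hashlib
from typing import Callable, Dict


class TtsCache:
    """Returns cached audio for a (text, voice) pair, synthesizing only on a miss."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize  # hypothetical: wraps the real TTS call
        self._audio: Dict[str, bytes] = {}
        self.misses = 0                # each miss is one billable request

    @staticmethod
    def _key(text: str, voice: str) -> str:
        # The voice is part of the key because the same text spoken by a
        # different voice is different audio.
        return hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()

    def get_audio(self, text: str, voice: str) -> bytes:
        key = self._key(text, voice)
        if key not in self._audio:
            self.misses += 1
            self._audio[key] = self._synthesize(text, voice)
        return self._audio[key]


def fake_synthesize_with_voice(text: str, voice: str) -> bytes:
    # Stand-in for the real synthesis call so the sketch runs offline.
    return f"{voice}:{text}".encode("utf-8")


cache = TtsCache(fake_synthesize_with_voice)
for _ in range(3):
    cache.get_audio("Hello", "en-US-JennyNeural")  # synthesized once, then reused
cache.get_audio("Hello", "en-GB-SoniaNeural")      # different voice, new synthesis

print(cache.misses)  # 2
```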
3. Enable SDK Logging to Debug Excessive Requests
Use SDK or network logs to monitor how many times your app calls the synthesis API (for example, SpeakTextAsync in the Speech SDK) and how many characters it sends each time. This helps you detect hidden retries or looping behavior.
Refer: Enable Azure Speech SDK Logging
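Alongside SDK logging, a thin counting wrapper around your own synthesis function can surface repeated requests. The sketch below is an assumption about how that might look, not part of the Speech SDK: `CountingSynthesizer` and `repeated_texts` are hypothetical names, and `len(text)` only approximates the billable character count.

```python
from collections import Counter
from typing import Callable, Dict


class CountingSynthesizer:
    """Wraps a synthesis function and records calls, characters, and repeats."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize  # hypothetical: wraps the real TTS call
        self.calls = 0
        self.characters = 0            # approximate, based on text length
        self.requests_per_text = Counter()

    def __call__(self, text: str) -> bytes:
        self.calls += 1
        self.characters += len(text)
        self.requests_per_text[text] += 1
        return self._synthesize(text)

    def repeated_texts(self) -> Dict[str, int]:
        # Texts sent more than once are candidates for retries, loops,
        # or missing caching.
        return {t: n for t, n in self.requests_per_text.items() if n > 1}


counted = CountingSynthesizer(lambda text: b"")  # stand-in synthesis function
for text in ["Hello", "Hello", "Goodbye", "Hello"]:
    counted(text)

print(counted.calls, counted.characters, counted.repeated_texts())
# 4 22 {'Hello': 3}
```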
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you. This can be beneficial to other community members.
Thank you!