How is the Synthesized Characters count calculated for Azure's Text to Speech service when generating from SSML?

ggg 0 Reputation points
2025-07-25T22:42:30.5166667+00:00

How is the Synthesized Characters count calculated for Azure's Text to Speech service when generating speech from an SSML document? What are the specific rules?
I converted the following SSML file into speech:

<!--ID=FCB40C2B-1F9F-4C26-B1A1-CF8E67BE07D1;Version=1|{"Files":{}}-->
<!--ID=5B95B1CC-2C7B-494F-B746-CF22A0E779B7;Version=1|{"Locales":{"de-DE":{"AutoApplyCustomLexiconFiles":[{}]},"en-US":{"AutoApplyCustomLexiconFiles":[{}]}}}-->
<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">

<voice name="en-US-AvaMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-AndrewMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-EmmaMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-AlloyTurboMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-EchoTurboMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-FableTurboMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-OnyxTurboMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-NovaTurboMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-ShimmerTurboMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-BrianMultilingualNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-JennyNeural"><prosody rate="-10.00%">about </prosody></voice>
<voice name="en-US-DavisNeural"><prosody rate="-10.00%">about </prosody></voice>

</speak>

After checking the billing, why was the above content charged for 480 characters when there are only 5 * 12 = 60 characters?!


Additionally, I am developing a paid text-to-speech feature for users in my application. When a user clicks to generate speech, I want to call an Azure API to pre-calculate the number of characters that will be consumed and estimate the cost of converting the text to speech. This way, the user can see the cost and confirm before proceeding with the actual conversion. How can I achieve this? Is there such an API?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Suwarna S Kale 3,951 Reputation points
    2025-07-26T02:07:39.3+00:00

    Hello ggg,

    Thank you for posting your question in the Microsoft Q&A forum. 

    Azure's Text-to-Speech (TTS) service bills for more than the visible text in an SSML request. According to the billable-characters rules in the Speech service documentation, all markup inside the request counts toward Synthesized Characters, except the <speak> and <voice> tags themselves; letters, punctuation, spaces, tabs, and other whitespace count as well. In your example the word "about" appears 12 times, once per voice, which is 60 letters, but each <voice> block contains <prosody rate="-10.00%">about </prosody>: 24 characters for the opening <prosody> tag with its rate attribute, 6 for "about" plus the trailing space, and 10 for the closing </prosody> tag, 40 in total. Twelve blocks × 40 characters = 480, which matches the billed amount. The <voice> tags and their name attributes are not part of that total.
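
    The arithmetic for the example can be checked with a small script. This is only a sketch of one plausible counting rule, not Azure's actual implementation: it assumes that everything inside each <voice> element is billable (markup, text, and whitespace) and that the <speak> and <voice> tags, plus anything outside them, are not. The function name estimate_billable_chars is made up for illustration.

```python
import re

# Matches one <voice ...>...</voice> element and captures what is inside it.
# Assumption: <voice> elements are not nested and contain no XML comments.
VOICE_BLOCK = re.compile(r"<voice\b[^>]*>(.*?)</voice>", re.DOTALL)


def estimate_billable_chars(ssml: str) -> int:
    """Estimate billable characters for an SSML document.

    Counts every character inside each <voice> element, including nested
    markup such as <prosody> and any whitespace. The <speak> and <voice>
    tags themselves, and anything outside <voice> elements, are ignored.
    Note: len() counts Unicode code points, so languages that are billed
    differently per character would need extra handling.
    """
    return sum(len(match.group(1)) for match in VOICE_BLOCK.finditer(ssml))


one_block = (
    '<voice name="en-US-JennyNeural">'
    '<prosody rate="-10.00%">about </prosody>'
    "</voice>"
)
sample = (
    '<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" '
    'xml:lang="en-US">\n' + "\n".join([one_block] * 12) + "\n</speak>"
)

print(estimate_billable_chars(one_block))  # characters in a single block
print(estimate_billable_chars(sample))     # characters across 12 blocks
```

    Under these assumptions a single block comes to 40 characters and the 12-block document to 480, the same figure that appeared on the bill. It is worth comparing the estimate with the billed count for a few real requests before relying on it.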

    To estimate costs before conversion, you need to replicate Azure's counting logic on the client side, as no dedicated pre-calculation API exists. A workaround is to parse the SSML locally, drop the comments and the <speak> and <voice> tags, and count the characters that remain, including markup such as <prosody> and any whitespace inside the voice blocks. You can validate the counter against the billed figures for a few test requests, and use the Azure Retail Prices API to look up the current per-character rate. For user-facing estimates, multiply the local count by that rate and show the result to the user before calling the synthesis API. Until Microsoft offers a native solution, combining client-side counting with caching of already-synthesized audio remains the most practical approach for cost transparency in paid TTS features.
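
    The final multiplication step might look like the sketch below. The function name estimate_cost and the price value are placeholders for illustration only; the real rate depends on the voice type, region, and pricing tier, so it should be read from the pricing page or the Retail Prices API rather than hard-coded.

```python
from decimal import Decimal


def estimate_cost(billable_chars: int, price_per_million: Decimal) -> Decimal:
    """Return the estimated charge for a given number of billable characters.

    price_per_million is the price for one million characters in the
    billing currency. Decimal is used to avoid float rounding on money.
    """
    if billable_chars < 0:
        raise ValueError("billable_chars must not be negative")
    return Decimal(billable_chars) * price_per_million / Decimal(1_000_000)


# Hypothetical rate, NOT an actual Azure price.
EXAMPLE_PRICE_PER_MILLION = Decimal("16.00")

# 480 is the character count billed for the SSML in the question.
print(estimate_cost(480, EXAMPLE_PRICE_PER_MILLION))
```

    Showing this figure in a confirmation dialog, together with the character count it is based on, lets the user approve the cost before the synthesis request is sent.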

     

    If the above answer helped, please do not forget to "Accept Answer" as this may help other community members to refer the info if facing a similar issue. Your contribution to the Microsoft Q&A community is highly appreciated. 

