Latency Issue In Speech To Text Realtime API
We are using the Azure Speech-to-Text (STT) streaming API from the Central India region and are seeing a consistent latency of 1.5 to 2 seconds from audio input to transcription result.
Our setup:
SDK: JavaScript/Node SDK using SpeechRecognizer with PushAudioInputStream
Audio Format: 16kHz, mono, 16-bit PCM
Speech_SegmentationSilenceTimeoutMs = 300, EndSilenceTimeoutMs = 500
Using startContinuousRecognitionAsync()
Client is located in India, and we are using the Speech Service key for the Central India region.
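For reference, the setup described above corresponds roughly to the sketch below. It is not our exact code. The function name createRecognizer is hypothetical, the SDK module is passed in as a parameter so the sketch has no hard dependency, and mapping "EndSilenceTimeoutMs" to PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs is an assumption.

```javascript
// Sketch only. `sdk` is assumed to be the module returned by
// require("microsoft-cognitiveservices-speech-sdk").
// createRecognizer is a hypothetical helper name, not part of the SDK.
function createRecognizer(sdk, key, region) {
  const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);

  // Silence timeouts from the setup list (property values are strings).
  speechConfig.setProperty(
    sdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "300");
  // Assumption: "EndSilenceTimeoutMs" refers to this property.
  speechConfig.setProperty(
    sdk.PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs, "500");

  // 16 kHz, 16-bit, mono PCM push stream.
  const format = sdk.AudioStreamFormat.getWaveFormatPCM(16000, 16, 1);
  const pushStream = sdk.AudioInputStream.createPushStream(format);
  const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);

  const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
  return { recognizer, pushStream };
}
```

Typical use would be createRecognizer(sdk, key, "centralindia"), then attaching recognizing/recognized handlers, calling recognizer.startContinuousRecognitionAsync(), and writing audio chunks with pushStream.write(chunk).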
We’ve implemented all latency reduction suggestions from this official blog, including optimized buffering, silence detection, and configuration.
Still, we’re facing 1.5–2s latency, which affects our app’s responsiveness. Please help investigate whether this is expected or if it can be optimized further (region/config/network/etc.).
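To make the 1.5–2s figure easier to break down, here is a minimal sketch of how the latency could be measured per result. LatencyTracker is a hypothetical helper, not part of the SDK. It assumes the result's offset and duration are in 100-nanosecond ticks measured from the start of the pushed stream, and that onPush is called right before every pushStream.write(chunk).

```javascript
// Hypothetical helper: maps a result's audio position back to the
// wall-clock time at which that audio was pushed to the service.
class LatencyTracker {
  constructor(sampleRate = 16000, bytesPerSample = 2, channels = 1) {
    this.bytesPerMs = (sampleRate * bytesPerSample * channels) / 1000;
    this.audioMs = 0;  // total audio duration pushed so far
    this.marks = [];   // one { audioEndMs, pushedAtMs } per chunk
  }

  // Call right before pushStream.write(chunk) with chunk.byteLength.
  onPush(byteLength, nowMs = Date.now()) {
    this.audioMs += byteLength / this.bytesPerMs;
    this.marks.push({ audioEndMs: this.audioMs, pushedAtMs: nowMs });
  }

  // offsetTicks and durationTicks are assumed to be 100 ns units.
  // Returns ms between pushing the last audio covered by the result
  // and receiving the result, or null if that audio was never pushed.
  latencyMs(offsetTicks, durationTicks, nowMs = Date.now()) {
    const resultEndMs = (offsetTicks + durationTicks) / 10000;
    const mark = this.marks.find((m) => m.audioEndMs >= resultEndMs);
    return mark ? nowMs - mark.pushedAtMs : null;
  }
}
```

Calling tracker.latencyMs(e.result.offset, e.result.duration) in both the recognizing and recognized handlers would show whether the delay is already present on interim results (pointing at network or service time) or mostly on final results, which wait for the configured segmentation silence before they are emitted.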