Typical TTS Scenario
Many enterprises want to build their own voice assistant. In this scenario, TTS typically speaks short utterances during human-bot interactions. A few key design considerations:
- Minimize dialog latency. Direct Line Speech is a good fit because it can connect the bot service and the TTS/SR services in the closest region.
- Use an off-the-shelf voice. Neural voices are recommended for their high quality and personality fit.
- Use a branded voice. Custom Neural Voice is recommended for creating a branded voice from a few hundred recordings.
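Whichever voice is chosen, a short bot reply is commonly sent to the service as SSML that names the voice. The following is a minimal sketch of building such a request body; the helper name `build_ssml` is hypothetical, and the default voice `en-US-JennyNeural` is only an example that a custom voice name could replace.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace required on the SSML root element.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap a short bot reply in SSML that selects a specific voice.

    `build_ssml` is an illustrative helper, not part of any SDK.
    The text is XML-escaped so characters like '&' or '<' stay valid.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Your meeting starts at 3 & ends at 4."))
```

Keeping the voice name as a parameter makes it straightforward to start with an off-the-shelf neural voice and switch to a branded voice later without touching the dialog code.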
Applications can use TTS to read long content such as an email, a news article, or a chapter of a book. Because the content is long, it is desirable to start playing the response while TTS is still rendering. There are two options:
- The Speech SDK supports streaming output of responses. Application developers can implement the streaming rendering logic for the audio stream on the client.
- Use the Immersive Reader SDK, which uses Azure TTS under the hood.
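The client-side streaming logic from the first option can be sketched as a small producer/consumer loop: audio chunks are queued as they arrive, and playback starts with the first chunk instead of waiting for the whole response. This is only an illustration under stated assumptions: `fake_tts_stream` is a hypothetical stand-in for the SDK's streaming output, and `play` is whatever callback feeds the audio device.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator


def fake_tts_stream(text: str, chunk_size: int = 8) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response.

    Yields the 'audio' (here just the UTF-8 bytes) chunk by chunk.
    """
    data = text.encode("utf-8")
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def play_while_rendering(
    chunks: Iterable[bytes],
    play: Callable[[bytes], None],
    max_buffered: int = 4,
) -> int:
    """Pass chunks to `play` as they arrive; return the total bytes played."""
    buf = queue.Queue(maxsize=max_buffered)

    def producer() -> None:
        try:
            for chunk in chunks:
                buf.put(chunk)  # blocks if playback falls behind
        finally:
            buf.put(None)  # sentinel: no more audio

    threading.Thread(target=producer, daemon=True).start()

    total = 0
    while (chunk := buf.get()) is not None:
        play(chunk)  # playback begins before synthesis has finished
        total += len(chunk)
    return total


if __name__ == "__main__":
    played = []
    play_while_rendering(fake_tts_stream("a long article ..."), played.append)
    print(len(played), "chunks played")
```

The bounded queue keeps memory use flat for book-length content: the producer pauses when the player is more than `max_buffered` chunks behind.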
TTS can also turn a long article, or even a whole book, into audio files. To do so, try the method below.
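As a generic illustration (an assumption, not a documented Azure workflow), long text can be split at sentence boundaries into chunks that are synthesized one at a time, each to its own audio file. The function name `split_for_synthesis` and the `max_chars` default are hypothetical; the actual per-request limit depends on the service being called.

```python
import re
from typing import List


def split_for_synthesis(text: str, max_chars: int = 2000) -> List[str]:
    """Group whole sentences into chunks of at most `max_chars` characters.

    `max_chars` is an arbitrary illustrative limit. A single sentence
    longer than `max_chars` is kept as its own oversized chunk.
    """
    # Split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(split_for_synthesis("One. Two! Three? Four.", 10)):
        print(i, chunk)
```

Splitting on sentence boundaries avoids cutting a word or phrase in half, which would otherwise produce audible breaks when the files are played back to back.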
In the car scenario, TTS normally still needs to work when the car is disconnected, so a hybrid design is desirable.
A car can host different kinds of skills:
- For an online skill such as weather, we recommend calling TTS, compressing the audio with SILK/Ogg/MP3, and sending it to the car head unit to render.
- For an on-device skill such as opening the window, use the following policy:
  - Use online TTS when there is a connection.
  - Fall back to an on-device voice from the same voice talent when disconnected.

We offer a hybrid solution for the connected car scenario.
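The routing policy described above can be sketched as a small decision function. All names here (`Skill`, `Engine`, `choose_engine`) are hypothetical. The handling of an online skill while disconnected is an assumption, since the text does not cover that case.

```python
from enum import Enum


class Skill(Enum):
    ONLINE = "online"        # e.g. weather
    ON_DEVICE = "on_device"  # e.g. opening the window


class Engine(Enum):
    CLOUD_TTS = "cloud_tts"      # online TTS
    DEVICE_TTS = "device_tts"    # embedded voice from the same voice talent
    UNAVAILABLE = "unavailable"  # nothing to speak


def choose_engine(skill: Skill, connected: bool) -> Engine:
    """Pick the TTS engine for a skill given the current connectivity."""
    if connected:
        # Both skill types use online TTS whenever a connection exists.
        return Engine.CLOUD_TTS
    if skill is Skill.ON_DEVICE:
        # Disconnected: fall back to the on-device voice.
        return Engine.DEVICE_TTS
    # Assumption: an online skill cannot run without a connection at all.
    return Engine.UNAVAILABLE


if __name__ == "__main__":
    print(choose_engine(Skill.ON_DEVICE, connected=False))
```

Using a device voice built from the same voice talent keeps the persona consistent, so the driver hears no obvious change when connectivity drops.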
Windows 10 provides TTS voices for more than 40 locales for accessibility scenarios. They can be invoked through the WinRT Speech API.
- Azure TTS: Empower every person and every organization on the planet to have a delightful digital voice!
- Azure Custom Voice: Build your one-of-a-kind Custom Voice and close to human Neural TTS in cloud and edge!