How to use SSML
With SSML, developers can control aspects of speech synthesis output such as rate, pitch, volume, prosody, and pronunciation. For non-developers, the audio content generation tool provides a GUI for adjusting speech synthesis output.
For the full SSML reference, see the SSML document. This page provides additional samples of SSML features supported by Azure TTS.
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='zh-CN'><voice xml:lang='zh-CN' xml:gender='Female' name='Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)'>您好晓晓!<audio src='https://file-examples.com/storage/feb8f98f1d627c0dc94b8cf/2017/11/file_example_MP3_700KB.mp3'>This is fallback audio.</audio></voice></speak>
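Because SSML is XML, a document with an unclosed tag or a bad attribute quote is likely to be rejected before any speech is produced. As a minimal sketch, a request body can be checked locally for well-formedness with Python's standard library before it is sent. The helper name `check_ssml` and the `example.com` URL are hypothetical and not part of any Azure SDK; the check covers only XML syntax and the root element, not whether the service supports each tag.

```python
import xml.etree.ElementTree as ET

# Namespace of the standard SSML elements (taken from the samples on this page).
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def check_ssml(ssml: str) -> ET.Element:
    """Parse an SSML string and confirm its root is <speak> in the SSML namespace.

    Hypothetical helper: it verifies well-formedness and the root element only.
    """
    root = ET.fromstring(ssml)  # raises ET.ParseError on malformed XML
    if root.tag != f"{{{SSML_NS}}}speak":
        raise ValueError(f"unexpected root element: {root.tag}")
    return root


# Placeholder document modelled on the audio sample; the URL is illustrative.
sample = (
    "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>"
    "<voice name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'>"
    "Hello!<audio src='https://example.com/clip.mp3'>This is fallback audio.</audio>"
    "</voice></speak>"
)

root = check_ssml(sample)
audio = root.find(f".//{{{SSML_NS}}}audio")
# The text inside <audio> is the fallback read aloud if the file cannot be played.
print(audio.get("src"), "|", audio.text)
```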
- Example:
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-us'><mstts:backgroundaudio src='https://file-examples.com/storage/feb8f98f1d627c0dc94b8cf/2017/11/file_example_MP3_700KB.mp3' volume='0.7' fadein='3000' fadeout='4000'/><voice name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)' >Hi, this is a demo of using background audio</voice></speak>
- The audio specified by the <mstts:backgroundaudio> tag is mixed with the TTS-synthesized audio.
- The content of each voice tag is synthesized by TTS in sequence, while the background audio plays in parallel, mixed with the TTS output.
- fadein is a duration in milliseconds; the fade-in starts at the point where TTS begins.
- fadeout is a duration in milliseconds; the fade-out starts at the point where TTS ends.
- volume defaults to 1; each scaled sample value is the original sample value multiplied by volume.
- If the background audio is shorter than the TTS output plus the fade-out, it loops; if it is longer, it stops when the fade-out finishes.
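The attributes described above can be assembled programmatically. The following is a sketch only: `background_audio_ssml` is a hypothetical helper, not an Azure SDK function, and the `example.com` URL and the non-negative check are assumptions made for illustration. It escapes the text and attribute values so that characters such as `&` do not break the XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

MSTTS_NS = "http://www.w3.org/2001/mstts"
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def background_audio_ssml(text, voice, audio_url, fadein_ms, fadeout_ms,
                          volume=1.0, lang="en-US"):
    """Build an SSML document with one <mstts:backgroundaudio> and one <voice>.

    Hypothetical helper. volume defaults to 1, matching the default noted above;
    rejecting negative values is this sketch's own assumption.
    """
    if volume < 0 or fadein_ms < 0 or fadeout_ms < 0:
        raise ValueError("volume and fade durations must be non-negative")
    return (
        f"<speak version='1.0' xmlns='{SSML_NS}' xmlns:mstts='{MSTTS_NS}' "
        f"xml:lang={quoteattr(lang)}>"
        f"<mstts:backgroundaudio src={quoteattr(audio_url)} "
        f"volume={quoteattr(str(volume))} "
        f"fadein={quoteattr(str(fadein_ms))} "
        f"fadeout={quoteattr(str(fadeout_ms))}/>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


ssml = background_audio_ssml(
    "Hi, this is a demo of using background audio",
    "Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)",
    "https://example.com/music.mp3",  # placeholder URL
    fadein_ms=3000,
    fadeout_ms=4000,
    volume=0.7,
)

# Parse the result back to confirm it is well-formed and carries the attributes.
parsed = ET.fromstring(ssml)
bg = parsed.find(f"{{{MSTTS_NS}}}backgroundaudio")
print(bg.attrib)
```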
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'><phoneme alphabet="ipa" ph="ʃaʊˈmi">pecan</phoneme> is awesome!</voice></speak>
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'>Hello, <phoneme alphabet='sapi' ph='jh iy 1 - n iy'>Jeanne</phoneme></voice></speak>
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='Microsoft Server Speech Text to Speech Voice (en-US, JessaNeural)'>Hello, <phoneme alphabet='ups' ph='jh i n i'>Jeanne</phoneme></voice></speak>
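The three phoneme samples above differ only in the alphabet attribute and the ph string. A small sketch of generating such a tag follows; `phoneme_tag` is a hypothetical helper, and the set of accepted alphabets is simply the three that appear in the samples on this page, not a list confirmed against the service.

```python
from xml.sax.saxutils import escape, quoteattr

# Alphabets that appear in the samples on this page (an assumption of this
# sketch, not an authoritative list of what the service accepts).
ALPHABETS = {"ipa", "sapi", "ups"}


def phoneme_tag(text, ph, alphabet="ipa"):
    """Return a <phoneme> element that pronounces `text` as the phone string `ph`."""
    if alphabet not in ALPHABETS:
        raise ValueError(f"unsupported alphabet: {alphabet!r}")
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


tag = phoneme_tag("Jeanne", "jh iy 1 - n iy", alphabet="sapi")
print(tag)  # <phoneme alphabet="sapi" ph="jh iy 1 - n iy">Jeanne</phoneme>
```

The returned fragment is meant to be placed inside a voice element, in the same position as the phoneme tags in the samples above.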
For more detail, see the complete document on phonemes.
- Azure TTS: Empower every person and every organization on the planet to have a delightful digital voice!
- Azure Custom Voice: Build your one-of-a-kind Custom Voice and close to human Neural TTS in cloud and edge!