SSML bookmarks broken when using the <lang> tag with multilingual voice
Describe the bug
When synthesizing SSML that combines bookmarks with the <lang> tag, using the voice en-US-JennyMultilingualNeural and the .NET SDK, the bookmark events are either raised with unusable data (garbled names) or not raised at all.
To Reproduce
Synthesizing sample 1…
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
<voice name='en-US-JennyMultilingualNeural' xml:lang="en-US">
<bookmark mark="One"/>
<lang xml:lang="en-US"> A </lang>
<bookmark mark="Two"/>
<lang xml:lang="en-US"> B </lang>
<bookmark mark="Three"/>
<lang xml:lang="en-US"> C </lang>
<bookmark mark="Four"/>
</voice>
</speak>
…returns…
Synthesis started: SynthesizingAudioStarted
Bookmark reached at 0: 児ࡓ网
Bookmark reached at 0: 樐网
Bookmark reached at 0: ࡕ网
Bookmark reached at 0: Four
Synthesis completed
…with the first three bookmark names replaced by different random characters on every run.
Synthesizing sample 2…
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
<voice name='en-US-JennyMultilingualNeural' xml:lang="en-US">
<lang xml:lang="en-US">
<bookmark mark="One"/>
A
<bookmark mark="Two"/>
B
<bookmark mark="Three"/>
C
<bookmark mark="Four"/>
</lang>
</voice>
</speak>
…returns no bookmarks at all:
Synthesis started: SynthesizingAudioStarted
Synthesis completed
Expected behavior
Synthesizing this working sample, which uses <prosody> instead of <lang>…
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
<voice name='en-US-JennyMultilingualNeural' xml:lang="en-US">
<prosody rate="+15%">
<bookmark mark="One"/>
A
<bookmark mark="Two"/>
B
<bookmark mark="Three"/>
C
<bookmark mark="Four"/>
</prosody>
</voice>
</speak>
…returns…
Bookmark reached at 0: One
Bookmark reached at 4250000: Two
Bookmark reached at 10500000: Three
Bookmark reached at 17500000: Four
…as expected. Wrapping the text in prosody or phoneme tags instead of lang tags also works as expected.
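To make the offsets in the expected output easier to read: assuming these numbers are the `AudioOffset` values of the SDK's bookmark event, which the Speech SDK reports in ticks of 100 nanoseconds, they convert to seconds as shown below. The increasing values line up with the bookmarks sitting between A, B and C. By contrast, every event in sample 1 reports offset 0, including the correctly named `Four`, which does not match its position after A, B and C.

```latex
\begin{align*}
t &= \text{AudioOffset} \times 100\,\text{ns} = \text{AudioOffset} \times 10^{-7}\,\text{s} \\
t_{\text{One}}   &= 0 \times 10^{-7}\,\text{s} = 0\,\text{s} \\
t_{\text{Two}}   &= 4\,250\,000 \times 10^{-7}\,\text{s} = 0.425\,\text{s} \\
t_{\text{Three}} &= 10\,500\,000 \times 10^{-7}\,\text{s} = 1.05\,\text{s} \\
t_{\text{Four}}  &= 17\,500\,000 \times 10^{-7}\,\text{s} = 1.75\,\text{s}
\end{align*}
```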
Version of the Cognitive Services Speech SDK: 1.24.2
Platform, Operating System, and Programming Language
- OS: Windows 11
- Hardware: x64
- Programming language: C#
Please let me know if providing the demo application we used for testing would help. Thanks!
Issue Analytics
- Created 8 months ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Presumably this is the same internal work item (ref. 4681629) as another case; if it is different, @jiajzhan, please update.
Thanks for reporting this issue. I will investigate this issue.