[Bug] Phoneme extraction with punctuations is wrongly delimited
Describe the bug
Punctuation in the extracted phonemes is delimited incorrectly.
For instance, the sentence tuː foːɹ paʊndz , bʌt hɛviɚ aɪɚnz ,
should be tuː foːɹ paʊndz, bʌt hɛviɚ aɪɚnz,
That is, a punctuation mark should not be preceded by a space.
I think the current implementation causes unnatural silences in the trained models.
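To make the expected output concrete, here is a minimal Python sketch of a post-processing step that removes the space before a punctuation mark. The helper name `attach_punctuation` and the punctuation set are assumptions for illustration only; this is not code from 🐸 TTS or gruut, and it is not the fix the maintainers describe.

```python
import re

# Hypothetical helper for illustration; not part of TTS or gruut.
# The punctuation set below is an assumption. Note that the IPA length
# mark "ː" (U+02D0) is a different character from the ASCII colon, so it
# is not affected.
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:!?])")


def attach_punctuation(phonemes: str) -> str:
    """Remove whitespace that directly precedes a punctuation mark."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", phonemes)


raw = "tuː foːɹ paʊndz , bʌt hɛviɚ aɪɚnz ,"
fixed = attach_punctuation(raw)
print(fixed)
```

A regex pass like this only patches the string after the fact. Preserving the original whitespace during tokenization, as discussed in the comments, would avoid having to guess which spaces were in the input text.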
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:9 (3 by maintainers)
Top GitHub Comments
I’m working on it! I have this pesky job that keeps taking my time 😉
This is taking longer than expected since I’m adding it and preliminary SSML support at the same time. It didn’t seem worth it to me to redo the existing gruut tokenizer (to add proper whitespace preservation) only to scrap it later for SSML.
I will be completing the changes to gruut this week, and my goal is to have it integrated and tested with 🐸 TTS by 1 Oct 👍
I’m reworking parts of gruut’s tokenization pipeline to preserve whitespace. I’ll delay updating the current pull request until these changes are in.