[Bug] Gruut espeak inconsistencies make training harder
Describe the bug
Inconsistency between gruut with espeak phonemes and phonemizer. Gruut adds extra ː length marks (e.g. tuː where phonemizer gives tə). This breaks the pronunciation, especially for words such as "to" or "to be".
To Reproduce
For the sentence "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.":
Phonemizer with espeak-ng:
echo "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent." | phonemize -b espeak -l en-gb
ɪt tʊk miː kwaɪt ɐ lɒŋ taɪm tə dɪvɛləp ɐ vɔɪs and naʊ ðat aɪ hav ɪt aɪm nɒt ɡəʊɪŋ təbi saɪlənt
Gruut:
text2phone(text, LANG, use_espeak_phonemes=True)
'ɪ|t| t|ʊ|k| m|iː| k|w|aɪ|t| ɐ| l|ɔ|ŋ| t|aɪ|m| t|uː| d|ɪ|v|ɛ|l|ə|p| ɐ| v|ɔɪ|s| ,| æ|n|d| n|aʊ| ð|æ|t| aɪ| h|æ|v| ɪ|t| aɪ|m| n|ɑː|t| ɡ|oʊ|ɪ|ŋ| t|uː| b|iː| s|aɪ|l|ə|n|t| .'
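To make the mismatch concrete, the two outputs above can be compared word by word. The sketch below is not part of 🐸 TTS or gruut: it strips gruut's "|" separators and punctuation tokens, aligns the two word sequences with Python's standard difflib, and counts ː length marks. The helper names (to_words, word_diffs, count_length_marks) are hypothetical and only for illustration.

```python
import difflib
from typing import List, Tuple

# Outputs copied verbatim from the reproduction above.
PHONEMIZER_OUT = "ɪt tʊk miː kwaɪt ɐ lɒŋ taɪm tə dɪvɛləp ɐ vɔɪs and naʊ ðat aɪ hav ɪt aɪm nɒt ɡəʊɪŋ təbi saɪlənt"
GRUUT_OUT = "ɪ|t| t|ʊ|k| m|iː| k|w|aɪ|t| ɐ| l|ɔ|ŋ| t|aɪ|m| t|uː| d|ɪ|v|ɛ|l|ə|p| ɐ| v|ɔɪ|s| ,| æ|n|d| n|aʊ| ð|æ|t| aɪ| h|æ|v| ɪ|t| aɪ|m| n|ɑː|t| ɡ|oʊ|ɪ|ŋ| t|uː| b|iː| s|aɪ|l|ə|n|t| ."

# ASCII punctuation emitted by gruut as standalone tokens.
PUNCTUATION = set(",.!?;:")


def to_words(phonemes: str) -> List[str]:
    """Split a phoneme string into words, dropping gruut's '|' separators
    and standalone punctuation tokens (hypothetical helper)."""
    words = []
    for token in phonemes.split():
        word = token.replace("|", "")
        if word and word not in PUNCTUATION:
            words.append(word)
    return words


def word_diffs(reference: List[str], candidate: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return (reference_words, candidate_words) for every span where
    the two aligned word sequences disagree."""
    matcher = difflib.SequenceMatcher(a=reference, b=candidate, autojunk=False)
    return [
        (reference[i1:i2], candidate[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def count_length_marks(words: List[str]) -> int:
    # "ː" (U+02D0) is the IPA length mark, not an ASCII colon.
    return sum(word.count("ː") for word in words)


if __name__ == "__main__":
    espeak_words = to_words(PHONEMIZER_OUT)
    gruut_words = to_words(GRUUT_OUT)
    for ref, cand in word_diffs(espeak_words, gruut_words):
        print(f"phonemizer {' '.join(ref)!r:<28} gruut {' '.join(cand)!r}")
    print("length marks:", count_length_marks(espeak_words), "vs", count_length_marks(gruut_words))
```

On these two strings the alignment reports six differing spans, including tə vs tuː and the merged təbi vs tuː biː, and gruut emits five length marks against phonemizer's one. Several of the remaining differences (a vs æ, əʊ vs oʊ, ɒ vs ɑː) look like British vs American vowels, which may partly reflect the -l en-gb flag passed to phonemize.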
Additional context
I see that these inconsistencies make learning harder for 🐸 TTS models.
In general, training a model on raw characters produces good results faster than phoneme-based training. I assume this is because of inconsistencies like these between phonemizer and gruut.
I am now training a model with use_espeak_phonemes=False to see if it makes any difference.
Issue Analytics
- Created 2 years ago
- Comments: 13 (3 by maintainers)
Top GitHub Comments
I do not want to hijack this discussion, and I'm not sure whether this has been discussed somewhere else, but I'm confused about how Gruut deals with foreign-language words in a sentence. In Germany we use some English words in German sentences, and these are pronounced wrong.
espeak
echo "Ein Song geht mir nicht mehr aus den Ohren." | phonemize -b espeak -l de
TTS server with Gruut
It seems that Gruut recognizes it's an English word and tags it correctly, but the spoken audio doesn't sound English.
I tried the following sentence with the current Coqui release:
This led to the following phonemes:
Here’s the spoken output: https://sndup.net/4jcy
It sounds a little as if the language tags are being spoken too 😉 Could that be the case?
Hi @skol101, no bother 🙂 Thanks for your patience with this issue.
I did push a PR, but it’s since been removed and those changes will be bundled together with fixes/additions for two other issues:
You may be able to use my fixes anyway by simply upgrading the version of gruut in your Python virtual environment for 🐸 TTS with:
pip3 install --upgrade 'gruut[cs,de,es,fr,it,nl,pt,ru,sv]~=1.3.0'
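After upgrading, it can help to confirm which gruut version the 🐸 TTS environment actually picks up. This is a minimal sketch, assuming Python 3.8+ for importlib.metadata. is_compatible_release and installed_gruut_version are hypothetical helpers: the first only approximates pip's ~= (compatible release) rule for plain numeric versions and ignores pre-release and local tags. For the full PEP 440 rules, the packaging library's SpecifierSet is the usual tool.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(v: str) -> Tuple[int, ...]:
    # Keep only the leading numeric components, e.g. "1.3.2" -> (1, 3, 2).
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_compatible_release(installed: str, spec: str = "1.3.0") -> bool:
    """Simplified '~=' check (hypothetical helper): installed >= spec and
    installed matches every component of spec except the last."""
    base = parse_release(spec)
    inst = parse_release(installed)
    # Pad with zeros so "1.3" compares like "1.3.0".
    inst = inst + (0,) * (len(base) - len(inst))
    return inst >= base and inst[: len(base) - 1] == base[:-1]


def installed_gruut_version() -> Optional[str]:
    # Returns None when gruut is not installed in the current environment.
    try:
        return version("gruut")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_gruut_version()
    if v is None:
        print("gruut is not installed in this environment")
    else:
        print(f"gruut {v}: compatible with ~=1.3.0 -> {is_compatible_release(v)}")
```

Under this rule ~=1.3.0 accepts 1.3.x releases such as 1.3.4 but rejects 1.4.0 and 1.2.9.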