
[Bug] Gruut espeak inconsistencies make training harder.


Describe the bug: Gruut with espeak phonemes produces output that is inconsistent with phonemizer. Gruut adds an additional : between characters, which breaks the pronunciation, especially when saying "to" or "to be".

To Reproduce: For the sentence "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent."

Phonemizer with espeak-ng:

echo "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent." | phonemize -b espeak -l en-gb
ɪt tʊk miː kwaɪt ɐ lɒŋ taɪm tə dɪvɛləp ɐ vɔɪs and naʊ ðat aɪ hav ɪt aɪm nɒt ɡəʊɪŋ təbi saɪlənt

Gruut:

text2phone(text, LANG, use_espeak_phonemes=True)
'ɪ|t| t|ʊ|k| m|iː| k|w|aɪ|t| ɐ| l|ɔ|ŋ| t|aɪ|m| t|uː| d|ɪ|v|ɛ|l|ə|p| ɐ| v|ɔɪ|s| ,| æ|n|d| n|aʊ| ð|æ|t| aɪ| h|æ|v| ɪ|t| aɪ|m| n|ɑː|t| ɡ|oʊ|ɪ|ŋ| t|uː| b|iː| s|aɪ|l|ə|n|t| .'
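To make the differences easier to see, the two outputs above can be compared word by word. The following is a minimal sketch using only the Python standard library; the helper names `normalize` and `word_diffs` are hypothetical and not part of Gruut, phonemizer, or 🐸 TTS. It strips Gruut's `|` separators and punctuation tokens, then aligns the two word sequences with `difflib`.

```python
import difflib
import re

# Outputs copied verbatim from the issue text above.
PHONEMIZER_OUT = (
    "ɪt tʊk miː kwaɪt ɐ lɒŋ taɪm tə dɪvɛləp ɐ vɔɪs and naʊ ðat aɪ hav ɪt "
    "aɪm nɒt ɡəʊɪŋ təbi saɪlənt"
)
GRUUT_OUT = (
    "ɪ|t| t|ʊ|k| m|iː| k|w|aɪ|t| ɐ| l|ɔ|ŋ| t|aɪ|m| t|uː| d|ɪ|v|ɛ|l|ə|p| ɐ| "
    "v|ɔɪ|s| ,| æ|n|d| n|aʊ| ð|æ|t| aɪ| h|æ|v| ɪ|t| aɪ|m| n|ɑː|t| "
    "ɡ|oʊ|ɪ|ŋ| t|uː| b|iː| s|aɪ|l|ə|n|t| ."
)


def normalize(phonemes: str) -> list:
    """Remove '|' separators and drop tokens that are pure punctuation."""
    words = phonemes.replace("|", "").split()
    return [w for w in words if re.search(r"\w", w)]


def word_diffs(a: list, b: list) -> list:
    """Return (a_words, b_words) pairs for every span where the sequences differ."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    diffs = word_diffs(normalize(PHONEMIZER_OUT), normalize(GRUUT_OUT))
    for left, right in diffs:
        print("phonemizer:", " ".join(left), "| gruut:", " ".join(right))
```

The comparison covers both vowel differences (for example `tə` versus `tuː`) and word-boundary differences (phonemizer merges "to be" into `təbi`, while Gruut keeps two words).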

Additional context: I see that these inconsistencies make learning harder for 🐸 TTS models.

In general, training a model with raw characters produces good results faster than phoneme-based training. I assume this is because of such inconsistencies between phonemizer and Gruut.

I am now training a model with use_espeak_phonemes=False to see if it makes any difference.

Issue Analytics

Created 2 years ago
Comments: 13 (3 by maintainers)

Top GitHub Comments

2 reactions

thorstenMueller commented, Jul 28, 2021

I do not want to hijack this discussion, and I am not sure whether this has been discussed somewhere else, but I’m confused about how Gruut deals with foreign-language words in a sentence. In Germany we use some English words in German sentences, and these are pronounced wrong.

espeak

echo "Ein Song geht mir nicht mehr aus den Ohren." | phonemize -b espeak -l de

[WARNING] 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "de" phoneset
[WARNING] language switch flags have been kept (applying "keep-flags" policy)
aɪn (en)sɒŋ(de) ɡeːt miːɾ nɪçt meːɾ aʊs deːn oːrən

TTS server with Gruut

It seems that Gruut recognizes that it’s an English word and tags it correctly, but the spoken audio doesn’t sound English.

 > Model input: Ein Song geht mir nicht mehr aus den Ohren.
 > Text splitted to sentences.
['Ein Song geht mir nicht mehr aus den Ohren.']
 > Phonemes: aɪ|n| (en)|s|ɒ|ŋ|(de)| ɡ|eː|t| m|iː|ɾ| n|ɪ|ç|t| m|eː|ɾ| aʊ|s| d|eː|n| oː|r|ə|n| .
 > Processing time: 2.6341404914855957
 > Real-time factor: 0.7461818838291031

I tried the following sentence with the current Coqui release:

Magst Du den Song der auf der Party lief?

This leads to the following phonemes:

Phonemes: m|ɑː|k|s|t| d|uː| d|eː|n| (en)|s|ɒ|ŋ|(de)| d|ɛ|ɾ| aʊ|f| d|ɛ|ɾ| (en)|p|ɑː|t|i|(de)| l|iː|f| ?

Here’s the spoken output: https://sndup.net/4jcy

It sounds a little bit as if the language tags are being spoken too 😉 Could that be the case?
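One way to test that hypothesis would be to strip the language-switch flags from the phoneme string before it reaches the model and listen for a difference. The sketch below is an assumption-laden illustration, not how Gruut or 🐸 TTS handle flags internally: the function name `strip_language_flags` is hypothetical, and the regular expression assumes flags always look like `(en)` or `(de)`, optionally followed by a `|` separator, as in the outputs above.

```python
import re

# Assumed flag shape: a lowercase language code in parentheses, e.g. "(en)",
# optionally followed by the "|" separator used in the Gruut-style output.
LANG_FLAG = re.compile(r"\([a-z]{2,3}(?:-[a-z]+)?\)\|?")


def strip_language_flags(phonemes: str) -> str:
    """Remove language-switch flags while leaving all phonemes untouched."""
    return LANG_FLAG.sub("", phonemes)


if __name__ == "__main__":
    # Phoneme string copied from the comment above.
    tagged = (
        "m|ɑː|k|s|t| d|uː| d|eː|n| (en)|s|ɒ|ŋ|(de)| d|ɛ|ɾ| aʊ|f| d|ɛ|ɾ| "
        "(en)|p|ɑː|t|i|(de)| l|iː|f| ?"
    )
    print(strip_language_flags(tagged))
```

If the audio generated from the stripped string no longer contains the extra sounds, that would support the idea that the flags are being fed to the model as input symbols.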

1 reaction

synesthesiam commented, Sep 29, 2021

Hi @skol101, no bother 🙂 Thanks for your patience with this issue.

I did push a PR, but it’s since been removed and those changes will be bundled together with fixes/additions for two other issues:

You may be able to use my fixes anyway by simply upgrading the version of gruut in your Python virtual environment for 🐸 TTS with:

pip3 install --upgrade 'gruut[cs,de,es,fr,it,nl,pt,ru,sv]~=1.3.0'
