Tacotron2 produces random mel outputs during inference (French dataset)
Hi! I have trained Tacotron2 for 52k steps on the SynPaFlex French dataset. I removed sentences longer than 20 seconds from the dataset and ended up with around 30 hours of single-speaker data.
I made a custom `synpaflex.py` processor in `./tensorflow_tts/processor/` with these symbols (adapted to French, without ARPAbet):
```python
_pad = "pad"
_eos = "eos"
_punctuation = "!/\'(),-.:;? "
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèàùâêîôûçäëïöüÿœæ"

# Export all symbols:
SYNPAFLEX_SYMBOLS = [_pad] + list(_punctuation) + list(_letters) + [_eos]
```
I used `basic_cleaners` for text cleaning.
In #182 the issue was similar, but the problem came from using `tacotron2.v1.yaml` as the configuration file. I am using my own `tacotron2.synpaflex.v1.yaml` for both training and inference.
During synthesis, the mel outputs are completely random: the output differs even when the input sentence is kept exactly the same. The audio sounds like a French version of the WaveNet examples trained without any text conditioning, in the "Knowing What to Say" section of that page.
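A quick way to separate "the network is nondeterministic at inference" from "the network is ignoring the text" is to synthesize the same sentence twice and compare the mels. A sketch using NumPy; `synthesize` here is a hypothetical stand-in for your Tacotron2 inference call, not a TensorFlowTTS API:

```python
import numpy as np

def check_determinism(synthesize, text, atol=1e-5):
    """Run the same sentence twice through `synthesize` (text -> mel array).

    Returns True if the two mels match, i.e. inference is deterministic and
    the randomness must come from the inputs or conditioning, not the net.
    """
    mel_a = synthesize(text)
    mel_b = synthesize(text)
    return bool(np.allclose(mel_a, mel_b, atol=atol))
```

If two runs on identical input already disagree, the usual suspects are dropout left enabled at inference or a mismatch between the training and inference configs.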
Here are my TensorBoard results:
I must be doing something wrong somewhere, as I have been able to train on LJSpeech successfully… Any ideas?
Issue Analytics
- Created 2 years ago
- Comments: 41 (19 by maintainers)
Thank you! I’m just an ordinary TTS researcher. 😃 😄
@ihshareef Not yet, still investigating! I will let everyone know when I find the solution.