
Tacotron2 produces random mel outputs during inference (French dataset)

See original GitHub issue

Hi! I have trained Tacotron2 for 52k steps on the SynPaFlex French dataset. I deleted sentences longer than 20 seconds from the dataset and ended up with around 30 hours of single-speaker data.

I made a custom synpaflex.py processor in ./tensorflow_tts/processor/ with these symbols (adapted to French, without ARPAbet):

_pad = "pad"
_eos = "eos"
_punctuation = "!/\'(),-.:;? "
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèàùâêîôûçäëïöüÿœæ"

# Export all symbols:
SYNPAFLEX_SYMBOLS = (
    [_pad] + list(_punctuation) + list(_letters) + [_eos]
)

I used basic_cleaners for text cleaning.
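One frequent cause of unintelligible output with a custom processor is input characters that fall outside the symbol list and so map to no valid ID. Two things are worth checking with the symbol set above: the accented letters in `_letters` are lowercase only (so text must be lowercased first, which keithito-style `basic_cleaners` normally does), and accented characters stored in decomposed Unicode form ("e" + combining accent) will not match the precomposed "é" in the list. A minimal coverage check, assuming the symbol set from the issue (the `uncovered_chars` helper is illustrative, not a TensorFlowTTS API):

```python
import unicodedata

# Symbol set copied from the issue's synpaflex.py
_pad = "pad"
_eos = "eos"
_punctuation = "!/'(),-.:;? "
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzéèàùâêîôûçäëïöüÿœæ"

def uncovered_chars(text):
    """Return the characters in `text` that map to no symbol in the set above."""
    # NFC normalization folds 'e' + combining acute back into a single 'é',
    # so decomposed input does not silently miss the accented letters.
    text = unicodedata.normalize("NFC", text)
    known = set(_punctuation) | set(_letters)
    return sorted({ch for ch in text if ch not in known})

raw = "Être à l'heure, c'est déjà être en retard."
print(uncovered_chars(raw))          # → ['Ê'] (uppercase accented letters are not in _letters)
print(uncovered_chars(raw.lower()))  # → [] (covered once the text is lowercased)
```

Running this over every transcript in the dataset (after the same cleaning used in training) quickly shows whether any text is silently falling outside the symbol table.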

In #182 the symptoms were similar, but there the problem came from using tacotron2.v1.yaml as the configuration file. I am using my own tacotron2.synpaflex.v1.yaml for both training and inference.

During synthesis, the mel outputs are completely random: the output differs even when the input sentence is kept exactly the same. The audio sounds like a French version of the WaveNet samples generated without any text conditioning, from the “Knowing What to Say” section of DeepMind's WaveNet blog post.
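For reference, the original Tacotron 2 paper deliberately keeps prenet dropout enabled at inference to introduce output variation, so small run-to-run differences can be expected in faithful implementations; completely different, unintelligible output on the same sentence points elsewhere (text-to-ID mapping or a train/inference config mismatch). One way to quantify the symptom is to synthesize the same sentence twice and compare the mel tensors directly. A minimal sketch of the comparison, assuming you already have the two mel arrays (any `synthesize` call producing them is a stand-in, not a TensorFlowTTS API):

```python
import numpy as np

def mels_match(mel_a, mel_b, tol=1e-4):
    """Compare mel spectrograms from two runs on the same input text."""
    mel_a, mel_b = np.asarray(mel_a), np.asarray(mel_b)
    if mel_a.shape != mel_b.shape:
        # Different decoder stop points already indicate non-determinism.
        return False
    return bool(np.max(np.abs(mel_a - mel_b)) < tol)

# Usage sketch (pseudo): mel1 = synthesize("bonjour"); mel2 = synthesize("bonjour")
# mels_match(mel1, mel2) → False would confirm non-deterministic inference.
```

If the two mels differ only slightly, prenet dropout is a plausible source; if they differ wildly or stop at different lengths, the model is effectively ignoring the text input.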

Here are my TensorBoard results: [screenshot of training curves omitted]

I must be doing something wrong somewhere, as I have been able to train on LJSpeech successfully… Any ideas?

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 41 (19 by maintainers)

Top GitHub Comments

2 reactions
ttslr commented, Jul 8, 2021

> @ttslr Hi, seems you are an expert in this field 😄. I saw you have a lot of papers about TTS 😄

Thank you! I’m just an ordinary TTS researcher. 😃 😄

1 reaction
samuel-lunii commented, Jun 18, 2021

@ihshareef Not yet, still investigating! I will let everyone know when I find the solution.

