TTS Tacotron 2, "Weak" Alignment, any suggestions?

I am trying to train Tacotron 2 on a custom single-speaker dataset of roughly 11 hours.

I have trained other implementations of Tacotron 1 before, and with one of them the learned alignment was very good.

ESPnet, for some reason, does learn an alignment, but it is a bit “weak”, meaning that at each timestep the predicted phonemes are sometimes slightly off.

I am training on 6 GPUs with batch size 32. I also trained LibriTTS from the default recipe, and with that config the model learned a very good alignment in only 30 epochs. On my data, as can be seen from the GIF below, the alignment does get better, but it takes 800 epochs and is still somewhat weak.

Can anyone suggest what the problem could be, or what I could do to improve things? Any help would be greatly appreciated.

EDIT:

To avoid changing the specifications too much, I have only changed the sampling rate in the config to match my data's sampling rate; all other parameters (n_mels, n_fft, etc.) remain the same. Could this be an issue? Should I resample my data to 24000 Hz to match the LibriTTS specifications and try training the model again? My parameters are as follows:

fs=16000      # sampling frequency
fmax=""       # maximum frequency
fmin=""       # minimum frequency
n_mels=80     # number of mel basis
n_fft=1024    # number of fft points
n_shift=256   # number of shift points
win_length="" # window length
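
One thing that may be worth checking is what these values mean in milliseconds once only fs is changed. The sketch below is plain arithmetic on the parameters above, comparing 16000 Hz with the 24000 Hz mentioned for LibriTTS. The helper name `frame_timing` is hypothetical, and it assumes that an empty win_length falls back to n_fft, which is a common STFT default but should be verified against the recipe.

```python
def frame_timing(fs, n_fft, n_shift, win_length=None):
    """Convert STFT parameters from samples to time units.

    Assumption: an unset win_length falls back to n_fft.
    """
    win = win_length if win_length is not None else n_fft
    return {
        "shift_ms": 1000.0 * n_shift / fs,    # hop between frames
        "window_ms": 1000.0 * win / fs,       # analysis window length
        "frames_per_second": fs / n_shift,    # decoder steps per second of audio
        "nyquist_hz": fs / 2,                 # highest representable frequency
    }


# Values from the post: only fs differs between the two setups.
mine = frame_timing(fs=16000, n_fft=1024, n_shift=256)
ref = frame_timing(fs=24000, n_fft=1024, n_shift=256)

for name, timing in (("16 kHz", mine), ("24 kHz", ref)):
    print(name, {key: round(value, 2) for key, value in timing.items()})
```

With identical n_fft and n_shift, the 16 kHz setup uses a longer window and a longer hop in time than the 24 kHz one, so the two configurations are not equivalent even though the sample counts match. Whether that contributes to the weak alignment is not established here.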
 

Additionally, this is what my mel-spectrograms look like, from feats.scp and feats.ark:

[Image: random feats.scp]

[Image: random feats.ark]

This is very different from what the LibriTTS features looked like, but my limited knowledge of signal processing does not help me identify what the problem could be.

[Image: alignment, epochs 280-870]

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 35 (20 by maintainers)

Top GitHub Comments

2 reactions
Imtinan1996 commented, Nov 18, 2019

I think the main difference was the silence trimming, since the aforementioned repo also uses silence trimming.

Thank you both for all your help.

Also, can we please keep the issue open for now? I will close it once I am able to post the final results.
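
For readers unfamiliar with the silence trimming mentioned above, here is a minimal sketch of one common approach: drop leading and trailing frames whose RMS energy is far below the peak level. It is not the trimming used by ESPnet or by the other repo. The function names, the -40 dB threshold, and the frame sizes are all illustrative assumptions.

```python
import math


def frame_rms_db(samples, frame_length, hop_length, ref):
    """RMS level of each frame in dB relative to `ref`."""
    levels = []
    last_start = max(len(samples) - frame_length, 0)
    for start in range(0, last_start + 1, hop_length):
        frame = samples[start:start + frame_length]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Floor the RMS so that digital silence does not hit log10(0).
        levels.append(20.0 * math.log10(max(rms, 1e-10) / ref))
    return levels


def trim_silence(samples, frame_length=1024, hop_length=256, threshold_db=-40.0):
    """Remove leading and trailing frames quieter than threshold_db below peak.

    Hypothetical helper; threshold and frame sizes are illustrative defaults.
    """
    if not samples:
        return []
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return []
    levels = frame_rms_db(samples, frame_length, hop_length, peak)
    voiced = [i for i, db in enumerate(levels) if db > threshold_db]
    if not voiced:
        return []
    start = voiced[0] * hop_length
    end = min(voiced[-1] * hop_length + frame_length, len(samples))
    return samples[start:end]


# Synthetic check: 1 s of silence, 1 s of a 220 Hz tone, 1 s of silence.
fs = 16000
silence = [0.0] * fs
tone = [0.5 * math.sin(2 * math.pi * 220 * n / fs) for n in range(fs)]
audio = silence + tone + silence
trimmed = trim_silence(audio)
print(len(audio), len(trimmed))
```

Long silences at the start or end of utterances give the attention mechanism frames with no corresponding text, which is one plausible reason trimming could help alignment, as the comment above suggests.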

1 reaction
Imtinan1996 commented, Nov 18, 2019

UPDATE:

This is the alignment after 125 epochs. It is too early to make a definitive statement, but the progress looks healthy. I will be sure to keep you updated.

[Image: espnet_alignment]
