TTS Tacotron 2, "Weak" Alignment, any suggestions?
I am trying to train Tacotron 2 on a custom single-speaker dataset of roughly 11 hours.
I have trained other implementations of Tacotron 1 before, and one of those implementations learned very good alignment on this data.
ESPnet also learns an alignment, but it is a bit "weak", meaning that at many timesteps the attended phonemes are slightly off.
I am training this on 6 GPUs with batch size 32. I also trained LibriTTS from the default recipe, and consistent with the config, that model learned very good alignment in only 30 epochs. As the GIF below shows, the alignment on my data does get better, but it takes 800 epochs, and it is still somewhat weak.
Can anyone suggest what the problem could be, or what I could do to improve things? Any help would be greatly appreciated.
EDIT:
To avoid changing too many specifications at once, I have only changed the sampling rate in the config to match my data's SR; all other parameters (n_mels, n_fft, etc.) remain the same. Could this be an issue? Should I resample my data to 24000 Hz to match the LibriTTS specification and try training again? My params are as follows:
```
fs=16000       # sampling frequency
fmax=""        # maximum frequency
fmin=""        # minimum frequency
n_mels=80      # number of mel basis
n_fft=1024     # number of fft points
n_shift=256    # number of shift points
win_length=""  # window length
```
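One thing worth checking (a minimal arithmetic sketch, not ESPnet code): keeping `n_fft` and `n_shift` fixed while lowering `fs` from the LibriTTS 24000 Hz to 16000 Hz stretches the analysis window and hop in time, so each feature frame covers more audio than in the default recipe.

```python
def frame_params_ms(fs, n_fft, n_shift):
    """Return (window_ms, hop_ms) for an STFT at sampling rate fs."""
    return 1000 * n_fft / fs, 1000 * n_shift / fs

# LibriTTS recipe defaults: fs=24000, n_fft=1024, n_shift=256
# -> roughly 42.7 ms window, 10.7 ms hop
libritts = frame_params_ms(24000, 1024, 256)

# Same n_fft/n_shift at fs=16000 -> 64 ms window, 16 ms hop,
# i.e. a noticeably coarser frame rate for the attention to align to
custom = frame_params_ms(16000, 1024, 256)
print(libritts, custom)
```

If matching the recipe's effective frame rate matters, either resample the audio to 24000 Hz or scale `n_fft`/`n_shift` proportionally with `fs`.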
Additionally, this is what my mel spectrograms look like from feats.scp and feats.ark:
random feats.scp
random feats.ark
This is very different from what the LibriTTS features looked like, but my limited knowledge of signal processing does not help me identify what the problem could be.
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 1
- Comments: 35 (20 by maintainers)
Top GitHub Comments
I think the main difference was the silence trimming, since the aforementioned repo also applies silence trimming.
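For reference, this kind of leading/trailing silence trimming can be sketched with a simple energy gate. This is a minimal stand-in for the toolkit's own trimming stage (e.g. `librosa.effects.trim`); the threshold and framing values here are illustrative assumptions, not the recipe's actual settings.

```python
import numpy as np

def trim_silence(wav, threshold_db=-40.0, frame=1024, hop=256):
    """Trim leading/trailing silence using frame-wise RMS relative to the peak.

    Simplified stand-in for librosa.effects.trim; threshold_db, frame,
    and hop are illustrative, not recipe values.
    """
    peak = np.max(np.abs(wav)) + 1e-10
    n_frames = max(1, 1 + (len(wav) - frame) // hop)
    rms_db = []
    for i in range(n_frames):
        seg = wav[i * hop : i * hop + frame]
        rms = np.sqrt(np.mean(seg ** 2)) + 1e-10
        rms_db.append(20 * np.log10(rms / peak))
    voiced = [i for i, db in enumerate(rms_db) if db > threshold_db]
    if not voiced:
        return wav[:0]  # all silence
    start = voiced[0] * hop
    end = min(len(wav), voiced[-1] * hop + frame)
    return wav[start:end]

# Example: 0.5 s silence + 1 s of a 220 Hz tone + 0.5 s silence at 16 kHz
fs = 16000
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(fs) / fs)
wav = np.concatenate([np.zeros(fs // 2), tone, np.zeros(fs // 2)])
trimmed = trim_silence(wav)
```

Untrimmed silences give the attention mechanism long stretches with no phonetic content to align to, which is one plausible reason alignments come out diffuse.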
Thank you both for all your help.
Also, can we please keep the issue open for now? I will close it once I am able to post the final results.
UPDATE:
This is the alignment after 125 epochs. It is too early to make a definitive statement, but the progress looks healthy; I will be sure to keep you all updated.