Tacotron plus World vocoder
Hey, I am glad to inform you that I have succeeded in merging the Tacotron model with the World vocoder and have generated some evaluation results as follows. The results do not sound bad, but they are still not perfect. However, this shows another way to train Tacotron on different feature parameters. The World vocoder is an open-source project, so anyone can use it. Moreover, the quality of resynthesis from this vocoder is better than that from Griffin-Lim. The three features (lf0[1], mgc[60] and ap[5]) carry the excitation information (F0 and aperiodicity) together with the spectral envelope, so no phase has to be estimated iteratively from a magnitude spectrogram. Furthermore, the feature dimension is low enough (66 values per frame) that we do not need a postnet in the Tacotron model. Training time is reduced to about 0.7 seconds per step, and inference is fast enough even on a CPU alone. So it is really worth trying.
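The feature layout described above can be pictured as one fixed 66-dimensional target vector per frame (1 + 60 + 5). This is a minimal sketch that assumes the three streams are simply concatenated in the order lf0, mgc, ap; the names `pack_frame` and `unpack_frame` are hypothetical and are not taken from the tacotron-world-vocoder branch.

```python
from typing import List, Tuple

# Per-frame stream sizes from the post: lf0[1], mgc[60], ap[5].
LF0_DIM = 1
MGC_DIM = 60
AP_DIM = 5
FRAME_DIM = LF0_DIM + MGC_DIM + AP_DIM  # 66 values per frame


def pack_frame(lf0: float, mgc: List[float], ap: List[float]) -> List[float]:
    """Concatenate one frame's World features into a single target vector (assumed layout)."""
    if len(mgc) != MGC_DIM or len(ap) != AP_DIM:
        raise ValueError(
            f"expected {MGC_DIM} mgc and {AP_DIM} ap values, got {len(mgc)} and {len(ap)}"
        )
    return [lf0] + list(mgc) + list(ap)


def unpack_frame(frame: List[float]) -> Tuple[float, List[float], List[float]]:
    """Split a predicted 66-dim frame back into (lf0, mgc, ap) before vocoding."""
    if len(frame) != FRAME_DIM:
        raise ValueError(f"expected {FRAME_DIM} values, got {len(frame)}")
    lf0 = frame[0]
    mgc = frame[LF0_DIM:LF0_DIM + MGC_DIM]
    ap = frame[LF0_DIM + MGC_DIM:]
    return lf0, mgc, ap


if __name__ == "__main__":
    frame = pack_frame(5.0, [0.1] * MGC_DIM, [-1.0] * AP_DIM)
    lf0, mgc, ap = unpack_frame(frame)
    print(len(frame), lf0, len(mgc), len(ap))  # 66 5.0 60 5
```

For scale, a linear spectrogram from a 2048-point FFT has 1025 bins per frame, so this target is more than an order of magnitude smaller.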
I would like to share my experimental source code with you as follows. Note that it currently works only for Mandarin Chinese; you may modify it for other languages:
tacotron-world-vocoder branch
Python-Wrapper-for-World-Vocoder
pysptk merlin-world-vocoder branch
By the way, you need to run python setup.py install and then manually copy the resulting .so file into the system path, for both pysptk and the Python wrapper project.
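The manual copy step can be scripted. The helper below is hypothetical and not part of either project: it assumes that the build has left compiled extension files (`*.so`) under a `build/lib*` directory and that "system path" means a directory on Python's module search path, such as the `platlib` entry returned by `sysconfig.get_paths()`. Package sub-folders are preserved so that imports like `pkg._ext` keep working.

```python
import glob
import os
import shutil
from typing import List


def find_built_extensions(project_dir: str) -> List[str]:
    """Return compiled extension modules (*.so) found anywhere under build/lib*."""
    pattern = os.path.join(project_dir, "build", "lib*", "**", "*.so")
    return sorted(glob.glob(pattern, recursive=True))


def copy_extensions(project_dir: str, target_dir: str) -> List[str]:
    """Copy every built .so file into target_dir and return the destination paths."""
    build_dir = os.path.join(project_dir, "build")
    copied = []
    for src in find_built_extensions(project_dir):
        # Drop the leading "lib.<platform>-<version>" folder but keep package folders,
        # e.g. build/lib.<platform>/pkg/_ext.so -> <target_dir>/pkg/_ext.so
        parts = os.path.relpath(src, build_dir).split(os.sep)
        dst = os.path.join(target_dir, *parts[1:])
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

For example, `copy_extensions("pysptk", sysconfig.get_paths()["platlib"])` would copy pysptk's built modules into site-packages; writing there may require administrator rights.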
Besides, I would also like to provide two Python scripts for a World vocoder resynthesis test: world_vocoder_resynth_scripts.zip
Created 5 years ago
Top GitHub Comments
Of course @herenje
I gave up on this solution and turned to WaveRNN. Feel free to reopen this issue. The F0 feature is hard to predict accurately.
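One common way to make F0 easier to model, used in statistical parametric pipelines such as Merlin, is to replace the undefined log-F0 of unvoiced frames with a continuous interpolated curve and to predict a separate voiced/unvoiced (V/UV) flag. The sketch below illustrates that general technique. It assumes unvoiced frames are marked with F0 = 0, as WORLD's F0 estimators output, and it is not what the tacotron-world-vocoder branch does; `continuous_lf0` is a hypothetical name.

```python
import math
from typing import List, Tuple


def continuous_lf0(f0: List[float]) -> Tuple[List[float], List[int]]:
    """Turn a raw F0 track (Hz, 0 = unvoiced) into interpolated log-F0 plus a V/UV flag.

    Unvoiced gaps between voiced frames are filled by linear interpolation in the
    log domain; leading and trailing unvoiced frames copy the nearest voiced value.
    """
    vuv = [1 if f > 0 else 0 for f in f0]
    voiced = [i for i, v in enumerate(vuv) if v]
    if not voiced:
        # No voiced frames at all: nothing to interpolate from.
        return [0.0] * len(f0), vuv
    lf0 = [math.log(f) if f > 0 else 0.0 for f in f0]
    out = list(lf0)
    # Extend the first and last voiced values out to the edges.
    for i in range(voiced[0]):
        out[i] = lf0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = lf0[voiced[-1]]
    # Linearly interpolate inside each unvoiced gap between two voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            t = (i - left) / gap
            out[i] = (1 - t) * lf0[left] + t * lf0[right]
    return out, vuv


if __name__ == "__main__":
    lf0, vuv = continuous_lf0([0.0, 100.0, 0.0, 400.0, 0.0])
    print([round(v, 3) for v in lf0], vuv)  # [4.605, 4.605, 5.298, 5.991, 5.991] [0, 1, 0, 1, 0]
```

The model then regresses a smooth curve instead of jumping between a real log-F0 and an arbitrary filler value, and the V/UV flag restores the unvoiced regions at synthesis time.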