Tacotron plus World vocoder

Hey, I am glad to report that I have succeeded in merging the Tacotron model with the World vocoder and have generated some evaluation results, attached below. The results sound decent but are still not perfect. However, they show another way to train Tacotron on a different set of feature parameters. The World vocoder is an open source project, so anyone can use it. Moreover, its resynthesis quality is better than that of Griffin-Lim, since the three features (lf0[1], mgc[60] and ap[5]) contain not only magnitude spectrograms but also phase information. Furthermore, the feature dimensionality is low enough that the Tacotron model does not need a postnet. Training time can be reduced to 0.7 seconds per step, and inference is reasonably quick even when it runs only on CPU. So it is really worth trying.
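To make the feature layout concrete, here is a minimal sketch of how one acoustic frame could be packed and unpacked, using the dimensions quoted above (lf0[1], mgc[60], ap[5], so 66 values per frame). The function names, the ordering of the three groups, and the unvoiced marker value are assumptions made for illustration; they are not taken from the linked branches.

```python
import math

# Dimensions quoted in the post: lf0[1] + mgc[60] + ap[5] = 66 values per frame.
LF0_DIM, MGC_DIM, AP_DIM = 1, 60, 5
FRAME_DIM = LF0_DIM + MGC_DIM + AP_DIM

# Hypothetical marker for unvoiced frames (an assumption, not from the post).
UNVOICED_LF0 = -1e10


def pack_frame(lf0, mgc, ap):
    """Concatenate the three feature groups into one flat frame.

    The order lf0, mgc, ap is an assumption made for this sketch.
    """
    if len(mgc) != MGC_DIM or len(ap) != AP_DIM:
        raise ValueError("unexpected feature dimensions")
    return [lf0] + list(mgc) + list(ap)


def unpack_frame(frame):
    """Split a flat frame back into (lf0, mgc, ap)."""
    if len(frame) != FRAME_DIM:
        raise ValueError("expected %d values per frame" % FRAME_DIM)
    lf0 = frame[0]
    mgc = frame[LF0_DIM:LF0_DIM + MGC_DIM]
    ap = frame[LF0_DIM + MGC_DIM:]
    return lf0, mgc, ap


def lf0_to_f0(lf0):
    """Convert log-F0 to F0 in Hz; frames at the unvoiced marker map to 0.0."""
    if lf0 <= UNVOICED_LF0:
        return 0.0
    return math.exp(lf0)
```

With this layout the decoder would predict 66 values per frame, which is what makes dropping the postnet plausible compared with predicting a full linear spectrogram.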

I would like to share my experimental source code with you. Note that it currently only supports Mandarin Chinese; you may modify it for other languages. The relevant projects are: tacotron-world-vocoder branch; Python-Wrapper-for-World-Vocoder; pysptk; merlin-world-vocoder branch. By the way, for the pysptk and Python wrapper projects you need to run python setup.py install and then copy the .so file manually into the system path.

I would also like to provide two Python scripts for testing World vocoder resynthesis: world_vocoder_resynth_scripts.zip

world_vocoder_demo.zip

Issue Analytics

State:
Created 5 years ago
Reactions: 4
Comments: 79 (1 by maintainers)

Top GitHub Comments

2 reactions

begeekmyfriend commented, Mar 27, 2019

Of course, @herenje. [Image: step-28000-align]

0 reactions

begeekmyfriend commented, May 16, 2019

I have given up on this solution and turned to WaveRNN. Feel free to reopen this issue. The F0 feature is hard to predict accurately.
