Tacotron plus World vocoder

See original GitHub issue

Hey, I am glad to report that I have succeeded in combining the Tacotron model with the World vocoder and have generated some evaluation results, attached below. The results sound decent but are still not perfect. Still, they show another way to train Tacotron on a different set of acoustic features. The World vocoder is an open-source project, so anyone can use it. Moreover, the quality of its resynthesized audio is better than that of Griffin-Lim, since the three features (lf0[1], mgc[60] and ap[5]) contain not only magnitude spectrogram information but also phase information. Furthermore, the feature dimension is low enough that the Tacotron model does not need a postnet. Training time can be reduced to 0.7 seconds per step, and inference is fast enough even when it runs on CPU only. So it is really worth trying.
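To make the feature layout concrete, here is a minimal sketch of how one 66-value output frame could be split back into the three World streams before vocoding. Only the stream sizes (lf0[1], mgc[60], ap[5]) come from the post. The concatenation order (lf0, then mgc, then ap), the helper names, and the sentinel used to mark unvoiced frames are assumptions for illustration and may not match the linked branches.

```python
import math

# Stream sizes quoted in the post: lf0[1], mgc[60], ap[5].
LF0_DIM, MGC_DIM, AP_DIM = 1, 60, 5
FRAME_DIM = LF0_DIM + MGC_DIM + AP_DIM  # 66 values per frame

# Hypothetical sentinel: frames at or below this log-F0 count as unvoiced.
UNVOICED_LF0 = -1e10


def split_frame(frame):
    """Split one flat frame into (lf0, mgc, ap), assuming that order."""
    if len(frame) != FRAME_DIM:
        raise ValueError(f"expected {FRAME_DIM} values, got {len(frame)}")
    lf0 = frame[0]
    mgc = frame[LF0_DIM:LF0_DIM + MGC_DIM]
    ap = frame[LF0_DIM + MGC_DIM:]
    return lf0, mgc, ap


def lf0_to_f0(lf0):
    """Convert log-F0 to F0 in Hz; unvoiced frames map to 0.0."""
    if lf0 <= UNVOICED_LF0:
        return 0.0
    return math.exp(lf0)


# Toy frame: a 220 Hz voiced frame with placeholder mgc and ap values.
frame = [math.log(220.0)] + [0.0] * MGC_DIM + [-1.0] * AP_DIM
lf0, mgc, ap = split_frame(frame)
print(len(mgc), len(ap), round(lf0_to_f0(lf0), 1))
```

With only 66 values per frame, the decoder target is much smaller than a linear spectrogram, which is consistent with the post's remark that a postnet is unnecessary.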

I would like to share my experimental source code, listed below. Note that it currently supports only Mandarin Chinese; you may modify it for other languages:

  • tacotron-world-vocoder branch
  • Python-Wrapper-for-World-Vocoder
  • pysptk
  • merlin-world-vocoder branch

By the way, for pysptk and the Python wrapper project you need to run python setup.py install and then manually copy the .so file into the system path.

In addition, I would like to provide two Python scripts for testing World vocoder resynthesis: world_vocoder_resynth_scripts.zip

world_vocoder_demo.zip

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 79 (1 by maintainers)

Top GitHub Comments

2 reactions
begeekmyfriend commented, Mar 27, 2019

Of course @herenje step-28000-align

0 reactions
begeekmyfriend commented, May 16, 2019

I have given up on this solution and turned to WaveRNN. Feel free to reopen this issue. The F0 feature is hard to predict accurately.
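One common way to quantify how well F0 is predicted is to compare the predicted and reference contours frame by frame. The sketch below is an illustrative metric and not something taken from the issue: it assumes F0 values in Hz with 0.0 marking unvoiced frames, and the function name is hypothetical.

```python
import math


def f0_errors(ref_f0, pred_f0):
    """Return (RMSE in Hz over frames voiced in both, voicing error rate)."""
    if len(ref_f0) != len(pred_f0):
        raise ValueError("contours must have the same number of frames")
    if not ref_f0:
        raise ValueError("contours must not be empty")

    squared_errors = []
    voicing_mismatches = 0
    for ref, pred in zip(ref_f0, pred_f0):
        ref_voiced, pred_voiced = ref > 0.0, pred > 0.0
        if ref_voiced != pred_voiced:
            # One contour says voiced, the other unvoiced.
            voicing_mismatches += 1
        elif ref_voiced:
            # Both voiced: accumulate the squared pitch error.
            squared_errors.append((ref - pred) ** 2)

    if squared_errors:
        rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    else:
        rmse = float("nan")  # no frame is voiced in both contours
    return rmse, voicing_mismatches / len(ref_f0)


# Toy contours: two jointly voiced frames and one voicing mismatch.
reference = [0.0, 200.0, 210.0, 0.0]
predicted = [0.0, 197.0, 214.0, 120.0]
rmse_hz, voicing_error = f0_errors(reference, predicted)
print(rmse_hz, voicing_error)
```

Reporting the voicing error separately matters because a single voiced/unvoiced mistake is often more audible than a pitch error of a few Hz.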


Top Results From Across the Web

Tacoton-2 plus World vocoder · Issue #304 - GitHub

WORLD: A Vocoder-Based High-Quality Speech Synthesis ...
This new speech synthesis system has not only sound quality but also quick processing. It consists of three analysis algorithms and one synthesis...

State Of The Art of Speech Synthesis at the End of May 2021
Among the most popular vocoders are Griffin-Lim, WORLD, WaveNet, SampleRNN, GAN-TTS, ... DeepVoice 3, Tacotron, Tacotron 2, Char2wav, and ParaNet use ...

Wave-Tacotron: Spectrogram-free end-to-end text-to-speech ...
TTS* in one sequence-to-sequence model. ○ block-autoregressive normalizing flow, no vocoder. ○ *normalized-text- or phoneme-to-speech.

A Survey on Neural Speech Synthesis - arXiv
and Tacotron [382] does not use linguistic features but directly ... into waveform through vocoders such as STRAIGHT [155] and WORLD [238].
