Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Neural vocoder support

See original GitHub issue

Let’s discuss neural vocoder support.

Types

Parallel WaveGAN (PWG), MelGAN I can easily implement these two since I am familiar with it. Kan-bayashi has packed his repo so we can simply pip install -U parallel_wavegan.
WaveNet vocoder I don’t really prefer this one due to its slow inference speed.
WaveGlow I don’t really prefer this one due to its slow training speed.

Usage & Structure We can add a synthesis stage to the recipe. We can provide pretrained models for users to download, and use an argument like voc_expdir to load the pretrained model. In addition, with PWG, kan-bayashi has also packed training code in the package, so we can provide recipes for users to train their own vocoders if they want. One example design can be like egs/pwg/vcc2018.

Issue Analytics

State:
Created 3 years ago
Reactions:1
Comments:6 (3 by maintainers)

Top GitHub Comments

1reaction

unilightcommented, Jun 3, 2020

I see. I will train vocoders in kan-bayashi/ParallelWaveGAN and just provide pretrained model links in this repo. I will work on it next.

1reaction

k2kobayashicommented, Jun 2, 2020

It’s nice to integrate PWG/MelGAN. Actually, I have already discuss it with @kan-bayashi. He said he can create recipe and pre-trained model after releasing vcc2020 dataset.

For the structure, I think following is nice.

add stage 6 in egs/vaevc/template/run.sh
implement egs/vaevc/<recipe>/local/download_pretrained_neuralvocoder.sh to download pre-trained models.
implement crank/bin/generate_wav_{pwg,melgan}.py to generate wav file w/ pre-trained model and generated h5 in stage 5.

Top Results From Across the Web

Neural Vocoder is All You Need for Speech Super-resolution

In this paper, we propose a neural vocoder based speech super-resolution method (NVSR) that can handle a variety of input resolution and ...

Neural Vocoding for Singing and Speaking Voices with ... - MDPI

In the long term, this work aims to develop a neural vocoder supporting perceptually transparent analysis/resynthesis for arbitrary speakers and arbitrary voice ...

Support for neural vocoders (NSF, Parallel WaveGAN) #18

NNSVS now supports Parallel WaveGAN and NSF. Is it possible to use these vocoders on ENUNU?

Neural TTS - Amazon Polly - AWS Documentation

The following features are supported for neural voices: Real-time and asynchronous speech synthesis operations. Newscaster speaking style. For more information ...

Azure Neural TTS voices upgraded to 48kHz with HiFiNet2 ...

Thanks to the latest innovation on our HiFiNet vocoder, ... Custom Neural Voice can support Lite projects (CNV Lite,) with which customers ...