Fine-tuning procedure for mb_melgan vocoder, Voice Quality degrading with Fine-tuning.

Hello! I’m trying to finetune a Vocoder model for Indian accents. I’ve followed the suggestions from thread #296 and have arrived at a suitable acoustics model.

To improve the output voice quality of the current vocoder (multiband_melgan.v1), I followed the fine-tuning process described in examples/multiband_melgan, using 940000.h5 (multiband_melgan.v1-EN) as the pretrained model.

However, the output has degraded (completely muffled speech) compared to the pretrained vocoder.

I used the same dataset as the one used for fastspeech model training, with this command:

python ./examples/multiband_melgan/train_multiband_melgan.py \
--train-dir ./dump/train/ \
--dev-dir ./dump/valid/ \
--outdir ./examples/multiband_melgan/exp/train.multiband_melgan.v1/ \
--config ./examples/multiband_melgan/conf/multiband_melgan.v1.yaml \
--use-norm 1 \
--pretrained mb_melgan_generator.h5

These are the loss plots that I obtained: [eval loss plot] [train loss plot]

Please help me debug this problem. Thank you.

Top GitHub Comments

WadoodAbdul commented, Jul 7, 2021

Thanks for the clarification @dathudeptrai 😃

dathudeptrai commented, Jul 7, 2021

> Thanks for the help, @dathudeptrai. I had not continued after 200k steps.
>
> Just for clarification, I have to tune the Generator + Discriminator for the rest of the steps (1M), not just the generator for 1M steps. Is that correct?

yes 😄
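
For readers following along, the two-phase schedule confirmed above can be sketched in a few lines of Python. This is only an illustration and not code from the TensorFlowTTS repository: the function names `trainable_parts` and `summarize_schedule` are hypothetical, the 200k boundary is taken from the thread, and treating 1M as the total step count is an assumption (the question could also be read as 1M further steps).

```python
from typing import Dict, Tuple


def trainable_parts(step: int, discriminator_start_step: int = 200_000) -> Tuple[str, ...]:
    """Return which networks are updated at a given training step (hypothetical helper)."""
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < discriminator_start_step:
        # Phase 1: only the generator is updated.
        return ("generator",)
    # Phase 2: generator and discriminator are trained together.
    return ("generator", "discriminator")


def summarize_schedule(total_steps: int = 1_000_000,
                       discriminator_start_step: int = 200_000) -> Dict[str, int]:
    """Split a run into generator-only and joint step counts (assumed totals)."""
    generator_only = min(total_steps, discriminator_start_step)
    joint = max(0, total_steps - discriminator_start_step)
    return {"generator_only_steps": generator_only, "joint_steps": joint}


if __name__ == "__main__":
    print(trainable_parts(150_000))
    print(trainable_parts(200_000))
    print(summarize_schedule())
```

The point of the sketch is that stopping at 200k steps ends the run before the joint phase begins, which matches the situation described in the quoted question.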
