Fine-tuning procedure for mb_melgan vocoder, Voice Quality degrading with Fine-tuning.
See original GitHub issueHello! I’m trying to finetune a Vocoder model for Indian accents. I’ve followed the suggestions from thread #296 and have arrived at a suitable acoustics model.
To improve the output voice quality of the present vocoder( multiband_melgan.v1) model, I had followed the finetuning process mentioned in examples/multiband_melgan with 940000.h5 multiband_melgan.v1-EN as the pretrained model.
However the output has degraded(completely muffled speech) compared to the pretrained vocoder.
I had used the same dataset as the one used for fastspeech model training, with this command,
python ./examples/multiband_melgan/train_multiband_melgan.py \
--train-dir ./dump/train/ \
--dev-dir ./dump/valid/ \
--outdir ./examples/multiband_melgan/exp/train.multiband_melgan.v1/ \
--config ./examples/multiband_melgan/conf/multiband_melgan.v1.yaml \
--use-norm 1 \
--pretrained mb_melgan_generator.h5
These are the loss plots that I obtained,
Please help me debug this problem, thank you
Issue Analytics
- State:
- Created 2 years ago
- Comments:5
Top Results From Across the Web
arXiv:2110.05798v2 [cs.SD] 6 Apr 2022
For finetuning the spectrogram-synthesis model, we require the text and speech pairs of the new speaker, while for the vocoder model, we only ......
Read more >Fine-Tune Whisper For Multilingual ASR with Transformers
In this blog, we present a step-by-step guide on fine-tuning Whisper for any multilingual ASR dataset using Hugging Face Transformers.
Read more >Fine-tuning pre-trained voice conversion model for adding ...
This paper proposes a training method for a vocoder-free any-to-many encoder-decoder VC model with limited data. Various pretraining techniques have been ...
Read more >Fine-tuning a TTS model - TTS 0.10.0 documentation
Fine-tuning takes a pre-trained model, and retrains it to improve the model performance on a ... Some models are fast and some are...
Read more >Creating Speech Synthesis Process by Using Pulse Model in ...
the speech synthesis process, but the vocoder is combined only the ... the quality degrading process is reducing the speech synthesis.
Read more >
Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free
Top Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Thanks for the clarification @dathudeptrai 😃
yes 😄