How to implement transfer learning / adaptation for TTS
Now I am trying to implement speaker adaptation on top of a pre-trained model. I added `torch_load(args.model, model)` into `espnet/tts/pytorch_backend/tts.py` (code).
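For context, here is a minimal sketch of that warm-start step in plain PyTorch. This is not ESPnet's exact code; the `"model"` key handling is an assumption about the snapshot layout, and ESPnet's own `torch_load` utility wraps the same idea.

```python
import torch

def warm_start(model, snapshot_path):
    """Load pre-trained weights into a freshly built model before fine-tuning.

    Mirrors the intent of torch_load(args.model, model); the "model" key
    below is an assumed snapshot layout, not necessarily ESPnet's exact one.
    """
    state = torch.load(snapshot_path, map_location="cpu")
    if isinstance(state, dict) and "model" in state:
        state = state["model"]
    model.load_state_dict(state)
    model.train()  # keep the model in training mode for fine-tuning
    return model
```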
But the loss curve stayed essentially the same even when `transformer-lr` was decreased (results).
* Only `encoder_alpha` and `decoder_alpha` changed when `transformer-lr` was decreased.
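For reference, `transformer-lr` in these recipes usually acts as a scale factor on the Noam warmup schedule, so lowering it should shrink every step's effective learning rate proportionally. A sketch, assuming the standard formulation (the default `adim` and `warmup_steps` values are illustrative):

```python
def noam_lr(step, scale=1.0, adim=384, warmup_steps=25000):
    """Noam learning-rate schedule; `scale` plays the role of transformer-lr.

    `step` counts from 1. With scale=1e-8 every step's effective learning
    rate is ~1e-8 times the default, so the loss curve would be expected
    to change shape.
    """
    return scale * adim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```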
Could you give me some advice? e.g., adding some code, changing some configs, … (one common adjustment is sketched below the conditions).
[Adaptation conditions]
* Pre-trained model: libritts.transformer.v1
* Adaptation data: test_clean (speakers unseen during training)
* Adaptation config:
  * `transformer-lr: 1` -> `transformer-lr: 1e-8`
  * `epochs: 100` -> `epochs: 2`
  * all other settings unchanged
[keywords] transfer learning, speaker adaptation, fine-tuning
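One common fine-tuning adjustment (a sketch under assumed parameter-name prefixes, not ESPnet's official recipe) is to freeze most of the pre-trained network and update only the parts that must adapt to the new speaker:

```python
import torch

def freeze_for_adaptation(model, trainable_prefixes=("spk_embed", "decoder")):
    """Freeze everything except parameters whose names start with the given
    prefixes; the prefixes here are hypothetical examples."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
    # Build the optimizer over the remaining trainable parameters only.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-4)
```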
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
And I heard that LibriTTS includes pauses in speech without any corresponding text. This causes training failures. To avoid this issue, a VAE-based Tacotron is needed (https://arxiv.org/pdf/1810.07217.pdf).
If I have time, I will try to implement this model.
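For a rough idea of the VAE ingredient in that paper (arXiv:1810.07217), a reference encoder summarizes the target mel-spectrogram into a latent that can absorb unlabeled variation such as pauses; a sketch with illustrative module and dimension names:

```python
import torch
import torch.nn as nn

class ReferenceVAE(nn.Module):
    """Sketch of a VAE reference encoder; names and sizes are illustrative."""

    def __init__(self, mel_dim=80, hidden=256, latent_dim=16):
        super().__init__()
        self.encoder = nn.GRU(mel_dim, hidden, batch_first=True)
        self.to_mean = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)

    def forward(self, mel):  # mel: (batch, frames, mel_dim)
        _, h = self.encoder(mel)  # h: (num_layers, batch, hidden)
        mean, logvar = self.to_mean(h[-1]), self.to_logvar(h[-1])
        z = mean + torch.randn_like(mean) * (0.5 * logvar).exp()  # reparameterize
        kl = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp(), dim=-1)
        return z, kl.mean()  # z conditions the decoder; KL joins the loss
```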
I confirmed that single-speaker model adaptation is now running. `model.eval()` was the reason the model did not train. I have not added any pre-processing for the pauses you mentioned above. The multi-speaker model has not been checked yet.
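In other words, the fix is a one-liner after loading (the import path is assumed from ESPnet's layout):

```python
from espnet.asr.asr_utils import torch_load  # assumed import path

torch_load(args.model, model)  # load the pre-trained snapshot
model.train()  # per the comment above, staying in eval mode prevented training
```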