How to implement transfer learning / adaptation for TTS
Now I am trying to implement speaker adaptation on top of a pre-trained model. I added `torch_load(args.model, model)` into `espnet/tts/pytorch_backend/tts.py` (code).
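For context, here is a minimal sketch of that warm-start step in plain PyTorch. This is not ESPnet's exact code; the `"model"` key handling is an assumption about the snapshot layout, and ESPnet's own `torch_load` utility wraps the same idea.

```python
import torch

def warm_start(model, snapshot_path):
    """Load pre-trained weights into a freshly built model before fine-tuning.

    Mirrors the intent of torch_load(args.model, model); the "model" key
    below is an assumed snapshot layout, not necessarily ESPnet's exact one.
    """
    state = torch.load(snapshot_path, map_location="cpu")
    if isinstance(state, dict) and "model" in state:
        state = state["model"]
    model.load_state_dict(state)
    model.train()  # keep the model in training mode for fine-tuning
    return model
```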
But the loss curve stayed essentially the same even when `transformer-lr` was decreased (results).
* Only `encoder_alpha` and `decoder_alpha` changed when `transformer-lr` was decreased.
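For reference, `transformer-lr` in these recipes usually acts as a scale factor on the Noam warmup schedule, so lowering it should shrink every step's effective learning rate proportionally. A sketch, assuming the standard formulation (the default `adim` and `warmup_steps` values are illustrative):

```python
def noam_lr(step, scale=1.0, adim=384, warmup_steps=25000):
    """Noam learning-rate schedule; `scale` plays the role of transformer-lr.

    `step` counts from 1. With scale=1e-8 every step's effective learning
    rate is ~1e-8 times the default, so the loss curve would be expected
    to change shape.
    """
    return scale * adim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```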
Could you give me some advice? e.g., adding some code, changing some configs, … (one common adjustment is sketched below the conditions).
[Adaptation conditions]
* Pre-trained model: libritts.transformer.v1
* Adaptation data: test_clean (speakers unseen during training)
* Adaptation config:
  * `transformer-lr: 1` -> `transformer-lr: 1e-8`
  * `epochs: 100` -> `epochs: 2`
  * all other settings unchanged
[keywords] transfer learning, speaker adaptation, fine-tuning
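One common fine-tuning adjustment (a sketch under assumed parameter-name prefixes, not ESPnet's official recipe) is to freeze most of the pre-trained network and update only the parts that must adapt to the new speaker:

```python
import torch

def freeze_for_adaptation(model, trainable_prefixes=("spk_embed", "decoder")):
    """Freeze everything except parameters whose names start with the given
    prefixes; the prefixes here are hypothetical examples."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
    # Build the optimizer over the remaining trainable parameters only.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-4)
```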
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
And I heard that LibriTTS includes pauses in speech without any corresponding text. This causes training failures. To avoid this issue, a VAE-based Tacotron is needed (https://arxiv.org/pdf/1810.07217.pdf).
If I have time, I will try to implement this model.
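For a rough idea of the VAE ingredient in that paper (arXiv:1810.07217), a reference encoder summarizes the target mel-spectrogram into a latent that can absorb unlabeled variation such as pauses; a sketch with illustrative module and dimension names:

```python
import torch
import torch.nn as nn

class ReferenceVAE(nn.Module):
    """Sketch of a VAE reference encoder; names and sizes are illustrative."""

    def __init__(self, mel_dim=80, hidden=256, latent_dim=16):
        super().__init__()
        self.encoder = nn.GRU(mel_dim, hidden, batch_first=True)
        self.to_mean = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)

    def forward(self, mel):  # mel: (batch, frames, mel_dim)
        _, h = self.encoder(mel)  # h: (num_layers, batch, hidden)
        mean, logvar = self.to_mean(h[-1]), self.to_logvar(h[-1])
        z = mean + torch.randn_like(mean) * (0.5 * logvar).exp()  # reparameterize
        kl = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp(), dim=-1)
        return z, kl.mean()  # z conditions the decoder; KL joins the loss
```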
I confirmed that single-speaker model adaptation is now running. `model.eval()` was the reason the model did not train. I have not added any pre-processing for the pauses you mentioned above. The multi-speaker model has not been checked yet.
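In other words, the fix is a one-liner after loading (the import path is assumed from ESPnet's layout):

```python
from espnet.asr.asr_utils import torch_load  # assumed import path

torch_load(args.model, model)  # load the pre-trained snapshot
model.train()  # per the comment above, staying in eval mode prevented training
```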