How to implement transfer learning / adaptation for TTS

I am trying to implement speaker adaptation starting from a pre-trained model. I added torch_load(args.model, model) to espnet/tts/pytorch_backend/tts.py (Codes). However, the loss curve keeps the same shape even when transformer-lr is decreased (Results). Note that encoder_alpha and decoder_alpha did change when transformer-lr was decreased.

Could you give me some advice? For example, on adding code, changing configs, …

[Adaptation condition]
Pre-trained model: libritts.transformer.v1
Adaptation data: test_clean (open speaker for training)
Adaptation config:

  • transformer-lr: 1 -> transformer-lr: 1e-8
  • epochs: 100 -> epochs: 2
  • all other settings are unchanged

[keywords] transfer learning, speaker adaptation, fine-tuning
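The step described in the question (initializing from a pre-trained checkpoint, then continuing training on adaptation data) can be sketched in plain PyTorch as follows. This is only a minimal illustration under stated assumptions: `TinyModel` and `load_pretrained` are hypothetical names, the checkpoint is assumed to be a `state_dict` saved with `torch.save`, and the learning rate is arbitrary. It does not reproduce ESPnet's `torch_load` or its training loop.

```python
import io

import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for a TTS network; not an ESPnet class."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))

    def forward(self, x):
        return self.net(x)


def load_pretrained(model, checkpoint):
    """Copy pre-trained weights into `model` and return it in training mode."""
    state = torch.load(checkpoint, map_location="cpu")
    model.load_state_dict(state)
    model.train()  # fine-tuning should run in training mode, not eval mode
    return model


torch.manual_seed(0)

# Pretend this in-memory buffer is the pre-trained checkpoint file.
pretrained = TinyModel()
buffer = io.BytesIO()
torch.save(pretrained.state_dict(), buffer)
buffer.seek(0)

# Initialize a fresh model from the checkpoint.
model = load_pretrained(TinyModel(), buffer)
loaded_ok = all(
    torch.equal(a, b)
    for a, b in zip(pretrained.state_dict().values(), model.state_dict().values())
)

# Continue training on (random, illustrative) adaptation data.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 4), torch.randn(16, 4)
before = [p.detach().clone() for p in model.parameters()]
losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# If nothing changed here, the weights are not actually being updated.
changed = any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))
```

Comparing parameters before and after a few steps, as `changed` does, is a cheap way to check whether adaptation is updating the model at all, independently of how the loss curve looks.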

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

2 reactions
kan-bayashi commented, Sep 24, 2019

Also, I have heard that LibriTTS includes pauses in the speech that have no corresponding text, which causes training to fail. To avoid this issue, a VAE-based Tacotron is needed. https://arxiv.org/pdf/1810.07217.pdf

If I have time, I will try to implement this model.

1 reaction
potato-inoue commented, Sep 26, 2019

I confirmed that single-speaker model adaptation now runs. The call to model.eval() was the reason the model did not train. I did not apply any preprocessing for the pauses you mentioned above. I have not checked the multi-speaker model yet.

  • model: ljspeech.transformer.v1
  • adapted speaker: speaker 237 (Female) from test_clean
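To illustrate why a leftover `model.eval()` matters during fine-tuning, here is a small sketch in plain PyTorch (not ESPnet code). In PyTorch, `eval()` switches modules such as dropout to inference behavior, and `train()` switches them back. The sketch only shows this mode difference on a single `nn.Dropout` layer; it does not reproduce the exact failure reported in the comment above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Inference mode: dropout is a no-op and returns the input unchanged.
layer.eval()
eval_out = layer(x)

# Training mode: elements are randomly zeroed, the rest scaled by 1 / (1 - p).
layer.train()
train_out = layer(x)

eval_is_identity = torch.equal(eval_out, x)
num_dropped = int((train_out == 0).sum())
```

When adapting a loaded model, checking `model.training` right before the training loop is a quick way to confirm that the model was not left in evaluation mode by the loading code.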

Top Results From Across the Web

Adapting TTS models For New Speakers using Transfer ...
We address this challenge by proposing transfer-learning guidelines for adapting high quality single-speaker TTS models for a new speaker, ...

Adapting TTS models For New Speakers ... - Paarth Neekhara
In this work, we adapt a single speaker TTS system for new speakers using a few minutes of training data. We use a...

Text-to-speech system for low-resource language using cross ...
[6] explored transfer learning for TTS with low-resource, emotional speech. ... To do this, they pre-trained an automatic speech recognition ...

Transfer Learning, Style Control, and Speaker Reconstruction ...
Meanwhile, to improve the performance of zero-shot speaker adaptation, we propose a new TTS model that incorporates an explicit style control ...

Exploring Transfer Learning for Low Resource Emotional TTS
In this paper, we investigate how to leverage fine-tuning on a pre-trained Deep Learning-based TTS model to synthesize speech with a small dataset...
