
[Bug] Korean Zero-shot training

See original GitHub issue

Describe the bug

Dear author, thank you for your nice work. I tried to apply Coqui TTS to Korean. I trained on the KSS single-speaker dataset (12,844 samples), and it seems to work: GlowTTS with MB-MelGAN can synthesize with acceptable results. The audio result is attached (coquiTTS.zip). However, when I tried to apply it to multi-speaker data (AIHUB) with more than 1,500 speakers and only 30,000 samples, I get an error. Could you help me figure out what the problem is, and what the better way to do zero-shot for Korean would be? The error image is attached. coquiTTS.zip
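For context, the main settings that change when moving from the single-speaker KSS run to a multi-speaker setup are the speaker-embedding options in the GlowTTS config. A minimal sketch is below; it assumes a recent Coqui TTS develop checkout, and the values are placeholders rather than the attached config.

```python
# Minimal sketch of the GlowTTS settings that typically change when moving from
# a single-speaker dataset (KSS) to a multi-speaker one (AIHUB). Assumes a
# recent Coqui TTS develop checkout; all values are illustrative placeholders,
# not the attached config.
from TTS.tts.configs.glow_tts_config import GlowTTSConfig

config = GlowTTSConfig(
    run_name="glow_tts_korean_multispeaker",  # placeholder run name
    use_phonemes=False,                       # Korean handled via the character set / text cleaner here
    use_speaker_embedding=True,               # learn one embedding per training speaker
    output_path="output/",                    # placeholder
)

# The number of speakers has to match what the dataset formatter actually finds;
# in the Coqui recipes this is filled in from a SpeakerManager after loading samples.
config.num_speakers = 1500                    # placeholder
```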

To Reproduce

src.zip

Expected behavior

No response

Logs

No response

Environment

- TTS develop version, run from the TTS folder (TTS not installed as a package)
- CUDA 11.7
- Python 3.7

Additional context

No response

Issue Analytics

  • State: open
  • Created 9 months ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

2 reactions
Edresson commented, Dec 8, 2022

[…] other questions about the correct way to train a multi-speaker model with Coqui TTS for voice-cloning purposes. As I understand it, in the GlowTTS config file there are some values for setting up speaker_embedding; could you help me figure this out? When I checked the YourTTS Colab program, I understood that we don't need to fine-tune with a new voice; we only calculate the speaker embedding of […]

Hi @hathubkhn, that's right: YourTTS can produce a new voice without the model being trained on the target voice. During training, the YourTTS model is conditioned on speaker embeddings extracted by a speaker encoder (trained with thousands of speakers). The speaker encoder generalizes good embeddings for new speakers, so the YourTTS model, conditioned on these embeddings, can generate new voices as well. If you are interested in understanding how this works, @WeberJulian made a YouTube video about the YourTTS model that you can watch here. In addition, I gave a talk at NVIDIA's AI summit that you can access here.
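As a concrete illustration of that zero-shot flow, the sketch below clones a voice from a short reference clip with the released YourTTS checkpoint. It assumes a recent Coqui TTS release with the high-level TTS API; the reference and output file names are placeholders, and note that the released checkpoint does not cover Korean.

```python
# Zero-shot voice cloning sketch with the released YourTTS checkpoint.
# Assumes a recent Coqui TTS install; "reference.wav" and the output path
# are placeholders.
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

# The speaker encoder turns the reference clip into an embedding, so no
# fine-tuning on the target voice is needed.
tts.tts_to_file(
    text="This is a zero-shot voice cloning test.",
    speaker_wav="reference.wav",   # a few seconds of the target speaker
    language="en",                 # released checkpoint covers en / fr-fr / pt-br
    file_path="cloned_output.wav",
)
```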

0 reactions
hathubkhn commented, Dec 13, 2022

@hathubkhn Recently, we created a recipe that makes everything easier. If you like, you can try to fine-tune the model with this recipe.

The recipe replicates the first experiment proposed in the YourTTS paper: single-language training on the VCTK dataset (it downloads, resamples, and extracts the speaker embeddings automatically 😃). However, if you are interested in multilingual training, the parameters of the VitsArgs class instance that should be enabled for multilingual training are included as comments: https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py
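For orientation, the multilingual switches that the linked recipe keeps commented out look roughly like the sketch below. The field names follow the VitsArgs dataclass on the Coqui TTS dev branch; the concrete values and paths are illustrative, not copied from the recipe.

```python
# Rough sketch of a YourTTS-style VitsArgs with the multilingual options enabled.
# Field names follow the VitsArgs dataclass on the Coqui TTS dev branch; paths
# and dimensions are illustrative placeholders.
from TTS.tts.models.vits import VitsArgs

model_args = VitsArgs(
    # Speaker conditioning via precomputed d-vectors (speaker embeddings).
    use_d_vector_file=True,
    d_vector_file="speakers.pth",                    # placeholder: precomputed speaker embeddings
    d_vector_dim=512,
    # Multilingual switches that the single-language VCTK recipe leaves commented out.
    use_language_embedding=True,
    embedded_language_dim=4,
    # Optional speaker-consistency loss driven by the speaker encoder.
    use_speaker_encoder_as_loss=True,
    speaker_encoder_model_path="model_se.pth.tar",   # placeholder
    speaker_encoder_config_path="config_se.json",    # placeholder
)
```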

Hello Edresson, I am working on Korean, so could I use an approach like the following (a command-line sketch follows the list)?

  1. Train VITS on data from a single female speaker (~3 hours) => save the checkpoint
  2. Create a model from the saved checkpoint and re-train it on one male speaker (~17 min of data)
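The two-step plan above maps onto Coqui TTS's checkpoint-restore workflow: train on the first speaker, then point a second run at the saved checkpoint with --restore_path. A rough command-line sketch follows, launched from Python for illustration; the config name and checkpoint path are placeholders.

```python
# Step 2 of the plan above: continue training from the step-1 checkpoint on the
# second speaker's data. Run from the TTS repo root (as in the Environment
# section); config and checkpoint paths are placeholders.
import subprocess

subprocess.run(
    [
        "python", "TTS/bin/train_tts.py",
        "--config_path", "vits_korean_male_speaker.json",             # placeholder config for the new data
        "--restore_path", "output/run-female/checkpoint_200000.pth",  # checkpoint saved in step 1
    ],
    check=True,
)
```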
