
[Bug] Multispeaker VITS training does not work in direct python but does with config.json


Describe the bug

Hi,

I want to train a VITS model with multiple speakers and external embeddings (aka d_vectors), so I provided VitsArgs in a recipe:

vits_args = VitsArgs(
    use_language_embedding=False,
    embedded_language_dim=1,
    use_speaker_embedding=False,
    num_languages=1,
    use_sdp=False,
    #Those 3 properties also have to be repeated in the config section (see https://github.com/coqui-ai/TTS/issues/1454#issuecomment-1081843205)
    use_d_vector_file=True, 
    d_vector_file="/home/Caraduf/Models/d_vector_file_4_Voices.json",
    d_vector_dim=512
)

and repeated the same three lines in the VitsConfig, as explained here:

config = VitsConfig(
    model_args=vits_args,
    use_d_vector_file=True,
    d_vector_file="/home/Caraduf/Models/d_vector_file_4_Voices.json",
    d_vector_dim=512,
)
I then ran the training with python3 my_multispeaker_vits_training.py, but it failed with:

 line 303, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
 TypeError: conv1d() received an invalid combination of arguments - got (NoneType, Parameter,    Parameter, tuple, tuple, tuple, int), but expected one of:
  * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter,  tuple, tuple, tuple, int)
  * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)

This is also described by @harmlessman in this comment.
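
A note on the traceback (my reading, not part of the original report): the NoneType that reaches conv1d is most likely the speaker-conditioning tensor, which suggests no d_vector made it into the batch because the model was built without a SpeakerManager that knows about the d_vector file. A hypothetical sanity check before starting training could look like the following; the attribute names follow the stock Coqui TTS recipes around 0.8.0 and may differ in other versions:

# Hypothetical check; "model" is the Vits instance built in the recipe.
assert model.args.use_d_vector_file, "use_d_vector_file is not set on the model args"
assert model.speaker_manager is not None, "no SpeakerManager is attached to the model"
print(f"{model.speaker_manager.num_speakers} speakers loaded with external embeddings")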

But if I run train_tts.py --config_path path/to/the/just/previously/created/config.json, then the multispeaker training with external embeddings runs fine.
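
For comparison, here is a minimal sketch of a direct-Python recipe that creates the model with Vits.init_from_config, which is essentially what train_tts.py does internally and which builds the SpeakerManager (and loads the d_vector file) from the top-level config fields. The dataset settings and output_path are placeholders, and the call names are taken from the stock Coqui TTS recipes rather than verified against 0.8.0 specifically, so treat this as an assumption-laden sketch:

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsArgs

output_path = "/home/Caraduf/Models/vits_multispeaker_run"  # placeholder

# Placeholder dataset definition; adapt the formatter name and paths to your data.
dataset_config = BaseDatasetConfig(
    name="vctk", meta_file_train="", path="/path/to/dataset"
)

vits_args = VitsArgs(
    use_sdp=False,
    use_speaker_embedding=False,
    use_d_vector_file=True,
    d_vector_file="/home/Caraduf/Models/d_vector_file_4_Voices.json",
    d_vector_dim=512,
)

config = VitsConfig(
    model_args=vits_args,
    datasets=[dataset_config],
    output_path=output_path,
    # Repeated at the top level so the SpeakerManager built from the config picks them up.
    use_d_vector_file=True,
    d_vector_file="/home/Caraduf/Models/d_vector_file_4_Voices.json",
    d_vector_dim=512,
)

train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

# init_from_config() creates the audio processor, tokenizer and SpeakerManager
# from the config; building Vits(...) by hand without a speaker manager is one
# way to end up with a None speaker embedding at training time.
model = Vits.init_from_config(config)

trainer = Trainer(
    TrainerArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()

An alternative used in the stock multi-speaker recipes is to build the SpeakerManager explicitly and pass it as the fourth argument to Vits(config, ap, tokenizer, speaker_manager).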

To Reproduce

See above.

Expected behavior

Launching a multi-speaker training the “direct Python way”, providing the external speaker embeddings in the recipe, should work out of the box, without having to fall back on the generated config.json via the generic python3 train_tts.py --config_path X/Y/Z/config.json.

Logs

No response

Environment

CoquiTTS 0.8.0

Additional context

No response

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
shigabeev commented, Oct 19, 2022

Hi! Did you find a solution for this issue? I’m facing the same one, unfortunately.

0 reactions
Ca-ressemble-a-du-fake commented, Nov 23, 2022

That was what I was missing! Thanks for sharing your solution @lokmantsui, this is much better than my workaround!


