
[Bug] Multispeaker VITS training does not work in direct python but does with config.json


Describe the bug

Hi,

I want to train a VITS model with multiple speakers and external embeddings (aka d_vectors), so I provided VitsArgs in a recipe:

vits_args = VitsArgs(
    use_language_embedding=False,
    embedded_language_dim=1,
    use_speaker_embedding=False,
    num_languages=1,
    use_sdp=False,
    #Those 3 properties also have to be repeated in the config section (see https://github.com/coqui-ai/TTS/issues/1454#issuecomment-1081843205)
    use_d_vector_file=True, 
    d_vector_file="/home/Caraduf/Models/d_vector_file_4_Voices.json",
    d_vector_dim=512
)

and repeated the same three lines in the VitsConfig, as explained here:

config = VitsConfig(
    model_args=vits_args,
    use_d_vector_file=True,
    d_vector_file="/home/Caraduf/Models/d_vector_file_4_Voices.json",
    d_vector_dim=512,
)
I then ran the training with python3 my_multispeaker_vits_training.py, but it failed with:

 line 303, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
 TypeError: conv1d() received an invalid combination of arguments - got (NoneType, Parameter,    Parameter, tuple, tuple, tuple, int), but expected one of:
  * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter,  tuple, tuple, tuple, int)
  * (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
      didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)

This is also described by @harmlessman in this comment.
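
A note on the traceback (my reading, not part of the original report): the NoneType that reaches conv1d is most likely the speaker-conditioning tensor, which suggests no d_vector made it into the batch because the model was built without a SpeakerManager that knows about the d_vector file. A hypothetical sanity check before starting training could look like the following; the attribute names follow the stock Coqui TTS recipes around 0.8.0 and may differ in other versions:

# Hypothetical check; "model" is the Vits instance built in the recipe.
assert model.args.use_d_vector_file, "use_d_vector_file is not set on the model args"
assert model.speaker_manager is not None, "no SpeakerManager is attached to the model"
print(f"{model.speaker_manager.num_speakers} speakers loaded with external embeddings")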

But if I run train_tts.py --config_path path/to/the/just/previously/created/config.json, then the multispeaker training with external embeddings runs fine.
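
For comparison, here is a minimal sketch of a direct-Python recipe that creates the model with Vits.init_from_config, which is essentially what train_tts.py does internally and which builds the SpeakerManager (and loads the d_vector file) from the top-level config fields. The dataset settings and output_path are placeholders, and the call names are taken from the stock Coqui TTS recipes rather than verified against 0.8.0 specifically, so treat this as an assumption-laden sketch:

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits, VitsArgs

output_path = "/home/Caraduf/Models/vits_multispeaker_run"  # placeholder

# Placeholder dataset definition; adapt the formatter name and paths to your data.
dataset_config = BaseDatasetConfig(
    name="vctk", meta_file_train="", path="/path/to/dataset"
)

vits_args = VitsArgs(
    use_sdp=False,
    use_speaker_embedding=False,
    use_d_vector_file=True,
    d_vector_file="/home/Caraduf/Models/d_vector_file_4_Voices.json",
    d_vector_dim=512,
)

config = VitsConfig(
    model_args=vits_args,
    datasets=[dataset_config],
    output_path=output_path,
    # Repeated at the top level so the SpeakerManager built from the config picks them up.
    use_d_vector_file=True,
    d_vector_file="/home/Caraduf/Models/d_vector_file_4_Voices.json",
    d_vector_dim=512,
)

train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

# init_from_config() creates the audio processor, tokenizer and SpeakerManager
# from the config; building Vits(...) by hand without a speaker manager is one
# way to end up with a None speaker embedding at training time.
model = Vits.init_from_config(config)

trainer = Trainer(
    TrainerArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()

An alternative used in the stock multi-speaker recipes is to build the SpeakerManager explicitly and pass it as the fourth argument to Vits(config, ap, tokenizer, speaker_manager).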

To Reproduce

See above.

Expected behavior

Launching a multi-speaker training the “direct Python way”, providing the external speaker embeddings in the recipe, should work out of the box, without having to fall back on the generated config.json via the generic python3 train_tts.py --config_path X/Y/Z/config.json.

Logs

No response

Environment

CoquiTTS 0.8.0

Additional context

No response

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
shigabeev commented, Oct 19, 2022

Hi! Did you find a solution for this issue? I’m facing the same one, unfortunately.

0 reactions
Ca-ressemble-a-du-fake commented, Nov 23, 2022

That was what I was missing! Thanks for sharing your solution @lokmantsui, this is much better than my workaround!


