Error in training Capacitron

See original GitHub issue

Describe the bug

Traceback (most recent call last):
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1533, in fit
    self._fit()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1517, in _fit
    self.train_epoch()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1282, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1114, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 998, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 954, in _model_train_step
    return model.train_step(*input_args)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 352, in train_step
    outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 216, in forward
    encoder_outputs, *capacitron_vae_outputs = self.compute_capacitron_VAE_embedding(
  File "/home/manmay/TTS/TTS/tts/models/base_tacotron.py", line 254, in compute_capacitron_VAE_embedding
    (VAE_outputs, posterior_distribution, prior_distribution, capacitron_beta,) = self.capacitron_vae_layer(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/manmay/TTS/TTS/tts/layers/tacotron/capacitron_layers.py", line 67, in forward
    self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
    super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 128)) of distribution MultivariateNormal(loc: torch.Size([128, 128]), covariance_matrix: torch.Size([128, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)
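
The ValueError is raised by the distribution's argument validation: every entry of loc (the mu passed to MVN) was already NaN by the time the layer ran, so the NaNs were produced earlier in training. As a rough, framework-free sketch of the kind of guard that can locate the problem sooner, the snippet below checks the mean and scale values before a distribution is built. The helper names first_non_finite and check_gaussian_params are hypothetical and are not part of TTS or PyTorch; on real tensors a comparable check would be torch.isfinite(mu).all().

```python
import math
from typing import Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def first_non_finite(rows: Matrix) -> Optional[Tuple[int, int]]:
    """Return the (row, column) index of the first NaN or infinite entry, or None."""
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                return (i, j)
    return None


def check_gaussian_params(mu: Matrix, sigma: Matrix) -> None:
    """Raise a descriptive error if mu or sigma could not define a valid Gaussian.

    Hypothetical helper: it mirrors the idea behind the validation that failed
    in the traceback, but names the offending parameter and index.
    """
    for name, values in (("mu", mu), ("sigma", sigma)):
        bad = first_non_finite(values)
        if bad is not None:
            raise ValueError(f"{name} has a non-finite value at index {bad}")
    # A diagonal covariance built from sigma also needs strictly positive entries.
    for i, row in enumerate(sigma):
        for j, value in enumerate(row):
            if value <= 0.0:
                raise ValueError(f"sigma has a non-positive value at index {(i, j)}")


# NaN propagates through arithmetic, which is why one bad weight can turn
# a whole output tensor into NaN.
nan_weight = float("nan")
outputs = [[nan_weight * x + 1.0 for x in (0.5, 2.0)] for _ in range(2)]

try:
    check_gaussian_params(outputs, [[1.0, 1.0], [1.0, 1.0]])
    guard_message = "ok"
except ValueError as err:
    guard_message = str(err)
```

Running a check like this on the inputs of the failing layer at each step narrows down the first step at which the values diverge, rather than only seeing the failure once the whole tensor is NaN.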

To Reproduce

config.txt

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA A100-SXM4-40GB"
        ],
        "available": true,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.12.0+cu102",
        "TTS": "0.6.2",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "",
        "python": "3.7.12",
        "version": "#1 SMP Debian 4.19.249-2 (2022-06-30)"
    }
}

Additional context

No response

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
WeberJulian commented, Sep 19, 2022

Training Capacitron is hard because it is quite unstable. Try the latest recipe, which improved stability (at least for alignments). You can find it in the latest TTS version.

0 reactions
erogol commented, Sep 19, 2022

@WeberJulian can you take a look into that?


