Error in training Capacitron
Describe the bug
Traceback (most recent call last):
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1533, in fit
    self._fit()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1517, in _fit
    self.train_epoch()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1282, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1114, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 998, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 954, in _model_train_step
    return model.train_step(*input_args)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 352, in train_step
    outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 216, in forward
    encoder_outputs, *capacitron_vae_outputs = self.compute_capacitron_VAE_embedding(
  File "/home/manmay/TTS/TTS/tts/models/base_tacotron.py", line 254, in compute_capacitron_VAE_embedding
    (VAE_outputs, posterior_distribution, prior_distribution, capacitron_beta,) = self.capacitron_vae_layer(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/manmay/TTS/TTS/tts/layers/tacotron/capacitron_layers.py", line 67, in forward
    self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
    super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 128)) of distribution MultivariateNormal(loc: torch.Size([128, 128]), covariance_matrix: torch.Size([128, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values: tensor([[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)
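For context, the ValueError comes from torch.distributions input validation: it fires as soon as the `loc` tensor passed to MultivariateNormal contains NaNs, which means the values were already NaN before the posterior was constructed. A minimal sketch of the check (the `mu`/`sigma` tensors below are placeholders matching the shapes in the error, not the actual Capacitron activations):

```python
import torch
from torch.distributions import MultivariateNormal

# Placeholder shapes taken from the error message (batch 128, dim 128);
# these are NOT the real Capacitron tensors, only an illustration of the check.
mu = torch.full((128, 128), float("nan"))   # NaNs produced upstream
sigma = torch.ones(128, 128)                # a valid diagonal std-dev

try:
    # Same construction as in capacitron_layers.py: MVN(mu, torch.diag_embed(sigma))
    MultivariateNormal(mu, torch.diag_embed(sigma))
except ValueError as err:
    print(err)  # "Expected parameter loc ... found invalid values: tensor([[nan, ..."
```

So the distribution constructor is only the reporter; the NaNs originate earlier in the forward/backward pass, typically from an exploded loss or gradient.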
To Reproduce
Expected behavior
No response
Logs
No response
Environment
{
    "CUDA": {
        "GPU": [
            "NVIDIA A100-SXM4-40GB"
        ],
        "available": true,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.12.0+cu102",
        "TTS": "0.6.2",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "",
        "python": "3.7.12",
        "version": "#1 SMP Debian 4.19.249-2 (2022-06-30)"
    }
}
Additional context
No response
Issue Analytics
- Created a year ago
- Comments: 10 (6 by maintainers)
Top GitHub Comments
Training Capacitron is hard since it's pretty unstable. Try using the latest recipe, since it improved stability (at least for alignments); you can find it in the latest TTS version.
@WeberJulian can you take a look into that?
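If anyone wants to instrument their own fork while debugging, below is a generic sketch of the usual guards against this kind of NaN blow-up. `model`, `optimizer`, and `loss` are placeholders for whatever your training step uses; this is not taken from the official Capacitron recipe, just standard stabilization tricks:

```python
import torch

# Generic guards, assuming `model`, `optimizer` and `loss` exist in your
# training step; NOT the official Capacitron recipe, only common practice
# for catching/limiting NaN explosions before they reach the weights.
def guarded_step(model, optimizer, loss, max_grad_norm=1.0):
    if not torch.isfinite(loss):
        # Skip the update instead of propagating NaNs into the parameters.
        optimizer.zero_grad(set_to_none=True)
        return False
    loss.backward()
    # Clip exploding gradients before they corrupt mu/sigma in the VAE layer.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True
```

Logging how often the step gets skipped (or how often clipping kicks in) usually points to the iteration where the loss first diverges, which makes it easier to tell whether the learning rate, the KL/beta schedule, or a bad batch is to blame.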