[Bug] IndexError while training new VITS LJSpeech recipe
Description
Training crashed while running the new VITS LJSpeech recipe; here is the output:
--> STEP: 395/405 -- GLOBAL_STEP: 246025
| > loss_disc: nan (nan)
| > loss_disc_real_0: nan (nan)
| > loss_disc_real_1: nan (nan)
| > loss_disc_real_2: nan (nan)
| > loss_disc_real_3: nan (nan)
| > loss_disc_real_4: nan (nan)
| > loss_disc_real_5: nan (nan)
| > amp_scaler: 0.00000 (0.00000)
| > loss_0: nan (nan)
| > grad_norm_0: 0.00000 (0.00000)
| > loss_gen: nan (nan)
| > loss_kl: nan (nan)
| > loss_feat: nan (nan)
| > loss_mel: 21.53698 (21.33673)
| > loss_duration: nan (nan)
| > loss_1: nan (nan)
| > grad_norm_1: 0.00000 (0.07407)
| > current_lr_0: 0.00019
| > current_lr_1: 0.00019
| > step_time: 0.75520 (0.56876)
| > loader_time: 0.05890 (0.04195)
! Run is kept in /home/fijipants/repo/coqui-0.6.1/runs/vits_ljspeech-March-07-2022_11+31AM-0cf3265a
Traceback (most recent call last):
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 1403, in fit
self._fit()
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 1387, in _fit
self.train_epoch()
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 1167, in train_epoch
_, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 1031, in train_step
step_optimizer=step_optimizer,
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 888, in _optimize
outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx)
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 846, in _model_train_step
return model.train_step(*input_args)
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/models/vits.py", line 1062, in train_step
aux_input={"d_vectors": d_vectors, "speaker_ids": speaker_ids, "language_ids": language_ids},
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/models/vits.py", line 875, in forward
outputs, attn = self.forward_mas(outputs, z_p, m_p, logs_p, x, x_mask, y_mask, g=g, lang_emb=lang_emb)
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/models/vits.py", line 784, in forward_mas
attn = maximum_path(logp, attn_mask.squeeze(1)).unsqueeze(1).detach() # [b, 1, t, t']
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/utils/helpers.py", line 177, in maximum_path
return maximum_path_numpy(value, mask)
File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/utils/helpers.py", line 234, in maximum_path_numpy
path[index_range, index, j] = 1
IndexError: index -329 is out of bounds for axis 1 with size 328
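The NaN losses and the IndexError are two symptoms of the same problem: once the alignment log-probabilities (`logp`) contain NaN, the `v1 >= v0` comparisons in the monotonic-alignment DP all evaluate to False, so the backtracking direction is zero everywhere and the text index decrements on every output frame until it walks past the start of the axis. The sketch below is a simplified stand-in for the backtracking loop in `maximum_path_numpy` (not the actual TTS helper), just to show how an all-zero direction matrix produces exactly this kind of out-of-bounds negative index:

```python
import numpy as np

def toy_backtrack(direction, t_x, t_y):
    # Simplified backtracking in the style of maximum_path_numpy:
    # walk the output frames right-to-left; step the text index
    # down whenever direction == 0, hold it when direction == 1.
    path = np.zeros((t_x, t_y), dtype=np.int64)
    index = t_x - 1
    for j in reversed(range(t_y)):
        path[index, j] = 1  # raises IndexError once index < -t_x
        index = index + direction[index, j] - 1
    return path

# NaN comparisons are all False, so the direction matrix that the
# DP produces is all zeros -- the index then decrements every frame.
t_x, t_y = 4, 16
direction = np.zeros((t_x, t_y), dtype=np.int64)
try:
    toy_backtrack(direction, t_x, t_y)
except IndexError as e:
    print(e)  # a negative index out of bounds, like the crash above
```

Under this reading, the IndexError is a downstream effect; the root cause is whatever drove the losses (and `amp_scaler`) to NaN/zero a few hundred steps earlier.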
To Reproduce
- Modify the VITS LJSpeech recipe's dataset_config to point to your LJSpeech folder.
- Run the training with CUDA_VISIBLE_DEVICES=0.
- Wait 246k steps and pray.
Expected behavior
It doesn't crash.
Environment
{
"CUDA": {
"GPU": [
"NVIDIA GeForce RTX 3090",
"NVIDIA GeForce RTX 3090"
],
"available": true,
"version": "11.3"
},
"Packages": {
"PyTorch_debug": false,
"PyTorch_version": "1.10.2",
"TTS": "0.6.1",
"numpy": "1.21.2"
},
"System": {
"OS": "Linux",
"architecture": [
"64bit",
""
],
"processor": "x86_64",
"python": "3.7.11",
"version": "#202202230823 SMP PREEMPT Wed Feb 23 14:53:24 UTC 2022"
}
}
Additional context
Issue Analytics
- Created 2 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
It's on the new release (0.6.1); it trained pretty well before 246k steps but started to get very weird around 246k.
Here are some samples:
230k:
https://user-images.githubusercontent.com/88913682/158008781-bd03c25f-439a-4df3-a9db-5b2f8ee5013b.mp4
244k:
https://user-images.githubusercontent.com/88913682/158008786-3f1545cc-c169-4108-ab10-d65f84843cbe.mp4
245k:
https://user-images.githubusercontent.com/88913682/158008789-fe7cb1b0-c5fb-4110-bd2c-f0262a78b9b6.mp4
At least it's much better than the results I had for v0.5.0 (you can see them in #1309).
I tried resuming the current training, but it only got worse: around 260k steps all the values became NaN and the audio became a blaringly loud noise. I've since started a new training from scratch which hopefully won't run into this issue, and if it does, I'll make another bug report.
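For what it's worth, the `amp_scaler: 0.00000` line in the log suggests the fp16 gradient scaler had collapsed to zero, i.e. every step was overflowing, which is consistent with the losses going NaN. A generic mitigation pattern (this is ordinary PyTorch, not part of the Coqui trainer API; the function name is just for illustration) is to skip the optimizer step whenever the loss goes non-finite, so a single bad batch cannot poison the weights:

```python
import torch

def guarded_step(loss, optimizer):
    # Generic guard: refuse to update the model when the loss is
    # NaN or inf; drop the batch's gradients instead.
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)
        return False
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True

# Minimal usage with a toy parameter:
w = torch.nn.Parameter(torch.tensor(1.0))
opt = torch.optim.SGD([w], lr=0.1)
assert guarded_step((w * 2).sum(), opt) is True   # finite loss: step taken
assert guarded_step(w * float("nan"), opt) is False  # NaN loss: step skipped
```

A guard like this only masks the symptom; if the scaler stays at zero, the underlying overflow (or a switch to fp32 training) still needs to be addressed.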
How did you get rid of the background noise in the navy-400k-fp32.mp4 example? The navy-400k.mp4 example still has it.