
[Bug] IndexError while training new VITS LJSpeech recipe

šŸ› Description

Training on the new VITS LJSpeech recipe crashed; here is the output:

   --> STEP: 395/405 -- GLOBAL_STEP: 246025
     | > loss_disc: nan  (nan)
     | > loss_disc_real_0: nan  (nan)
     | > loss_disc_real_1: nan  (nan)
     | > loss_disc_real_2: nan  (nan)
     | > loss_disc_real_3: nan  (nan)
     | > loss_disc_real_4: nan  (nan)
     | > loss_disc_real_5: nan  (nan)
     | > amp_scaler: 0.00000  (0.00000)
     | > loss_0: nan  (nan)
     | > grad_norm_0: 0.00000  (0.00000)
     | > loss_gen: nan  (nan)
     | > loss_kl: nan  (nan)
     | > loss_feat: nan  (nan)
     | > loss_mel: 21.53698  (21.33673)
     | > loss_duration: nan  (nan)
     | > loss_1: nan  (nan)
     | > grad_norm_1: 0.00000  (0.07407)
     | > current_lr_0: 0.00019
     | > current_lr_1: 0.00019
     | > step_time: 0.75520  (0.56876)
     | > loader_time: 0.05890  (0.04195)

 ! Run is kept in /home/fijipants/repo/coqui-0.6.1/runs/vits_ljspeech-March-07-2022_11+31AM-0cf3265a
Traceback (most recent call last):
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 1403, in fit
    self._fit()
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 1387, in _fit
    self.train_epoch()
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 1167, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 1031, in train_step
    step_optimizer=step_optimizer,
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 888, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion, optimizer_idx=optimizer_idx)
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/trainer/trainer.py", line 846, in _model_train_step
    return model.train_step(*input_args)
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/models/vits.py", line 1062, in train_step
    aux_input={"d_vectors": d_vectors, "speaker_ids": speaker_ids, "language_ids": language_ids},
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/models/vits.py", line 875, in forward
    outputs, attn = self.forward_mas(outputs, z_p, m_p, logs_p, x, x_mask, y_mask, g=g, lang_emb=lang_emb)
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/models/vits.py", line 784, in forward_mas
    attn = maximum_path(logp, attn_mask.squeeze(1)).unsqueeze(1).detach()  # [b, 1, t, t']
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/utils/helpers.py", line 177, in maximum_path
    return maximum_path_numpy(value, mask)
  File "/home/fijipants/miniconda3/envs/coqui-0.6.1/lib/python3.7/site-packages/TTS/tts/utils/helpers.py", line 234, in maximum_path_numpy
    path[index_range, index, j] = 1
IndexError: index -329 is out of bounds for axis 1 with size 328
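
The IndexError itself looks like a downstream symptom: by this step every loss except loss_mel is already NaN and amp_scaler has collapsed to 0, so the log-probabilities handed to maximum_path very likely contain NaN/Inf and the path backtracking walks out of range. As a debugging aid, here is a minimal sketch (a hypothetical helper, not part of TTS) that fails fast on non-finite tensors at the assumed call site in forward_mas, turning the opaque IndexError into a clear divergence error:

import torch

def assert_finite(name: str, tensor: torch.Tensor) -> None:
    # Hypothetical debugging helper (not part of TTS): raise a readable error
    # instead of letting NaN/Inf propagate into maximum_path's index math.
    if not torch.isfinite(tensor).all():
        raise RuntimeError(f"{name} contains NaN/Inf; training has likely diverged")

# Assumed call site, mirroring forward_mas in TTS/tts/models/vits.py:
#   assert_finite("logp", logp)
#   attn = maximum_path(logp, attn_mask.squeeze(1)).unsqueeze(1).detach()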

To Reproduce

  • Modify the VITS LJSpeech recipe's dataset_config to point to your local LJSpeech folder (see the sketch after this list).
  • Run the training with CUDA_VISIBLE_DEVICES=0.
  • Wait 246k steps and pray.
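
For the dataset edit in the first step, a minimal sketch of what is meant, assuming the field names and import path used by the TTS 0.6.x LJSpeech recipes (check the recipe shipped with your installed version for the exact signature):

from TTS.tts.configs.shared_configs import BaseDatasetConfig

# Sketch only: field names assumed from the TTS 0.6.x LJSpeech recipes.
dataset_config = BaseDatasetConfig(
    name="ljspeech",                 # dataset formatter
    meta_file_train="metadata.csv",  # LJSpeech transcript file
    path="/path/to/LJSpeech-1.1/",   # point this at your local LJSpeech folder
)

# Then launch the recipe on a single GPU, e.g.:
#   CUDA_VISIBLE_DEVICES=0 python train_vits.py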

Expected behavior

It doesn't crash.

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3090",
            "NVIDIA GeForce RTX 3090"
        ],
        "available": true,
        "version": "11.3"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.10.2",
        "TTS": "0.6.1",
        "numpy": "1.21.2"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "x86_64",
        "python": "3.7.11",
        "version": "#202202230823 SMP PREEMPT Wed Feb 23 14:53:24 UTC 2022"
    }
}

Additional context

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
fijipants commented, Mar 12, 2022

> @fijipants Are you running it on the new release (0.6.1)? Or the old one? Did the model train okay before 246k? I'm just curious because I'm considering upgrading TTS to the new release.

It's on the new release (0.6.1), and it trained pretty well up to around 246k, but then it started to get very weird.

Here are some samples:

230k:

https://user-images.githubusercontent.com/88913682/158008781-bd03c25f-439a-4df3-a9db-5b2f8ee5013b.mp4

244k:

https://user-images.githubusercontent.com/88913682/158008786-3f1545cc-c169-4108-ab10-d65f84843cbe.mp4

245k:

https://user-images.githubusercontent.com/88913682/158008789-fe7cb1b0-c5fb-4110-bd2c-f0262a78b9b6.mp4

At least it's much better than the results I had for v0.5.0 (you can see them in #1309).

I tried resuming the current training but it only got worse, and around 260k all the values became NaN and the audio became a blaringly loud noise. I've since started a new training from scratch which hopefully won't run into this issue, and if it does, I'll make another bug report.

0 reactions
e0xextazy commented, Mar 28, 2022

To wrap this up:

How did you get rid of the background noise in the navy-400k-fp32.mp4 example? Because the navy-400k.mp4 example has it
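
The -fp32 suffix in those filenames suggests the later run was trained without mixed precision, which would also line up with the amp_scaler: 0.00000 collapse in the crash log above. A hedged sketch of that change, assuming the mixed_precision flag exposed by TTS's base training config (this is not a confirmed fix for the IndexError):

from TTS.tts.configs.vits_config import VitsConfig

# Sketch only: turn off AMP when retraining and keep the rest of the recipe's
# settings unchanged. The flag name is assumed from TTS's base training config.
config = VitsConfig(
    mixed_precision=False,
    # ... other recipe settings as before ...
)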
