"RuntimeError: [!] NaN loss with loss" on GlowTTS introduction example - mailabs dataset

Describe the bug

After running the introduction example on the single-speaker ljspeech dataset, I switched to the mailabs format, where the speakers are derived from the folder structure. I am getting an exception after 75 steps.

I set mixed_precision=False, per this related bug, but I still observe this behavior.
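
For context, the mailabs formatter derives speaker names from the directory levels of the standard M-AILABS layout, roughly as sketched below. This tree is an illustration of the published M-AILABS structure, not the actual data from this report:

en_US/by_book/female/judy_bieber/dorothy_and_wizard_oz/metadata.csv
en_US/by_book/female/judy_bieber/dorothy_and_wizard_oz/wavs/*.wav
en_US/by_book/male/elliot_miller/pirates_of_ersatz/metadata.csv
en_US/by_book/male/elliot_miller/pirates_of_ersatz/wavs/*.wav

The speaker name (e.g. judy_bieber) is taken from the directory level under by_book/<gender>/, and each book directory carries its own metadata.csv.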

To Reproduce

Run the tutorial with the modified config (a rough sketch of the relevant changes is shown below the attached file).

config.json.txt
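
The attached config is not reproduced here. As a rough illustration only, following the shape of the TTS 0.7.x GlowTTS recipe with placeholder paths (the actual values in config.json.txt may differ), the relevant changes amount to pointing the dataset at the mailabs formatter and disabling mixed precision:

import os

from trainer import Trainer, TrainerArgs
from TTS.tts.configs.glow_tts_config import GlowTTSConfig
from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.glow_tts import GlowTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = "/apps/tts/data/output"          # placeholder output directory
dataset_path = "/apps/tts/data/mailabs/en_US"  # placeholder M-AILABS root

# Select the "mailabs" formatter instead of "ljspeech"; it walks the
# M-AILABS folder structure itself, so no meta_file_train is given.
dataset_config = BaseDatasetConfig(name="mailabs", meta_file_train=None, path=dataset_path)

config = GlowTTSConfig(
    batch_size=32,
    eval_batch_size=16,
    run_eval=True,
    epochs=1000,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    print_step=25,
    mixed_precision=False,  # disabled, per the related bug mentioned above
    output_path=output_path,
    datasets=[dataset_config],
)

ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

# The single-speaker tutorial passes no speaker manager; a real multi-speaker
# run would also need one, which is omitted in this sketch.
model = GlowTTS(config, ap, tokenizer, speaker_manager=None)

Trainer(
    TrainerArgs(), config, output_path,
    model=model, train_samples=train_samples, eval_samples=eval_samples,
).fit()

The same two toggles can equally live in a JSON config; the Python form above just mirrors the tutorial script.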

Expected behavior

No response

Logs

! Run is kept in /apps/tts/data/output/glow_tts_en-June-23-2022_02+23PM-00e67092
Traceback (most recent call last):
  File "/apps/tts/Trainer/trainer/trainer.py", line 1501, in fit
    self._fit()
  File "/apps/tts/Trainer/trainer/trainer.py", line 1485, in _fit
    self.train_epoch()
  File "/apps/tts/Trainer/trainer/trainer.py", line 1259, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/apps/tts/Trainer/trainer/trainer.py", line 1101, in train_step
    num_optimizers=len(self.optimizer) if isinstance(self.optimizer, list) else 1,
  File "/apps/tts/Trainer/trainer/trainer.py", line 979, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/apps/tts/Trainer/trainer/trainer.py", line 935, in _model_train_step
    return model.train_step(*input_args)
  File "/apps/tts/TTS/TTS/tts/models/glow_tts.py", line 425, in train_step
    text_lengths,
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/apps/tts/TTS/TTS/tts/layers/losses.py", line 494, in forward
    raise RuntimeError(f" [!] NaN loss with {key}.")
RuntimeError:  [!] NaN loss with loss.
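
The exception is raised by a per-term NaN guard in losses.py. As a generic debugging step (not taken from this report), enabling anomaly detection and adding a similar guard on the individual loss terms can show which value goes non-finite first:

import torch

# Make autograd raise at the first backward operation that produces NaN,
# instead of only failing once the aggregated loss is already NaN.
torch.autograd.set_detect_anomaly(True)

def check_finite(name: str, value: torch.Tensor) -> None:
    """Raise early and name the offending term, in the spirit of the
    check that losses.py applies to each entry of its return dict."""
    if not torch.isfinite(value).all():
        raise RuntimeError(f" [!] NaN/Inf detected in {name}.")

# Hypothetical usage on individual GlowTTS loss terms before they are
# summed into the total "loss":
# check_finite("log_mle", log_mle)
# check_finite("loss_dur", loss_dur)

Running the same check on the batch inputs (text lengths, mel lengths, spectrograms) can also help rule out a bad sample in the newly formatted dataset.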

Environment

{
    "CUDA": {
        "GPU": [
            "A100-SXM4-40GB"
        ],
        "available": true,
        "version": "11.5"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.11.0+cu115",
        "TTS": "0.7.1",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.7.13",
        "version": "#77~18.04.1-Ubuntu SMP Thu Apr 7 21:38:47 UTC 2022"
    }
}

Additional context

No response

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
iprovalo commented, Jul 20, 2022

Is this a duplicate of #1750?

I think the two issues are related, but not duplicates. This issue (#1683) is about the MAILABS format causing a GlowTTS exception, while #1750 is about avg_loss staying constant when training glow_tts on the Spanish ljspeech dataset. The overlap between the two is that once I switched the Spanish dataset to mixed precision, I observed the same exception as the one described here in #1683.

0 reactions
stale[bot] commented, Aug 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
