
[Bug] Glow-TTS trained on multiple GPUs gets KeyError: 'avg_loss'

See original GitHub issue

Hi! When I follow the recipes to train a Glow-TTS model, I get this error:

 ! Run is kept in /workspace/tts/glow_tts/glow_tts_chinese-September-20-2021_02+44PM-0000000
Traceback (most recent call last):
  File "/workspace/TTS/TTS/trainer.py", line 919, in fit
    self._fit()
  File "/workspace/TTS/TTS/trainer.py", line 904, in _fit
    self.train_epoch()
  File "/workspace/TTS/TTS/trainer.py", line 738, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/workspace/TTS/TTS/trainer.py", line 685, in train_step
    target_avg_loss = self._pick_target_avg_loss(self.keep_avg_train)
  File "/workspace/TTS/TTS/trainer.py", line 957, in _pick_target_avg_loss
    target_avg_loss = keep_avg_target["avg_loss"]
  File "/workspace/TTS/TTS/utils/generic_utils.py", line 155, in __getitem__
    return self.avg_values[key]
KeyError: 'avg_loss'
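The traceback shows the trainer's running-average tracker being indexed for a key that was never registered. A minimal sketch of that failure pattern is below; the class and attribute names (`KeepAverage`, `avg_values`) follow the traceback, but the multi-GPU rank guard is an assumption about how the key can end up missing, not Coqui TTS's actual code:

```python
class KeepAverage:
    """Minimal stand-in for the tracker in TTS/utils/generic_utils.py (per the traceback)."""

    def __init__(self):
        self.avg_values = {}
        self.iters = {}

    def update_value(self, key, value):
        # A key only exists after its first update.
        if key not in self.avg_values:
            self.avg_values[key] = value
            self.iters[key] = 1
        else:
            self.iters[key] += 1
            self.avg_values[key] += (value - self.avg_values[key]) / self.iters[key]

    def __getitem__(self, key):
        return self.avg_values[key]  # raises KeyError if the key was never updated


keep_avg = KeepAverage()

# Hypothetical multi-GPU scenario: suppose losses are only aggregated on rank 0,
# so on other ranks "avg_loss" is never registered before it is read.
rank = 1
if rank == 0:
    keep_avg.update_value("avg_loss", 0.42)

try:
    target = keep_avg["avg_loss"]
except KeyError as err:
    print(f"KeyError: {err}")

# A defensive read avoids the crash (a possible workaround, not a verified fix):
target = keep_avg.avg_values.get("avg_loss", None)
print(target)
```

Running this on the non-zero rank prints the same `KeyError: 'avg_loss'` as the traceback, while the guarded `.get()` read returns `None` instead of crashing.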

Also, current_lr is always 0.00000 (screenshot attached).
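The reported current_lr of 0.00000 may simply be a display-rounding effect during warmup rather than a true zero. Glow-TTS recipes commonly use a Noam-style warmup schedule; the formula below is an illustrative assumption, not Coqui TTS's verbatim scheduler:

```python
def noam_lr(step, base_lr=1e-3, warmup_steps=4000):
    """Noam-style warmup: lr ramps linearly to base_lr over warmup_steps,
    then decays as step**-0.5. Parameter values here are assumptions."""
    step = max(step, 1)
    scale = warmup_steps**0.5 * min(step * warmup_steps**-1.5, step**-0.5)
    return base_lr * scale

for step in (10, 1000):
    print(f"step {step}: current_lr {noam_lr(step):.5f}")
```

At step 10 the true learning rate is about 2.5e-6, which a five-decimal log line renders as 0.00000; by step 1000 it prints as 0.00025. So a 0.00000 readout early in training is not by itself evidence that the optimizer is broken.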

My train file: train.py.txt

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:16 (6 by maintainers)

Top GitHub Comments

1 reaction
Lizh-TW commented, Nov 6, 2021

My Environment:

  • Docker : pytorch:1.9.0-cuda11.1-cudnn8-runtime
  • PyTorch and TensorFlow version : 1.9.0 & 2.5.0
  • Python version : 3.7.10
  • CUDA/cuDNN version: 11.1/8
  • GPU model and memory : GeForce RTX 3090 x2
0 reactions
erogol commented, Nov 30, 2021

@patdflynn so what is the problem? small learning rate?
@skol101 how is it the same?

Read more comments on GitHub >

Top Results From Across the Web

Issues with training with multiple gpus. #1296 - GitHub
During the reimplementation of recognition/TSM, when I parse the gpu args as 2, I get the error AssertionError: MMDataParallel only supports ...
Read more >
Deep Learning with MATLAB on Multiple GPUs - MathWorks
MATLAB ® supports training a single deep neural network using multiple GPUs in parallel. By using parallel workers with GPUs, you can train...
Read more >
Best Practices — NVIDIA NeMo
It involves defining, building, and training several models in specific domains; experimenting several times to get high accuracy, fine tuning on multiple tasks ......
Read more >
