[Bug] Glow-TTS: training on multiple GPUs raises KeyError: 'avg_loss'
Hi! When I follow the recipes to train a Glow-TTS model, I get this error:
! Run is kept in /workspace/tts/glow_tts/glow_tts_chinese-September-20-2021_02+44PM-0000000
Traceback (most recent call last):
  File "/workspace/TTS/TTS/trainer.py", line 919, in fit
    self._fit()
  File "/workspace/TTS/TTS/trainer.py", line 904, in _fit
    self.train_epoch()
  File "/workspace/TTS/TTS/trainer.py", line 738, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/workspace/TTS/TTS/trainer.py", line 685, in train_step
    target_avg_loss = self._pick_target_avg_loss(self.keep_avg_train)
  File "/workspace/TTS/TTS/trainer.py", line 957, in _pick_target_avg_loss
    target_avg_loss = keep_avg_target["avg_loss"]
  File "/workspace/TTS/TTS/utils/generic_utils.py", line 155, in __getitem__
    return self.avg_values[key]
KeyError: 'avg_loss'
Also, current_lr is always 0.00000.
My training script: train.py.txt
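The traceback shows the lookup failing inside `__getitem__` on `self.avg_values`, which suggests that nothing had been recorded under `avg_loss` when it was read. Below is a minimal sketch of that failure mode, assuming a dict-backed averaging container. `RunningAverages` and `pick_target_avg_loss` are hypothetical names for illustration, not the actual TTS code, and the sketch does not establish why the key is missing in the multi-GPU run.

```python
class RunningAverages:
    """Hypothetical stand-in for the averaging helper seen in the traceback."""

    def __init__(self):
        self.avg_values = {}

    def __getitem__(self, key):
        # Raises KeyError if `key` was never recorded, as in the traceback.
        return self.avg_values[key]

    def update(self, name, value, weight=0.1):
        # First observation initializes the average; later ones are blended in.
        if name not in self.avg_values:
            self.avg_values[name] = value
        else:
            old = self.avg_values[name]
            self.avg_values[name] = weight * value + (1 - weight) * old


def pick_target_avg_loss(averages, default=None):
    """Hypothetical guarded lookup: return `default` when no loss was recorded."""
    try:
        return averages["avg_loss"]
    except KeyError:
        return default


averages = RunningAverages()

# No update has happened yet, so a direct lookup fails.
try:
    averages["avg_loss"]
    raised = False
except KeyError:
    raised = True

missing = pick_target_avg_loss(averages)    # falls back to the default
averages.update("avg_loss", 2.0)
recorded = pick_target_avg_loss(averages)   # now returns the recorded value
```

A guarded lookup like this only hides the symptom. If no loss is ever recorded, the training step itself is the place to look.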
Issue Analytics
- Created 2 years ago
- Comments: 16 (6 by maintainers)
Top GitHub Comments
My Environment:
@patdflynn so what is the problem? small learning rate?
@skol101 how is it the same?