
Bug: Dynamic Convolution Attention fails in `mixed_precision` training.

See original GitHub issue

Describe the bug

Dynamic Convolution Attention fails in mixed_precision training and ultimately causes a NaN loss.

To Reproduce

Steps to reproduce the behavior:

  1. Set mixed_precision=True in config.json.
  2. Set dynamic_convolution=True in config.json (a config sketch follows this list).
  3. Start training a Tacotron or Tacotron2 model.
  4. On TensorBoard you initially observe broken attention alignment.
  5. Ultimately the loss becomes NaN.
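
For concreteness, this is roughly what steps 1 and 2 look like when applied to the config file. It is only a sketch: it assumes a Mozilla/Coqui-TTS-style config.json in the working directory, and the key names simply follow the repro steps above, so the actual config schema may differ.

```python
import json

# Sketch: toggle the two settings named in steps 1-2 of the repro.
# Key names follow the repro steps; the real config schema may differ.
with open("config.json") as f:
    config = json.load(f)

config["mixed_precision"] = True       # train with automatic mixed precision
config["dynamic_convolution"] = True   # use Dynamic Convolution Attention

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```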

Expected behavior

The model should learn the alignment after 10K iterations with no NaN loss, as it does in full-precision training.

Environment (please complete the following information):

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
  • PyTorch or TensorFlow version: Torch 1.8.0
  • Python version: 3.8
  • CUDA/cuDNN version: 11.2
  • GPU model and memory: 1080Ti
  • Exact command to reproduce:


Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 12 (10 by maintainers)

Top GitHub Comments

1 reaction
erogol commented, Jul 8, 2021

Using the APEX backend with the new API seemingly helps.
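
The comment does not say which Apex entry point "the new API" refers to, so the sketch below only illustrates the general pattern of routing mixed precision through NVIDIA Apex instead of torch.cuda.amp, using the classic amp.initialize/scale_loss calls and a toy model standing in for Tacotron. It is not the repo's trainer code.

```python
import torch
from apex import amp  # NVIDIA Apex, installed separately from PyTorch

# Toy stand-ins for the real Tacotron model and optimizer.
model = torch.nn.Linear(80, 80).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# "O1" casts most ops to fp16 while keeping fp32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for step in range(100):
    x = torch.randn(8, 80, device="cuda")
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    # Apex scales the loss so small fp16 gradients do not underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```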

1 reaction
Sadam1195 commented, Apr 28, 2021

I am not sure if it helps, but I was getting the same error. I fixed the issue by setting r = 6, "gradual_training": [[0, 6, 64], [15000, 4, 64], [30000, 2, 32]], and ddc_r = 6. Removing all odd values of r and ddc_r helped in my case, but alignments were still on and off for most of the training.
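
For reference, a minimal sketch of the config change described in the comment above. The key names (r, ddc_r, gradual_training) are taken verbatim from the comment and may differ across TTS versions.

```python
import json

# Sketch of the workaround above: keep the reduction factors even.
with open("config.json") as f:
    config = json.load(f)

config["r"] = 6
config["ddc_r"] = 6
config["gradual_training"] = [[0, 6, 64], [15000, 4, 64], [30000, 2, 32]]

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```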
