Bug: Dynamic Convolution Attention fails in `mixed_precision` training.
Describe the bug: Dynamic Convolution Attention fails in `mixed_precision` training and ultimately causes a NaN error.
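For context, a minimal sketch of the kind of numeric failure half precision can introduce. This assumes the NaNs come from fp16 overflow in the attention energies, which the issue itself does not confirm; the values below are purely illustrative:

```python
import torch

# fp16 saturates at ~65504, so a large attention energy overflows to inf.
energy = torch.tensor([70000.0, 1.0]).to(torch.float16)
print(energy)  # the first element has overflowed to inf

# inf - inf (as happens inside a max-subtracted softmax) yields NaN,
# which then propagates into the attention weights and the loss.
print(float(energy[0]) - float(energy[0]))  # nan
```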
To Reproduce
Steps to reproduce the behavior:
- Set `mixed_precision=True` in `config.json`.
- Set `dynamic_convolution=True` in `config.json` (see the config sketch after this list).
- Start training a Tacotron or Tacotron2 model.
- On TensorBoard you initially observe broken attention alignment.
- Ultimately the loss becomes NaN.
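A minimal `config.json` fragment for the two settings named in the steps above; the key names are copied verbatim from the steps, the rest of the real config is omitted, so treat this as a sketch rather than a complete, known-good configuration:

```json
{
  "mixed_precision": true,
  "dynamic_convolution": true
}
```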
Expected behavior: The model should learn the alignment after 10K iterations with no NaN loss, as it does in full-precision training.
Environment (please complete the following information):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
- PyTorch or TensorFlow version: PyTorch 1.8.0
- Python version: 3.8
- CUDA/cuDNN version: 11.2
- GPU model and memory: 1080Ti
- Exact command to reproduce:
Top GitHub Comments
Using the APEX backend with the new API seemingly helps (see the sketch below).
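For reference, a minimal sketch of what training with the APEX `amp` API looks like in plain PyTorch. This is not the repository's actual training loop; the tiny model and data are placeholders, and it assumes NVIDIA apex is installed and a CUDA device is available:

```python
import torch
import torch.nn as nn
from apex import amp  # NVIDIA apex, installed separately

# Tiny stand-in model; in the real case this would be the Tacotron/Tacotron2 model.
model = nn.Linear(80, 80).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# opt_level="O1" patches common ops to run in fp16 while keeping fp32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for _ in range(10):
    x = torch.randn(16, 80, device="cuda")
    optimizer.zero_grad()
    loss = model(x).pow(2).mean()
    # APEX scales the loss before backward to reduce fp16 gradient underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```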
I am not sure if it helps, but I was getting the same error. I fixed the issue by setting `r = 6` via `"gradual_training": [[0, 6, 64], [15000, 4, 64], [30000, 2, 32]]` and setting `ddc_r = 6`. Removing all odd values of `r` and `ddc_r` helped my case, but alignments were still on and off for most of the training (see the config sketch below).
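A sketch of the config changes described in that comment, keeping `r` and `ddc_r` even. The key names come from the comment itself, each `gradual_training` entry appears to be `[start step, r, batch size]`, and the surrounding structure is illustrative only:

```json
{
  "r": 6,
  "ddc_r": 6,
  "gradual_training": [[0, 6, 64], [15000, 4, 64], [30000, 2, 32]]
}
```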