
Using GPU causes cuDNN error

See original GitHub issue

🐛 Bug

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [100, 113, 10]], which is output 0 of CudnnRnnBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
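As the error hint suggests, the first debugging step is to enable autograd anomaly detection, which augments the RuntimeError's traceback with the forward-pass operation that produced the offending tensor. A minimal sketch (anomaly detection slows training noticeably, so enable it only while debugging):

```python
import torch

# Globally enable anomaly detection, per the hint in the error message.
# With this on, the backward-pass RuntimeError also prints a traceback of
# the forward-pass op whose saved tensor was modified in place.
torch.autograd.set_detect_anomaly(True)
```

It can also be used as a context manager (`with torch.autograd.set_detect_anomaly(True): ...`) to limit the overhead to a single training step.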

To Reproduce

I can run my code on CPU, but switching to CUDA causes the error above.
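The issue's model isn't shown, but the error class itself is easy to reproduce in isolation: any op that saves its output for the backward pass (e.g. `exp`) will fail exactly like this if that output is then modified in place, because the in-place op bumps the tensor's version counter past the version autograd recorded. A minimal sketch (not the issue's code):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.exp()      # ExpBackward saves its output y for the backward pass
y.add_(1)        # in-place op bumps y's version counter (0 -> 1)
try:
    y.sum().backward()   # autograd sees version 1, expected version 0
    raised = False
except RuntimeError as e:
    raised = True        # "...modified by an inplace operation..."
```

On GPU the error names `CudnnRnnBackward0` because the cuDNN RNN kernel saves its tensors as one fused op, whereas the CPU path decomposes the RNN differently, which is why the same model can appear to train fine on CPU.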

Expected behavior

Environment

  • PyTorch Lightning Version (e.g., 1.5.0): 1.6.0
  • PyTorch Version (e.g., 1.10): 1.11.0
  • Python version (e.g., 3.9): 3.8
  • OS (e.g., Linux): Linux centos
  • CUDA/cuDNN version: 11.3
  • GPU models and configuration: GPU 2080 ti
  • How you installed PyTorch (conda, pip, source):
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

Additional context

cc @justusschock @kaushikb11 @awaelchli @akihironitta @rohitgr7

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
kimmy966 commented, Apr 7, 2022

Is it possible that, in your logic, a layer has already been modified by the optimizer step, but there is another call to loss.backward that relies on the old weights to compute the gradients?
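That failure mode is easy to demonstrate in isolation: `optimizer.step()` updates parameters in place, so a second `backward()` through the same graph finds the saved weight at a newer version. A hypothetical sketch of that pattern (not the issue's code):

```python
import torch
from torch import nn

lin = nn.Linear(3, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

x = torch.randn(2, 3, requires_grad=True)
loss = lin(x).sum()

loss.backward(retain_graph=True)  # first backward: fine
opt.step()                        # updates lin.weight in place

try:
    loss.backward()               # saved weight is now at a newer version
    raised = False
except RuntimeError:
    raised = True                 # same "modified by an inplace operation" error
```

The fix for this pattern is to finish all `backward()` calls before calling `step()`, or to rebuild the graph with a fresh forward pass after each step.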

Or try this if you have ReLU layers: NVlabs/FUNIT#23 (comment)

Thank you, setting the ReLU layers' inplace param to False fixed my model!
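The fix referenced above can be applied without editing the model definition. This is a hedged sketch with a hypothetical helper (not from the issue) that flips every `nn.ReLU` in a model to the out-of-place variant:

```python
from torch import nn

def disable_inplace_relu(model: nn.Module) -> None:
    """Switch every ReLU in the model to out-of-place activation."""
    for m in model.modules():
        if isinstance(m, nn.ReLU):
            m.inplace = False  # ReLU then allocates a new tensor instead of overwriting its input

# Example model with an in-place ReLU, the kind that can trip autograd.
net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(inplace=True), nn.Linear(8, 1))
disable_inplace_relu(net)
```

`inplace=True` saves memory by overwriting the activation's input, but if autograd saved that same tensor for the backward pass, the overwrite triggers exactly this version-counter error.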

0 reactions
rohitgr7 commented, Apr 7, 2022

Yes, I have ReLU in my layers, but I don’t understand why I can train this model on CPU without any error occurring.

We’d need to investigate the actual issue. I’d guess it’s related to PyTorch itself, but I’m still curious.


Top Results From Across the Web

  • F.conv2d() causes RuntimeError: cuDNN error: ...
    It looks like there is a bug in CUDNN v8 for Titan X class (and maybe other classes of GPU). The version of...
  • Cudnn Error in initializeCommonContext - TensorRT
    Description. Hi, I met a problem when I tried to deserialize a TensorRT engine and create the context. The system threw an Error...
  • RuntimeError: cuDNN error
    I am running this code in a computer with rtx 3090ti. However, the code raises an error with first forward layer.
  • RuntimeError: cuDNN error
    If it is not that your model/data is too big then it is because your GPU has not freed the memory. Go to...
  • Memory Management and Using Multiple GPUs
    If you just call cuda, then the tensor is placed on GPU 0. The torch.nn. ... If operands are on different devices,...
