Using the GPU causes a cuDNN error
🐛 Bug
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [100, 113, 10]], which is output 0 of CudnnRnnBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
To Reproduce
My code runs on the CPU, but switching to CUDA causes the error above.
Expected behavior
Environment
- PyTorch Lightning Version (e.g., 1.5.0): 1.6.0
- PyTorch Version (e.g., 1.10): 1.11.0
- Python version (e.g., 3.9): 3.8
- OS (e.g., Linux): Linux centos
- CUDA/cuDNN version: 11.3
- GPU models and configuration: GPU 2080 ti
- How you installed PyTorch (conda, pip, source):
- If compiling from source, the output of torch.__config__.show():
- Any other relevant information:
Additional context
cc @justusschock @kaushikb11 @awaelchli @akihironitta @rohitgr7
Issue Analytics
- State:
- Created a year ago
- Comments:7 (3 by maintainers)
Thank you, setting the ReLU layer's parameter to inplace=False fixed my model.
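For readers hitting the same error, here is a minimal sketch of what that fix might look like. The original model was not posted, so the `Tagger` class, its layer sizes, and the input shape are all hypothetical; the only detail taken from the thread is that a ReLU applied to the RNN output is constructed with `inplace=False`.

```python
import torch
from torch import nn


class Tagger(nn.Module):
    """Hypothetical model: an LSTM followed by a ReLU and a linear head."""

    def __init__(self, n_features=10, hidden=10):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        # inplace=False makes ReLU return a new tensor instead of
        # overwriting the LSTM output that autograd may still need.
        self.act = nn.ReLU(inplace=False)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(self.act(out))


# Use the GPU when one is available; the sketch also runs on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = Tagger().to(device)

x = torch.randn(4, 113, 10, device=device)  # illustrative batch
loss = model(x).mean()
loss.backward()  # gradients flow through the LSTM without the version error
```

The out-of-place activation costs one extra tensor allocation per call, which is usually a small price for keeping the saved RNN output intact.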
We still need to investigate the actual issue. I guess it's related to PyTorch, but I'm still curious.
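One way to start that investigation is the anomaly detection mentioned in the error hint. The sketch below does not reproduce the cuDNN case from this issue; it uses a tiny CPU example (`torch.exp` followed by an in-place add, chosen purely for illustration) that raises the same kind of error, to show how the mode is switched on.

```python
import torch

caught = None

# Anomaly mode records the forward traceback of every autograd node, so a
# failing backward also reports which forward call created that node.
with torch.autograd.detect_anomaly():
    x = torch.randn(3, requires_grad=True)
    y = torch.exp(x)  # exp saves its output for use in backward
    y.add_(1)         # in-place edit bumps the saved tensor's version
    try:
        y.sum().backward()
    except RuntimeError as err:
        caught = err

print(caught)
```

Anomaly mode slows training noticeably, so it is best enabled only while debugging. `torch.autograd.set_detect_anomaly(True)` turns on the same check globally.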