RuntimeError: CUDA error: device kernel image is invalid
Hey guys! I am testing the PyTorch model in DeepChem. When I try the tutorial "Creating Models With TensorFlow and PyTorch", I always get this error: CUDA error: device kernel image is invalid. But I am sure that I have installed the correct PyTorch build with CUDA support:
pytorch 1.6.0 py3.6_cuda10.1.243_cudnn7.6.3_0 pytorch
tensorflow-gpu 2.3.0 pypi_0 pypi
It works fine with TensorFlow (Keras). I wonder why PyTorch is not supported?
The code I used is:
import torch
import deepchem as dc
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer='ECFP', splitter='random')
train_dataset, valid_dataset, test_dataset = datasets
pytorch_model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1000),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(1000, 1)
)
model = dc.models.TorchModel(pytorch_model, dc.models.losses.L2Loss())
model.fit(train_dataset, nb_epoch=50)
# `metric` was not defined in the posted snippet; assumed here to be the Pearson R^2 metric used in the DeepChem tutorial
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print('training set score:', model.evaluate(train_dataset, [metric]))
print('test set score:', model.evaluate(test_dataset, [metric]))
Here is the error:
$ python pytorch_test.py
2020-09-09 16:45:55.087719: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
File "pytorch_test.py", line 13, in <module>
model = dc.models.TorchModel(pytorch_model, dc.models.losses.L2Loss())
File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/deepchem/models/torch_models/torch_model.py", line 182, in __init__
self.model = model.to(device)
File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 607, in to
return self._apply(convert)
File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 354, in _apply
module._apply(fn)
File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 376, in _apply
param_applied = fn(param)
File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 605, in convert
return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: device kernel image is invalid
Please help, thanks!
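Since the traceback fails at the first attempt to move a parameter to the GPU, one way to narrow this down is to reproduce that step outside DeepChem. The following is only a minimal sketch, not part of the original report: the helper name `cuda_report` is hypothetical, and it assumes that a mismatch between the installed PyTorch build and the local CUDA setup is a plausible cause.

```python
def cuda_report():
    """Collect basic facts about the local PyTorch/CUDA setup (hypothetical helper)."""
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or its shared libraries could not be loaded.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        try:
            # Same kind of operation that fails in the traceback: moving a tensor to the GPU.
            torch.zeros(1).to("cuda")
            report["tensor_to_gpu"] = "ok"
        except RuntimeError as err:
            report["tensor_to_gpu"] = "failed: {}".format(err)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(key, "=", value)
```

If `tensor_to_gpu` fails here as well, the problem likely sits in the PyTorch/CUDA installation rather than in DeepChem itself.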
Top GitHub Comments
Thank you very much for your reply! I will try a clean install instead.
After a clean installation, both PyTorch and TensorFlow now work. Here are the installation steps:
Run nvidia-smi and nvcc to see if the driver and CUDA run correctly.
Thank you for your information! It seems that the CUDA 10.1 link is successful. However, keeping multiple CUDA versions installed generally leads to unexpected errors because of complex dependencies. I recommend a clean install. (To be honest, I don't have much experience with installing CUDA, so my advice may not be correct.)
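The driver and toolkit check mentioned above can also be scripted. This is a rough sketch using only the Python standard library; the helper name `tool_output` is hypothetical, and it assumes `nvidia-smi` and `nvcc` would be on `PATH` if they are installed.

```python
import shutil
import subprocess


def tool_output(command):
    """Return the combined output of a command, or None if its executable is not on PATH."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    # nvidia-smi reports on the driver; nvcc --version reports on the CUDA toolkit.
    for command in (["nvidia-smi"], ["nvcc", "--version"]):
        output = tool_output(command)
        print("$", " ".join(command))
        print(output if output is not None else "(not found on PATH)")
```

Comparing the CUDA version shown by each tool with the `cuda10.1` tag of the installed PyTorch package may help spot leftovers from several CUDA installations.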
Please see https://www.tensorflow.org/api_docs/python/tf/config/get_visible_devices
In addition to this, DeepChem doesn’t support TensorFlow 2.3.0 officially. We support TensorFlow 2.2.0.
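As a small illustration of the two points above, the sketch below lists the GPUs TensorFlow can see via `tf.config.get_visible_devices` and compares a version string against the 2.2 series named in this thread. The helper names `is_supported_tf` and `visible_gpus` are hypothetical, and the supported version is taken only from the comment above, so it may not hold for other DeepChem releases.

```python
def is_supported_tf(version, supported=(2, 2)):
    """Return True if a version string such as '2.2.0' matches the supported major.minor pair."""
    major, minor = version.split(".")[:2]
    return (int(major), int(minor)) == supported


def visible_gpus():
    """Return the GPUs visible to TensorFlow, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.get_visible_devices("GPU")


if __name__ == "__main__":
    # 2.3.0 is the version from the original report.
    print("TensorFlow 2.3.0 supported:", is_supported_tf("2.3.0"))
    print("Visible GPUs:", visible_gpus())
```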