RuntimeError: CUDA error: device kernel image is invalid

Hey guys! I am testing a PyTorch model in DeepChem. When I run the tutorial "Creating Models With TensorFlow and PyTorch", I always get this error: CUDA error: device kernel image is invalid. But I am sure that I have installed the correct PyTorch build with CUDA support:

pytorch                   1.6.0           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
tensorflow-gpu            2.3.0                    pypi_0    pypi

It works fine with TensorFlow (Keras). I wonder why PyTorch is not supported?

The code I used is:

import torch
import deepchem as dc

tasks, datasets, transformers = dc.molnet.load_delaney(featurizer='ECFP', splitter='random')
train_dataset, valid_dataset, test_dataset = datasets

pytorch_model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1000),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(1000, 1)
)
model = dc.models.TorchModel(pytorch_model, dc.models.losses.L2Loss())

model.fit(train_dataset, nb_epoch=50)
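# `metric` was not defined in the original snippet; the tutorial uses the Pearson R^2 score.
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print('training set score:', model.evaluate(train_dataset, [metric]))
print('test set score:', model.evaluate(test_dataset, [metric]))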

Here is the error:

$ python pytorch_test.py            
2020-09-09 16:45:55.087719: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "pytorch_test.py", line 13, in <module>
    model = dc.models.TorchModel(pytorch_model, dc.models.losses.L2Loss())
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/deepchem/models/torch_models/torch_model.py", line 182, in __init__
    self.model = model.to(device)
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 607, in to
    return self._apply(convert)
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 605, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: device kernel image is invalid

Please help, thanks!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

3 reactions
rasin-tsukuba commented, Sep 11, 2020

Thank you very much for your reply! I will try a clean install instead.

After a clean installation, both PyTorch and TensorFlow now work. Here are the installation steps:

  1. uninstall all cuda and nvidia-driver
  2. install cuda 10.1 with nvidia-driver
  3. test nvidia-smi and nvcc to see if the driver and cuda run correctly
$ nvidia-smi
Fri Sep 11 11:54:38 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.152.00   Driver Version: 418.152.00   CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
  4. install cudnn as usual
  5. install packages
conda create -n deepchem python=3.6
conda activate deepchem
pip install tensorflow-gpu==2.2
conda install -c conda-forge rdkit
pip install --pre deepchem
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
  6. Everything is ok! Enjoy the brand new deepchem!
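
For anyone repeating these steps, a quick check from Python can complement nvidia-smi and nvcc. This is only a sketch and not part of the original thread: the helper name torch_cuda_report is made up for illustration, and it relies only on standard PyTorch attributes (torch.version.cuda, torch.cuda.is_available(), torch.cuda.get_device_name(), torch.cuda.get_device_capability()). It also repeats a small tensor copy to the GPU, which is the kind of call that failed in the traceback above.

```python
def torch_cuda_report():
    """Collect basic facts about the installed PyTorch build and the visible GPU.

    Hypothetical helper for illustration; returns a plain dict.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None on CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        try:
            # Copy a small tensor to the GPU, as model.to(device) does for parameters.
            torch.zeros(1).to("cuda")
            report["gpu_copy_ok"] = True
        except RuntimeError as err:
            report["gpu_copy_ok"] = False
            report["gpu_copy_error"] = str(err)
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If gpu_copy_ok comes back False while nvidia-smi looks healthy, that would point at a mismatch between the PyTorch build and the local driver or toolkit rather than at DeepChem itself.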
0 reactions
nissy-dev commented, Sep 11, 2020

> To be honest, I first installed CUDA 10.2 with the NVIDIA driver, and then found that I needed CUDA 10.1 for tensorflow-gpu support. So I keep both CUDA 10.1 and CUDA 10.2 on my machine. But I believe I have linked the current /usr/local/cuda to the CUDA 10.1 directory. Here are some details:

Thank you for the information! It seems that the CUDA 10.1 link is set up correctly. However, keeping multiple CUDA versions generally leads to unexpected errors because of complex dependencies. I recommend a clean install. (To be honest, I don’t have much experience with installing CUDA, so my advice may not be correct.)
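
To see which toolkit the /usr/local/cuda link actually resolves to when several versions are installed side by side, something like the following could help. It is a rough sketch using only the Python standard library; the function name describe_cuda_installs and the assumption that toolkits live in directories named cuda-<version> under /usr/local are illustrative and may not match every setup.

```python
import glob
import os


def describe_cuda_installs(prefix="/usr/local"):
    """List cuda-* directories under `prefix` and resolve the `cuda` symlink.

    Hypothetical helper; assumes the common /usr/local/cuda-<version> layout.
    """
    link = os.path.join(prefix, "cuda")
    installed = sorted(
        path
        for path in glob.glob(os.path.join(prefix, "cuda-*"))
        if os.path.isdir(path)
    )
    # realpath follows the symlink to the directory it points at.
    active = os.path.realpath(link) if os.path.exists(link) else None
    return {"installed": installed, "active": active}


if __name__ == "__main__":
    info = describe_cuda_installs()
    print("installed toolkits:", info["installed"])
    print("/usr/local/cuda resolves to:", info["active"])
```

Note that this only reports what the symlink points at. A conda or pip PyTorch build ships its own CUDA runtime, so the system link matters mainly for the driver, nvcc, and TensorFlow.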

> Another thing is that I have installed tensorflow-gpu but my Keras model won't run on the GPU. Is there any way to force it to do so?

Please see https://www.tensorflow.org/api_docs/python/tf/config/get_visible_devices

In addition to this, DeepChem doesn’t support TensorFlow 2.3.0 officially. We support TensorFlow 2.2.0.
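
As a rough illustration of the tf.config.get_visible_devices call linked above, the sketch below lists the GPUs that TensorFlow can see. The wrapper name visible_gpus is made up for this example. An empty list would suggest that TensorFlow did not load its CUDA libraries, in which case a Keras model falls back to the CPU.

```python
def visible_gpus():
    """Return the names of GPUs visible to TensorFlow, or None if it is not installed.

    Hypothetical wrapper around tf.config.get_visible_devices (TensorFlow 2.x).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.get_visible_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow sees no GPU; models will run on the CPU")
    else:
        print("GPUs visible to TensorFlow:", gpus)
```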

Top Results From Across the Web

Tensorflow-gpu issue (CUDA runtime error: device kernel ...
I just had the same problem. I downgraded the Tensorflow2.3 version to 2.2 with following command. pip install --upgrade tensorflow==2.2.

Debugging "device kernel image is invalid ... - Google Groups
A few things you can try: 1) uninstall the CUDA driver and sdk, reinstall both of them and reboot. 2) Make sure you...

CUDA runtime implicit initialization on GPU:0 failed. Status ...
RuntimeError : CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid. 0. The environment I'm using is:.

NVIDIA CUDA Library: cudaError
This indicates that the device kernel image is invalid. This indicates that there is no kernel image available that is suitable for the...

RuntimeError: CUDA error: no kernel image is available for ...
When I run the following code, I got the RuntimeError. ... The code prints" device cuda:0" which means at least the code access...
