RuntimeError: CUDA error: device kernel image is invalid

Hey guys! I am testing a PyTorch model with DeepChem. When I follow the tutorial "Creating Models With TensorFlow and PyTorch", I always get the error: CUDA error: device kernel image is invalid. But I am sure that I have installed the correct PyTorch build with CUDA support:

pytorch                   1.6.0           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
tensorflow-gpu            2.3.0                    pypi_0    pypi

It works fine with TensorFlow (Keras), so I wonder why PyTorch is not supported.

The code I used is:

import torch
import deepchem as dc

tasks, datasets, transformers = dc.molnet.load_delaney(featurizer='ECFP', splitter='random')
train_dataset, valid_dataset, test_dataset = datasets

pytorch_model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1000),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(1000, 1)
)
model = dc.models.TorchModel(pytorch_model, dc.models.losses.L2Loss())

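# `metric` was not defined in the snippet as posted; Pearson R^2 is an
# assumption here, chosen because Delaney is a regression dataset.
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)

model.fit(train_dataset, nb_epoch=50)
print('training set score:', model.evaluate(train_dataset, [metric]))
print('test set score:', model.evaluate(test_dataset, [metric]))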

Here is the error:

$ python pytorch_test.py            
2020-09-09 16:45:55.087719: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "pytorch_test.py", line 13, in <module>
    model = dc.models.TorchModel(pytorch_model, dc.models.losses.L2Loss())
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/deepchem/models/torch_models/torch_model.py", line 182, in __init__
    self.model = model.to(device)
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 607, in to
    return self._apply(convert)
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 354, in _apply
    module._apply(fn)
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 376, in _apply
    param_applied = fn(param)
  File "/home/rasin/miniconda3/envs/deepchem/lib/python3.6/site-packages/torch/nn/modules/module.py", line 605, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: device kernel image is invalid

Please help, thanks!
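
Since the traceback fails at the first transfer of a parameter to the GPU, a useful first step is to check what the installed PyTorch build itself reports about CUDA, independently of DeepChem. The following is a minimal sketch, not part of the original issue: the helper name `cuda_report` is hypothetical, and `torch.cuda.get_arch_list` may be missing in older PyTorch releases, so it is looked up defensively.

```python
import importlib


def cuda_report():
    """Collect facts that help diagnose 'device kernel image is invalid'."""
    report = {}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment.
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    # CUDA version the binary was built against (None for CPU-only builds).
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()

    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of GPU 0.
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        # Not present in every release, so look it up defensively.
        get_arch_list = getattr(torch.cuda, "get_arch_list", None)
        if get_arch_list is not None:
            report["compiled_arches"] = get_arch_list()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` differs from what the driver supports, or the GPU's compute capability is not covered by the build, that mismatch is a plausible cause of the error, though this check alone cannot confirm it.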

Issue Analytics

State:
Created 3 years ago
Comments:6 (3 by maintainers)

Top GitHub Comments

rasin-tsukuba commented, Sep 11, 2020

Thank you very much for your reply! I will try a clean install instead.

After a clean installation, both PyTorch and TensorFlow now work. Here are the installation steps:

Uninstall all CUDA toolkits and the NVIDIA driver
Install CUDA 10.1 together with the NVIDIA driver
Run nvidia-smi and nvcc to check that the driver and CUDA work correctly

$ nvidia-smi
Fri Sep 11 11:54:38 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.152.00   Driver Version: 418.152.00   CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Install cuDNN as usual
Install the packages

conda create -n deepchem python=3.6
conda activate deepchem
pip install tensorflow-gpu==2.2
conda install -c conda-forge rdkit
pip install --pre deepchem
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

Everything is ok! Enjoy the brand new deepchem!
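
The manual check in the steps above (nvcc reporting release 10.1, and the wheels carrying a +cu101 suffix) can also be scripted. This is a rough sketch under stated assumptions: the function names are hypothetical, and the sample string is the last line of the nvcc output shown above. PyTorch wheels bundle their own CUDA runtime, so a match is a sanity check rather than a guarantee.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def wheel_cuda_tag(release):
    """Map a release such as (10, 1) to the 'cu101' wheel suffix."""
    major, minor = release
    return f"cu{major}{minor}"


def wheel_matches_toolkit(wheel_version, nvcc_output):
    """Check whether a version like '1.6.0+cu101' matches the local toolkit."""
    release = parse_nvcc_release(nvcc_output)
    if release is None or "+" not in wheel_version:
        return False
    local_tag = wheel_version.split("+", 1)[1]
    return local_tag == wheel_cuda_tag(release)


# Last line of the nvcc output quoted in the comment above.
SAMPLE = "Cuda compilation tools, release 10.1, V10.1.243"
print(wheel_matches_toolkit("1.6.0+cu101", SAMPLE))
```

In practice the text would come from running `nvcc --version` (for example via `subprocess.run`), and the wheel version from `torch.__version__`.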

nissy-dev commented, Sep 11, 2020

> To be honest, I first installed CUDA 10.2 with the NVIDIA driver, and then found that I needed CUDA 10.1 for tensorflow-gpu support. So I keep both CUDA 10.1 and CUDA 10.2 on my machine. But I believe I have linked the current /usr/local/cuda to the CUDA 10.1 directory. Here are some details:

Thank you for the information! It seems that the CUDA 10.1 link is set up correctly. However, keeping multiple CUDA versions installed generally leads to unexpected errors because of complex dependencies. I recommend a clean install. (To be honest, I don't have much experience with installing CUDA, so my advice may not be correct.)

> Another thing is that I have installed tensorflow-gpu, but my Keras model won't run on the GPU. Is there any way to force it to do so?

Please see https://www.tensorflow.org/api_docs/python/tf/config/get_visible_devices

In addition, DeepChem doesn't officially support TensorFlow 2.3.0. We support TensorFlow 2.2.0.
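
For the GPU question above, the linked `tf.config.get_visible_devices` API can be called directly to see whether TensorFlow detects a GPU at all. This is a minimal sketch, assuming TensorFlow 2.1 or later; the helper name `visible_gpus` is hypothetical and not part of TensorFlow or DeepChem.

```python
def visible_gpus():
    """Return the names of GPUs visible to TensorFlow, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Each entry is a PhysicalDevice with `name` and `device_type` attributes.
    return [device.name for device in tf.config.get_visible_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif not gpus:
        print("TensorFlow sees no GPU; Keras models will run on the CPU.")
    else:
        print("Visible GPUs:", gpus)
```

An empty list suggests that the problem lies in the driver or CUDA setup rather than in the Keras model, since TensorFlow places operations on a visible GPU by default.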

Top Results From Across the Web

Tensorflow-gpu issue (CUDA runtime error: device kernel ...

I just had the same problem. I downgraded the Tensorflow2.3 version to 2.2 with following command. pip install --upgrade tensorflow==2.2.

Debugging "device kernel image is invalid ... - Google Groups

A few things you can try: 1) uninstall the CUDA driver and sdk, reinstall both of them and reboot. 2) Make sure you...

CUDA runtime implicit initialization on GPU:0 failed. Status ...

RuntimeError : CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid. 0. The environment I'm using is:.

NVIDIA CUDA Library: cudaError

This indicates that the device kernel image is invalid. This indicates that there is no kernel image available that is suitable for the...

RuntimeError: CUDA error: no kernel image is available for ...

When I run the following code, I got the RuntimeError. ... The code prints" device cuda:0" which means at least the code access...
