[Bug] CUDA error when running sample code

I am running the basic code to train a PPO agent on CartPole and it gives me a CUDA error:

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasCreate(handle)

in “/.local/lib/python3.6/site-packages/torch/nn/functional.py”, line 1753, in linear return torch._C._nn.linear(input, weight, bias)

Python version 3.6.9

PyTorch version 1.8.0

Code is this:

import gym

from stable_baselines3 import PPO

env = gym.make('CartPole-v1')

model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()

env.close()
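
Since the traceback ends inside torch's linear call rather than in stable_baselines3 code, a check that bypasses stable_baselines3 can help narrow down where the failure comes from. The following is a minimal sketch, not part of the original report: the function name check_cuda_linear and its return strings are made up for illustration, and it assumes that a small linear layer on the GPU is enough to trigger cuBLAS handle creation.

```python
def check_cuda_linear() -> str:
    """Run one tiny linear op on the GPU and report what happened."""
    try:
        import torch
    except ImportError:
        return "torch-not-installed"

    if not torch.cuda.is_available():
        return "cuda-unavailable"

    try:
        # Same kind of call as the one at the bottom of the reported traceback.
        x = torch.randn(4, 8, device="cuda")
        w = torch.randn(2, 8, device="cuda")
        b = torch.zeros(2, device="cuda")
        y = torch.nn.functional.linear(x, w, b)
        # Wait for the GPU so that asynchronous errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return f"failed: {err}"

    return "ok" if tuple(y.shape) == (4, 2) else "unexpected-shape"


print(check_cuda_linear())
```

If this prints a "failed: ..." message with the same cuBLAS error, the problem likely sits in the PyTorch/CUDA setup rather than in the training code.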

Issue Analytics

State:
Created 3 years ago
Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction

ac-93 commented, Mar 10, 2021

Same issue for me; it seems to be related to torch 1.8 (https://github.com/pytorch/pytorch/issues/53336).

Downgrading to torch 1.7.1 seems to work fine for now.
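
To see whether an environment falls into the version range mentioned in this comment, a small check like the one below may help. It is only a sketch: the helper names are hypothetical, and the assumption that exactly the 1.8.x series is affected comes from this comment alone, not from any confirmed analysis.

```python
import re


def parse_release(version: str) -> tuple:
    """Return (major, minor, patch) from a string such as '1.8.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def reported_as_affected(version: str) -> bool:
    # Assumption: only torch 1.8.x is flagged, based on the comment above.
    return parse_release(version)[:2] == (1, 8)


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__


version = installed_torch_version()
if version is None:
    print("torch is not installed")
else:
    print(version, "reported as affected:", reported_as_affected(version))
```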

1 reaction

youryz commented, Mar 8, 2021

Actually no, @Miffyli: the code araffin provided works. My own deep learning projects based on PyTorch also work (MLP training and inference). By “sample code” I mean the code in my initial post, which trains a PPO agent. I am posting it again here:
import gym

from stable_baselines3 import PPO

env = gym.make('CartPole-v1')

model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    env.render()
    if done:
        obs = env.reset()

env.close()
I’ve checked my PyTorch/CUDA installation and found no problem with it. I will check again just in case, but I don’t expect a problem there, since my past projects worked well.
“The error that you have is definitely a PyTorch/CUDA error. I have tested SB3 with Python 3.6 and PyTorch 1.8 on my machine and on a Google Colab instance and did not get any error… I guess the code will also run if you use the CPU only.”

I switched to the CPU by passing ‘cpu’ to PPO. It trains for two epochs, then breaks again with the same CUDA error.
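
A CUDA error that appears even with ‘cpu’ selected suggests that something in the process still reaches the GPU. One way to rule that out is to hide all GPUs from the process before torch is imported. The sketch below assumes the standard CUDA_VISIBLE_DEVICES environment variable; the helper name hide_gpus is made up for illustration, and the variable only takes effect if it is set before CUDA is initialized in the process.

```python
import os


def hide_gpus() -> None:
    """Hide all CUDA devices from this process (call before importing torch)."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


hide_gpus()

try:
    import torch
except ImportError:
    torch = None  # torch is not installed; nothing further to check

if torch is not None:
    # Should report False if the variable was set before CUDA was initialized.
    print("CUDA available:", torch.cuda.is_available())
```

If the training script still raises a CUDA error after this, the environment variable was probably set too late, for example after another module had already imported torch and touched the GPU.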

I have two different machines and the code works on neither one.

Anyway I will check my CUDA installation.

Thanks
