[Bug] excessive CPU utilization
### 🐛 Bug
Excessive CPU utilization with CnnPolicy.
### To Reproduce
On a GCP deep learning instance, the following code snippet uses 8 full CPUs during training. Note that this does not occur on our private servers. It is not clear whether this is due to cloud virtualization or to the more recent CUDA version (11.1) on GCP.
```python
from stable_baselines3 import PPO
from stable_baselines3.common.atari_wrappers import AtariWrapper
import gym
import supersuit as ss  # imported in the original snippet but not used below

# Standard Atari preprocessing, then PPO with a CNN policy
env = gym.make("SpaceInvadersNoFrameskip-v4")
env = AtariWrapper(env)
model = PPO("CnnPolicy", env)
model.learn(total_timesteps=2000000)
model.save("policy")
```
Setting `torch.set_num_threads(1)`, or the environment variables `OMP_NUM_THREADS=1` or `MKL_NUM_THREADS=1`, all successfully run the code in a single thread.
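For reference, a minimal sketch (not part of the original report) of how these limits can be applied. The environment variables are usually exported before launching the process, but setting them at the top of the script, before `torch` is imported, typically also works:

```python
import os

# Cap the OpenMP/MKL thread pools; set these before importing torch so the
# limits are picked up when the libraries initialize.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import torch
torch.set_num_threads(1)  # also cap PyTorch's own intra-op CPU thread pool

from stable_baselines3 import PPO
from stable_baselines3.common.atari_wrappers import AtariWrapper
import gym

env = AtariWrapper(gym.make("SpaceInvadersNoFrameskip-v4"))
model = PPO("CnnPolicy", env)
model.learn(total_timesteps=2000000)
```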
### Expected behavior
We expect this to be single-threaded, or to use only a small number of threads, when CUDA is enabled. The fact that it doesn’t is a bit disturbing, as it suggests that some operation may be running on the CPU instead of the GPU.
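A minimal sketch (my own check, not from the original issue) of how to confirm where the policy actually lives:

```python
import torch
from stable_baselines3 import PPO
from stable_baselines3.common.atari_wrappers import AtariWrapper
import gym

env = AtariWrapper(gym.make("SpaceInvadersNoFrameskip-v4"))
model = PPO("CnnPolicy", env)

# Report the device SB3 selected and where the policy weights ended up.
print("CUDA available:", torch.cuda.is_available())
print("SB3 device:", model.device)
print("Policy parameters on:", next(model.policy.parameters()).device)
```

If everything reports `cuda`, the remaining CPU load is presumably coming from environment stepping and preprocessing rather than from the network forward/backward passes.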
### System Info
Describe the characteristics of your environment:
- stable-baselines3 1.0, installed with pip
- CUDA 11.0
- cuDNN 8.0.5
- PyTorch 1.7.1
- Python 3.7
This only seems to happen on a GCP instance.
### Checklist
- I have checked that there is no similar issue in the repo (required)
- I have read the documentation (required)
- I have provided a minimal working example to reproduce the bug (required)
### Top GitHub Comments
@Miffyli Thanks for the pointers. It was indeed an issue with MuJoCo; it was not using the GPU for rendering. After fixing that it is much faster, but my local machine still renders faster. I am resolving that now.
Judging by the memory use and power state (P0), the GPU is indeed being used, but the input sizes are so small that it does not get to utilize the V100 very much. Does it still run faster than without CUDA on AWS? If so, everything checks out. To make better use of the GPU, you could increase the number of environments, the batch size and `n_steps`.

Edit: I just noticed the “running fast locally”. Hmm… This is peculiar indeed. Judging by the prints you have shared, the GPU is indeed being used (although “GPU-Util” is 0% for some reason), and the code should not behave this differently on different machines. My only remaining suggestion is that the environment runs slower on the AWS machine, which leads to slower training overall. You could check the speed of the environments without training to see whether this is the problem.
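Both suggestions can be sketched roughly as follows (my own illustration, not from the thread; the environment count, `n_steps`, and `batch_size` values are placeholders):

```python
import time

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.atari_wrappers import AtariWrapper
from stable_baselines3.common.env_util import make_vec_env

# 1) Give the GPU more work per update: several envs, longer rollouts, larger batches.
vec_env = make_vec_env(
    "SpaceInvadersNoFrameskip-v4",
    n_envs=8,
    wrapper_class=AtariWrapper,
)
model = PPO("CnnPolicy", vec_env, n_steps=256, batch_size=512)

# 2) Time the environment alone to see whether stepping it is the bottleneck.
env = AtariWrapper(gym.make("SpaceInvadersNoFrameskip-v4"))
obs = env.reset()
start = time.time()
steps = 10_000
for _ in range(steps):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
print(f"{steps / (time.time() - start):.0f} env steps per second")
```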