[Bug] excessive CPU utilization

🐛 Bug

Excessive CPU utilization with CnnPolicy.

To Reproduce

On a GCP deep learning instance, the following code snippet uses 8 full CPUs during training. Note that this does not occur on our private servers. It is not clear whether this is due to cloud virtualization or to the more recent version of CUDA (11.1) on GCP.

from stable_baselines3 import PPO
from stable_baselines3.common.atari_wrappers import AtariWrapper
import gym
import supersuit as ss

env = gym.make("SpaceInvadersNoFrameskip-v4")
env = AtariWrapper(env)

model = PPO("CnnPolicy", env)
model.learn(total_timesteps=2000000)
model.save("policy")

Setting any one of torch.set_num_threads(1), OMP_NUM_THREADS=1, or MKL_NUM_THREADS=1 successfully makes the code run in a single thread.
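A minimal sketch of how those three settings might be applied from inside the training script. It assumes the environment variables have to be set before PyTorch is first imported in order to take effect, and `limit_torch_threads` is a hypothetical helper name, not part of Stable-Baselines3.

```python
import os

# Assumption: thread-pool environment variables are read when the numeric
# libraries initialise, so set them before torch is imported anywhere.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"


def limit_torch_threads(num_threads=1):
    """Cap PyTorch's intra-op CPU threads and return the resulting count.

    Hypothetical helper. Returns None if PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return None
    torch.set_num_threads(num_threads)
    return torch.get_num_threads()


if __name__ == "__main__":
    print("torch intra-op threads:", limit_torch_threads(1))
```

Any one of the three settings was reported to be sufficient on its own; they are combined here only for illustration.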

Expected behavior

We expect this to be single threaded or to be using a small number of threads when CUDA is enabled. The fact that it doesn’t is a bit disturbing, as it suggests that perhaps some operation is running on CPU instead of GPU.

System Info

Describe the characteristic of your environment:

  • Stable-Baselines3 version 1.0, installed with pip
  • CUDA 11.0
  • cuDNN 8.0.5
  • PyTorch 1.7.1
  • Python 3.7

Only seems to happen on a GCP instance.

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 14 (3 by maintainers)

Top GitHub Comments

1 reaction
divyanshj16 commented, Jul 27, 2021

@Miffyli Thanks for the pointers. It was indeed an issue with MuJoCo: it was not using the GPU for rendering. After fixing that it's much faster, but rendering is still faster on my local machine. I am resolving that now.

1 reaction
Miffyli commented, Jul 24, 2021

Judging by the memory use and power state (P0), the GPU is indeed being used, but the input sizes are so small that it does not get to utilize the V100 much. Does it still run faster than without CUDA on AWS? If so, everything checks out. To make better use of the GPU, you could increase the number of environments, the batch size, and n_steps.
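As a rough sketch of that suggestion, the helper below computes how many transitions PPO collects per update for a given number of environments and `n_steps`, and checks that the batch size divides it evenly. The specific values (8 environments, `n_steps=256`, `batch_size=512`) and the helper name `ppo_rollout_config` are illustrative assumptions, not tuned settings from the issue.

```python
def ppo_rollout_config(n_envs, n_steps, batch_size):
    """Return PPO keyword arguments plus the rollout size they imply.

    Hypothetical helper: PPO gathers n_envs * n_steps transitions per
    update and splits them into minibatches of batch_size.
    """
    rollout_size = n_envs * n_steps
    if rollout_size % batch_size != 0:
        raise ValueError(
            f"batch_size={batch_size} does not divide the rollout size "
            f"{rollout_size} (n_envs * n_steps)"
        )
    return {
        "ppo_kwargs": {"n_steps": n_steps, "batch_size": batch_size},
        "rollout_size": rollout_size,
        "minibatches_per_epoch": rollout_size // batch_size,
    }


# Illustrative values only (assumed, not taken from the issue).
config = ppo_rollout_config(n_envs=8, n_steps=256, batch_size=512)
print(config)
```

With Stable-Baselines3 installed, the result could be passed on as `PPO("CnnPolicy", vec_env, **config["ppo_kwargs"])`, where `vec_env` is a vectorized environment with `n_envs` copies, for example one built with `make_atari_env`.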

Edit: I just noticed the “running fast locally”. Hmm… This is peculiar indeed. Judging by the prints you have shared, the GPU is indeed being used (although “GPU-Util” is 0% for some reason), and the code should not behave differently like this on different machines. My only remaining suggestion is that the environment runs slower on the AWS machine, which leads to slower training overall. You could check the speed of the environments without training to see if this is the problem.
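One way to run that check is sketched below: step the environment with random actions, with no model involved, and report steps per second. `measure_env_fps` is a hypothetical helper, and it assumes a Gym-style environment exposing `reset()`, `step()`, and `action_space.sample()`.

```python
import time


def measure_env_fps(env, num_steps=1000):
    """Step `env` with random actions and return steps per second.

    Hypothetical helper. Accepts both the 4-tuple step API
    (obs, reward, done, info) and the 5-tuple one
    (obs, reward, terminated, truncated, info).
    """
    env.reset()
    start = time.perf_counter()
    for _ in range(num_steps):
        result = env.step(env.action_space.sample())
        done = result[2] if len(result) == 4 else (result[2] or result[3])
        if done:
            env.reset()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from very fast dummy environments.
    return num_steps / max(elapsed, 1e-9)
```

Running it on the same wrapped environment on both machines and comparing the two numbers would indicate whether the environment itself, rather than the training code, is the slower part.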
