CPU resources, htop shows all 12 cores being used for example_1.py
Thanks for the great library. I am running some tests to benchmark performance. Since I hope to be running these using many CPUs, I want to understand how many CPUs a script will consume. I installed the repository as of today and ran:
python examples/example_1.py
I’m running this on a machine with a single GPU, and an i7-8700k CPU with 12 logical cores. I assume I’m not using the GPU in the above command.
In a separate tab, `htop` regularly shows all 12 CPUs on my machine being used.
When I run:
python examples/example_1.py --cuda_idx 0
I don’t generally see all 12 CPUs being used, but I may get something like 6 CPUs used.
Just wondering, the documentation of `example_1.py` says this:
Runs one instance of the Atari environment and optimizes using DQN algorithm.
Can use a GPU for the agent (applies to both sample and train). No parallelism
employed, so everything happens in one python process; can be easier to debug.
However, I assumed "one Python process = one core." Perhaps this is not the right way to think about it. Is there a way to roughly estimate how many CPUs (or "cores"; I use the terms interchangeably) will be used for a given training run?
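For anyone hitting this later: inside a single process, the extra CPU usage typically comes from the intra-op thread pools of PyTorch and its BLAS backend, which default to roughly one thread per logical core, so one process is not one core. A quick sketch (Linux-only, since `os.sched_getaffinity` is a Linux API) of the difference between the cores the machine has and the cores the process is actually allowed to run on:

```python
import os

# Logical cores the OS reports vs. cores this process may run on.
# Restrictions from `taskset` or cgroups shrink the affinity mask,
# while os.cpu_count() keeps reporting the full machine.
total = os.cpu_count()
usable = len(os.sched_getaffinity(0))  # 0 = the current process
print(f"logical cores: {total}, usable by this process: {usable}")
```

On an unrestricted 12-core machine both numbers are 12; under `taskset -c 0-5` the second drops to 6.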
Issue Analytics
- Created 4 years ago
- Comments: 11 (8 by maintainers)
Top GitHub Comments
Thanks @astooke
I did another test where I took example_5 and made one change: replacing `MinibatchRlEval` with `MinibatchRl`, so that there is no evaluation environment. I then ran it with 8 parallel envs as you can see below. `htop` regularly shows at least 24 CPUs being used (on a 48-CPU machine), and nothing but my script is running on it.

Let me now do some testing with your suggestion of changing the num threads …
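In case it helps others, capping the thread pools looks roughly like this; the environment-variable names assume an OpenMP/MKL build of the BLAS backend, and they must be set before the heavy libraries are imported:

```python
import os

# Cap the BLAS/OpenMP worker pools; this must happen before numpy/torch
# are first imported, because the pools are sized at library load time.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

# Once torch is imported, its own intra-op pool can be capped too:
#   import torch
#   torch.set_num_threads(1)
print(os.environ["OMP_NUM_THREADS"])  # -> 1
```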
Update 1: was going to do some experimentation but saw @codelast put some new stuff below.
Update 2: actually I might have gotten confused. In `example_5.py` the `batch_B` is hard-coded at 32, so `n_parallel` does not actually change it. I assumed `n_parallel` was the number of parallel environments, but I guess it’s better viewed as the amount of resources provided to a given program? I will continue investigating, but do you have any quick suggestions / comments on the distinction between `n_parallel` and `batch_B`?

Update 3: figured out my prior question.
`batch_B` is actually the number of parallel environments, and `n_parallel` is the number of workers that need to run those environments. If `batch_B=32` and `n_parallel=2`, then we have [16, 16] envs allocated to the two workers. If `n_parallel=3`, then it’s [11, 11, 10] envs allocated to the three workers, and we get a warning saying that performance may suffer due to the unequal environment distribution.

I just ran some tests today, and indeed using `taskset` helped to limit my CPU usage, according to `htop`.
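The allocation rule above can be sketched as an even split with the remainder spread over the first workers; this is just an illustration of the behavior I observed, not rlpyt's actual implementation:

```python
def allocate_envs(batch_B, n_parallel):
    """Split batch_B environments across n_parallel workers as evenly as possible."""
    base, extra = divmod(batch_B, n_parallel)
    # The first `extra` workers each take one leftover environment.
    counts = [base + (1 if i < extra else 0) for i in range(n_parallel)]
    if extra:
        print("Warning: unequal env distribution may hurt performance:", counts)
    return counts

print(allocate_envs(32, 2))  # -> [16, 16]
print(allocate_envs(32, 3))  # -> [11, 11, 10]
```

And limiting a run to specific cores from the shell is just `taskset -c 0-3 python examples/example_5.py`.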