CPU resources, htop shows all 12 cores being used for example_1.py
Thanks for the great library. I am running some tests to benchmark performance. Since I hope to be running these using many CPUs, I want to understand how many CPUs a script will consume. I installed the repository as of today and ran:
python examples/example_1.py
I’m running this on a machine with a single GPU, and an i7-8700k CPU with 12 logical cores. I assume I’m not using the GPU in the above command.
In a separate tab, `htop` regularly shows all 12 CPUs on my machine being used.
When I run:
python examples/example_1.py --cuda_idx 0
I don’t generally see all 12 CPUs being used, but I may get something like 6 CPUs used.
Just wondering, the documentation of `example_1.py` says this:
Runs one instance of the Atari environment and optimizes using DQN algorithm.
Can use a GPU for the agent (applies to both sample and train). No parallelism
employed, so everything happens in one python process; can be easier to debug.
However, I assumed "one Python process = one core." Perhaps this is not the right way to think about it. Is there a way to roughly estimate how many CPUs (or "cores"; I use the terms interchangeably) will be used for a given training run?
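For anyone hitting this later: inside a single process, the extra CPU usage typically comes from the intra-op thread pools of PyTorch and its BLAS backend, which default to roughly one thread per logical core, so one process is not one core. A quick sketch (Linux-only, since `os.sched_getaffinity` is a Linux API) of the difference between the cores the machine has and the cores the process is actually allowed to run on:

```python
import os

# Logical cores the OS reports vs. cores this process may run on.
# Restrictions from `taskset` or cgroups shrink the affinity mask,
# while os.cpu_count() keeps reporting the full machine.
total = os.cpu_count()
usable = len(os.sched_getaffinity(0))  # 0 = the current process
print(f"logical cores: {total}, usable by this process: {usable}")
```

On an unrestricted 12-core machine both numbers are 12; under `taskset -c 0-5` the second drops to 6.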
Issue Analytics
- Created 4 years ago
- Comments: 11 (8 by maintainers)
Top GitHub Comments
Thanks @astooke
I did another test where I took example_5 and made one change: replacing `MinibatchRlEval` with `MinibatchRl`, so that there is no evaluation environment. I then ran it with 8 parallel envs as you can see below. `htop` regularly shows at least 24 CPUs being used (on a 48-CPU machine), and nothing but my script is running on it.

Let me now do some testing with your suggestion of changing the num threads …
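In case it helps others, capping the thread pools looks roughly like this; the environment-variable names assume an OpenMP/MKL build of the BLAS backend, and they must be set before the heavy libraries are imported:

```python
import os

# Cap the BLAS/OpenMP worker pools; this must happen before numpy/torch
# are first imported, because the pools are sized at library load time.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

# Once torch is imported, its own intra-op pool can be capped too:
#   import torch
#   torch.set_num_threads(1)
print(os.environ["OMP_NUM_THREADS"])  # -> 1
```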
Update 1: was going to do some experimentation but saw @codelast put some new stuff below.
Update 2: actually I might have gotten confused. In `example_5.py` the `batch_B` is hard-coded at 32, so `n_parallel` does not actually change it. I assumed `n_parallel` was the number of parallel environments, but I guess it’s better viewed as the amount of resources provided to a given program? I will continue investigating, but do you have any quick suggestions / comments on the distinction between `n_parallel` and `batch_B`?

Update 3: figured out my prior question.
`batch_B` is actually the number of parallel environments, and `n_parallel` is the number of workers that need to run those environments. If `batch_B=32` and `n_parallel=2`, then we have [16, 16] envs allocated to the two workers. If `n_parallel=3`, then it’s [11, 11, 10] envs allocated to the three workers, and we get a warning saying that performance may suffer due to the unequal environment distribution.

I just ran some tests today, and indeed using `taskset` helped to limit my CPU usage, according to `htop`.
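The allocation rule above can be sketched as an even split with the remainder spread over the first workers; this is just an illustration of the behavior I observed, not rlpyt's actual implementation:

```python
def allocate_envs(batch_B, n_parallel):
    """Split batch_B environments across n_parallel workers as evenly as possible."""
    base, extra = divmod(batch_B, n_parallel)
    # The first `extra` workers each take one leftover environment.
    counts = [base + (1 if i < extra else 0) for i in range(n_parallel)]
    if extra:
        print("Warning: unequal env distribution may hurt performance:", counts)
    return counts

print(allocate_envs(32, 2))  # -> [16, 16]
print(allocate_envs(32, 3))  # -> [11, 11, 10]
```

And limiting a run to specific cores from the shell is just `taskset -c 0-3 python examples/example_5.py`.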