[question] How exactly does the multiprocessing of PPO2 work?

How exactly does the multiprocessing of PPO2 work and how does it affect serial timesteps and total timesteps?

In my case, n_steps = 2048. So every 2048 timesteps, PPO will update and the callback will be called, correct? Using 1 process, this happened roughly once every 6s.

But when I increased the number of processes, it got slower. However, the fps increased. And so did the total_timesteps. So if I used 2 processes instead of one, total_timesteps would increase twice as fast (so 4096 per update). Meanwhile, serial_timesteps would increase at the same rate (2048 per update) - which is now slower than usual, because the callback now takes 12s to be called.

I’m just super confused. I thought that increasing the number of processes would make the model learn faster, but that doesn’t seem to be the case. How exactly does the number of processes affect the speed of the algorithm?
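The bookkeeping described in the question can be sketched as follows. This is a minimal illustration, assuming (as the question reports) that each update collects `n_steps` steps from every environment; the function name and the returned keys are hypothetical and are not part of the library.

```python
def timesteps_after_updates(n_steps: int, n_envs: int, n_updates: int) -> dict:
    """Count timesteps the way the question describes them (illustrative only)."""
    # Steps taken by each individual environment: grows by n_steps per update.
    serial_timesteps = n_steps * n_updates
    # Steps summed over all parallel environments: grows by n_steps * n_envs per update.
    total_timesteps = serial_timesteps * n_envs
    return {
        "serial_timesteps": serial_timesteps,
        "total_timesteps": total_timesteps,
        "samples_per_update": n_steps * n_envs,
    }


if __name__ == "__main__":
    for n_envs in (1, 2):
        print(n_envs, timesteps_after_updates(n_steps=2048, n_envs=n_envs, n_updates=1))
```

With `n_steps = 2048`, one environment yields 2048 samples per update and two environments yield 4096, while the serial count stays at 2048 in both cases.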

Top GitHub Comments

1 reaction

araffin commented, May 14, 2019

Environments are run in parallel (if you use SubprocVecEnv) but not asynchronously (there is synchronization after each step, from what I remember). In addition to that, you have some communication costs too (that’s why DummyVecEnv with n envs is sometimes faster than its subprocess counterpart), so throughput does not scale linearly with the number of processes (see Amdahl’s law).

For those two reasons, increasing the number of processes also increases the time between updates (even if the FPS increases).
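A rough way to see both effects at once is a toy model built on Amdahl’s law. This is only a sketch: the parallel fraction and the 6 s baseline are hypothetical inputs, the function names are made up for illustration, and nothing here is measured from Stable Baselines.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Amdahl's law: speedup of a fixed workload when part of it runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


def seconds_between_updates(single_env_seconds: float,
                            parallel_fraction: float,
                            n_envs: int) -> float:
    """Toy estimate of the wall-clock time between two updates."""
    # Each update needs n_envs times as many samples as the single-env run,
    # so the workload grows by n_envs while Amdahl's factor speeds it up.
    return single_env_seconds * n_envs / amdahl_speedup(parallel_fraction, n_envs)


def relative_fps(parallel_fraction: float, n_envs: int) -> float:
    """Samples per second relative to the single-environment run."""
    return amdahl_speedup(parallel_fraction, n_envs)


if __name__ == "__main__":
    # Hypothetical numbers: 6 s per update with one env, 80% of the work parallelizable.
    for n_envs in (1, 2, 4, 8):
        interval = seconds_between_updates(6.0, 0.8, n_envs)
        print(n_envs, round(interval, 2), round(relative_fps(0.8, n_envs), 2))
```

Under these assumptions, two environments give about 7.2 s between updates and roughly 1.67 times the FPS, so updates arrive less often even though samples are collected faster. A parallel fraction of 0 reproduces the 12 s interval reported in the question.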

Would this also be the case for PPO1 + MPI?

What do you mean?

0 reactions

H-Park commented, Jun 18, 2019

From your earlier answer, you said running envs in parallel doesn’t actually increase sample efficiency, so does that mean a model training on a single env and a model training on N envs would both require more or less the same amount of time to reach the same performance level?

I assume you’re asking about wall clock time:

To sum it up, with more processes, training goes faster (in terms of FPS / wall clock time) and exploration usually improves. However, that usually does not make the algorithm more sample efficient.
