[question] How exactly does the multiprocessing of PPO2 work?
How exactly does the multiprocessing of PPO2 work, and how does it affect serial timesteps and total timesteps?
In my case, `n_steps = 2048`. So every 2048 timesteps, PPO will update and the callback will be called, correct? Using 1 process, this happened roughly once every 6 s.
But when I increased the number of processes, it got slower. However, the `fps` increased, and so did `total_timesteps`. So if I used 2 processes instead of one, `total_timesteps` would increase twice as fast (4096 per update). Meanwhile, `serial_timesteps` would increase at the same rate (2048 per update), which is now slower than before, because the callback is now called only once every 12 s.
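The counters in the question follow from how a PPO2-style rollout is sized. Below is a minimal sketch, assuming (as the question's observations suggest) that each update collects `n_steps` transitions from every one of `n_envs` environments, so one update consumes `n_steps * n_envs` transitions, and that the number of updates is the `total_timesteps` budget divided by that batch size. The helper `rollout_stats` is hypothetical and not part of stable-baselines.

```python
def rollout_stats(n_steps: int, n_envs: int, total_timesteps: int) -> dict:
    """Hypothetical helper: per-update accounting for a PPO2-style rollout.

    Assumption: every update gathers n_steps transitions from each of the
    n_envs environments, which are stepped together.
    """
    n_batch = n_steps * n_envs              # transitions gathered per update
    n_updates = total_timesteps // n_batch  # full updates that fit in the budget
    return {
        "n_batch": n_batch,
        # total_timesteps grows by the whole batch on every update
        "total_per_update": n_batch,
        # serial_timesteps counts vectorized steps, i.e. n_steps per update
        "serial_per_update": n_steps,
        "n_updates": n_updates,
    }


if __name__ == "__main__":
    for n_envs in (1, 2):
        print(n_envs, rollout_stats(n_steps=2048, n_envs=n_envs, total_timesteps=1_000_000))
```

With two environments, each update sees twice as many transitions (4096 instead of 2048), so the same `total_timesteps` budget yields half as many updates, while `serial_timesteps` still advances by 2048 per update.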
I’m just super confused. I thought that increasing the number of processes would make the model learn faster, but that doesn’t seem to be the case? How exactly does the number of processes affect the speed of the algorithm?
- Created 4 years ago
Top GitHub Comments
Environments are run in parallel (if you use `SubprocVecEnv`) but not asynchronously (from what I remember, there is a synchronization after each step). In addition, there are communication costs between processes (that’s why `DummyVecEnv` with n envs is sometimes faster than its subprocess counterpart), so doubling the number of processes does not scale linearly (see Amdahl’s law). For those two reasons, increasing the number of processes also increases the time between updates: each update still waits for `n_steps` synchronized steps, and each of those steps now costs more, even though the FPS goes up.
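The two effects above (lock-step stepping and communication overhead) can be illustrated with a toy cost model. This is a sketch under stated assumptions, not a measurement of stable-baselines: the function names and every cost constant are hypothetical. The model charges each vectorized step the time of the slowest environment, plus a serial part in the main process, plus a per-environment communication cost. Amdahl's law is included as the ideal speedup bound.

```python
from typing import Sequence


def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: ideal speedup on n workers when a fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)


def rollout_time(n_steps: int, env_step_times: Sequence[float],
                 t_serial: float, t_comm: float) -> float:
    """Toy wall-clock model (seconds) of one rollout with lock-step environments.

    All costs are hypothetical inputs:
      env_step_times: time each sub-process needs for one env.step()
      t_serial: per-step work in the main process (e.g. a batched policy forward pass)
      t_comm: per-env cost of sending actions and receiving observations
    Stepping is synchronous, so every vectorized step waits for the slowest env.
    """
    n_envs = len(env_step_times)
    per_step = max(env_step_times) + t_serial + t_comm * n_envs
    return n_steps * per_step


if __name__ == "__main__":
    for n_envs in (1, 2, 4):
        secs = rollout_time(2048, [0.002] * n_envs, t_serial=0.0008, t_comm=0.0005)
        fps = 2048 * n_envs / secs
        print(f"n_envs={n_envs}: {secs:.2f} s/update, {fps:.0f} fps")
    print(f"Amdahl bound, p=0.9, n=4: {amdahl_speedup(0.9, 4):.2f}x")
```

With these made-up costs, both the seconds per update and the FPS rise as `n_envs` grows, which is the pattern described in the question. Modelling `DummyVecEnv` instead would mean summing `env_step_times` rather than taking their maximum while dropping most of the inter-process `t_comm` term; which one wins depends on how expensive `env.step()` is relative to that communication cost.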
What do you mean?
I assume you’re asking about wall clock time: