
`SubprocVecEnv` speedup does not scale linearly compared with `DummyVecEnv`

See original GitHub issue

I made a toy benchmark by creating 16 environments for both SubprocVecEnv and DummyVecEnv, and collected 1000 time steps by first resetting the environments and then feeding random actions sampled from the action space in a for loop.
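
For reference, a minimal sketch of such a benchmark, assuming the OpenAI Baselines VecEnv API (`DummyVecEnv`/`SubprocVecEnv` taking a list of environment-constructor callables); the environment ids, counts, and timing details are illustrative rather than the exact script used in the issue:

```python
import time
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

def make_env(env_id):
    # Each VecEnv takes a list of callables that build the underlying envs.
    return lambda: gym.make(env_id)

def benchmark(vec_env_cls, env_id, num_envs=16, num_steps=1000):
    venv = vec_env_cls([make_env(env_id) for _ in range(num_envs)])
    venv.reset()
    start = time.time()
    for _ in range(num_steps):
        # One random action per sub-environment.
        actions = [venv.action_space.sample() for _ in range(num_envs)]
        venv.step(actions)
    elapsed = time.time() - start
    venv.close()
    return elapsed

if __name__ == "__main__":
    for env_id in ["CartPole-v1", "HalfCheetah-v2"]:
        t_dummy = benchmark(DummyVecEnv, env_id)
        t_subproc = benchmark(SubprocVecEnv, env_id)
        print(f"{env_id}: DummyVecEnv {t_dummy:.2f}s, SubprocVecEnv {t_subproc:.2f}s, "
              f"speedup {t_dummy / t_subproc:.2f}x")
```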

It turns out the speed of the simulator step is quite crucial for the overall speedup. For example, HalfCheetah-v2 is roughly 1.5-2x faster and FetchPush-v1 can be 7-9x faster. I guess it depends on the dynamics, with the cheetah being the simpler one.

For classic control environments like CartPole-v1, it seems much better to use DummyVecEnv, since the SubprocVecEnv speedup is ~0.2x, i.e. about 5x slower than DummyVecEnv.

I am wondering whether it is feasible to push the speedup further so that it scales approximately linearly with the number of environments, or whether the main cost comes from the overhead of the worker processes?

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 1
  • Comments: 14 (10 by maintainers)

Top GitHub Comments

2 reactions
zuoxingdong commented, Sep 27, 2018

There is an additional benchmark on some MuJoCo environments (tested on a DGX-1):

[benchmark plots attached in the original issue]

2 reactions
pzhokhov commented, Sep 25, 2018

Chunks of sub-environments per process instead of one per process is a great idea! I'd be very interested to see the results of that (how much faster the venv.step() method becomes with different types of sub-environments). asyncio is also a good one, although I'd rather keep things compatible with Python 3.6 and below. Anyways, if you feel like implementing any of these - do not let me stop you from submitting a PR 😃

To the MPI vs multiprocessing question - the VecEnv configuration (master process updating the neural net, subprocesses running env.step) is especially beneficial for conv nets and Atari-like envs, because the updates from relatively large batches can then be computed on a GPU fast (much faster than if every process were to run gradient computation on its own - several processes actively interacting with the GPU is usually not a great setup). In principle, the same communication pattern can be done with MPI, but it is a little more involved and requires MPI to be installed.

On the other hand, in MuJoCo-like environments (when using non-pixel observations - positions and velocities of joints) the neural nets are relatively small, so batching data to compute the update does not give much of a speed-up; with MPI, however, you can actually run the experiment on a distributed machine - that's why, for instance, HER uses MPI over SubprocVecEnv. For TRPO and PPO1 the choice could have been made either way; in fact, PPO2 can use both. I don't know the relative latency of MPI communication versus pipes; I suspect they should be similar, but I have never measured it / seen measurements.
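
To illustrate the first suggestion, here is a rough, hypothetical sketch of the "chunk of sub-environments per worker" idea using plain `multiprocessing` (not Baselines code): each worker owns several environments and steps them sequentially, so a single pipe round-trip amortizes the IPC cost over the whole chunk. The class and function names are invented for the example, and the env constructors are assumed to be picklable (e.g. `functools.partial(gym.make, env_id)`):

```python
import multiprocessing as mp
import numpy as np

def _worker(remote, env_fns):
    # Each worker builds and owns a chunk of environments.
    envs = [fn() for fn in env_fns]
    for env in envs:
        env.reset()
    while True:
        cmd, data = remote.recv()
        if cmd == "step":
            out = []
            for env, action in zip(envs, data):
                ob, rew, done, info = env.step(action)
                if done:              # auto-reset, as SubprocVecEnv does
                    ob = env.reset()
                out.append((ob, rew, done, info))
            remote.send(out)          # one message covers the whole chunk
        elif cmd == "close":
            for env in envs:
                env.close()
            remote.close()
            break

class ChunkedSubprocVecEnv:
    """Hypothetical variant: len(env_fns) environments spread over num_workers processes.
    Only step()/close() are sketched; a real VecEnv would also expose reset() and the spaces."""

    def __init__(self, env_fns, num_workers):
        chunks = [list(c) for c in np.array_split(env_fns, num_workers)]
        self.chunk_sizes = [len(c) for c in chunks]
        self.remotes, work_remotes = zip(*[mp.Pipe() for _ in chunks])
        self.procs = [mp.Process(target=_worker, args=(wr, chunk), daemon=True)
                      for wr, chunk in zip(work_remotes, chunks)]
        for p in self.procs:
            p.start()

    def step(self, actions):
        # Scatter chunk-sized slices of the action batch, then gather and flatten.
        i = 0
        for remote, n in zip(self.remotes, self.chunk_sizes):
            remote.send(("step", actions[i:i + n]))
            i += n
        results = [r for remote in self.remotes for r in remote.recv()]
        obs, rews, dones, infos = zip(*results)
        return np.stack(obs), np.array(rews), np.array(dones), list(infos)

    def close(self):
        for remote in self.remotes:
            remote.send(("close", None))
        for p in self.procs:
            p.join()
```

With 16 environments, num_workers=16 reproduces the one-env-per-process behaviour, while num_workers=4 gives chunks of four and a quarter of the pipe round-trips per step; measuring steps per second as a function of chunk size is roughly the experiment suggested above.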

Read more comments on GitHub >

Top Results From Across the Web

Vectorized Environments - Stable Baselines - Read the Docs
DummyVecEnv creates a simple vectorized wrapper for multiple environments, calling each environment in sequence on the current Python process. This is useful ...

Stable Baselines3 Tutorial - Multiprocessing of environments
In practice, DummyVecEnv is usually faster than SubprocVecEnv because of communication ... so we do not see a linear scaling of the FPS ...

Stable Baselines Documentation - Read the Docs
... tensorflow in the past (see Issue #430) and so if you do not intend to use these ... from stable_baselines.common.vec_env import DummyVecEnv.

EnvPool, a highly parallel reinforcement learning environment ...
Sample efficiency is not sacrificed when replacing OpenAI gym with EnvPool and keeping the same experiment configuration. It is a pure speedup ...

A Usage of EnvPool - OpenReview
For maximizing the throughput of the environment execution, users may use the ... compared to the Python counterpart and achieves considerable speedup.
