[Question] RL Zoo uses dummy vec env by default?
Question
I’m trying to understand how parallelism is implemented in stable-baselines.
By default, as in train.py in RL Zoo, it seems like DummyVecEnv is used, because:
- https://github.com/DLR-RM/rl-baselines3-zoo/blob/64d4625599d3244308f643810fd57d240aeeac58/train.py#L57
- From SB3’s docs for PPO, to reproduce the results in RL Zoo:
  python train.py --algo ppo --env $ENV_ID --eval-episodes 10 --eval-freq 10000
  (which doesn’t change the default value for vec-env in the previous link)
However, by my understanding of the documentation, DummyVecEnv calls each env in sequence, and hence we shouldn’t expect any speedup from it.
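For reference, a minimal sketch of how the two vec env types can be selected in SB3 (my own example with an arbitrary env id, not the zoo’s exact setup):

```python
# Minimal sketch (illustrative, not the zoo's setup): choosing the vec env type in SB3
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv

if __name__ == "__main__":  # guard required for SubprocVecEnv on spawn-based start methods
    # DummyVecEnv: all envs live in the main process and are stepped sequentially (zoo default)
    dummy_env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=DummyVecEnv)

    # SubprocVecEnv: one process per env, stepped in parallel (communication overhead applies)
    subproc_env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)

    model = PPO("MlpPolicy", subproc_env, verbose=0)
    model.learn(total_timesteps=10_000)
```

If I read the linked train.py correctly, the zoo also exposes this choice through its --vec-env option, so it can be switched from the command line without editing the hyperparameter files.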
Therefore, I have three quick questions:
- Is there a reason why SubprocVecEnv is not used here? I mean, isn’t it faster?
- If DummyVecEnv was used, how long did training take (approximately) for 1 million timesteps on, let’s say, HalfCheetah-v2? Was the training duration reasonable?
- Is VecNormalize used by default in both cases? (See the sketch below.)
I have read through the documentation but couldn’t find answers to these questions. Thanks in advance for the help.
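On the third question, this is roughly how VecNormalize is wrapped around a vec env in plain SB3 (a sketch of the public API only; whether the zoo applies it seems to depend on the normalize entry in the per-env hyperparameter files, as far as I can tell):

```python
# Sketch (SB3 API, illustrative env id): wrapping a vec env with VecNormalize
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

venv = make_vec_env("HalfCheetah-v2", n_envs=1)  # requires a MuJoCo install
# Keeps running estimates of observation/reward statistics and normalizes on the fly
venv = VecNormalize(venv, norm_obs=True, norm_reward=True, clip_obs=10.0)

model = PPO("MlpPolicy", venv)
model.learn(total_timesteps=10_000)

# The running statistics must be saved alongside the model and reloaded for evaluation
venv.save("vecnormalize.pkl")
```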
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Top GitHub Comments
It depends on where the bottleneck is. It is common that reset() is the most costly operation (it may take several seconds in robotic environments); in that case it is recommended to use subprocesses to do the resets in parallel.
The problem with subprocesses is that they come with a communication overhead (as shown in the colab linked in the documentation and by @Miffyli), so for envs that do not require heavy computation, like Pendulum/CartPole, it does not make sense to use subprocesses. However, you may observe a good speedup with Atari games, for instance (given that you have a good CPU).
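As a quick way to see where the bottleneck is on a given machine, one can time raw environment stepping with both vec env types; a rough sketch (not a rigorous benchmark, env id and counts are arbitrary):

```python
# Sketch: comparing raw stepping throughput of DummyVecEnv vs SubprocVecEnv
import time

import numpy as np
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv


def time_stepping(vec_env_cls, n_envs=8, n_steps=1_000):
    env = make_vec_env("CartPole-v1", n_envs=n_envs, vec_env_cls=vec_env_cls)
    env.reset()
    # One (random) action per env; VecEnv auto-resets finished episodes
    actions = np.array([env.action_space.sample() for _ in range(n_envs)])
    start = time.perf_counter()
    for _ in range(n_steps):
        env.step(actions)
    elapsed = time.perf_counter() - start
    env.close()
    return n_envs * n_steps / elapsed  # frames per second


if __name__ == "__main__":
    print("DummyVecEnv fps:  ", time_stepping(DummyVecEnv))
    print("SubprocVecEnv fps:", time_stepping(SubprocVecEnv))
```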
Although, at some point, it also does not make sense to add more envs, for two reasons (also discussed in the colab).
Regarding the hyperparameters, I’m thinking of n_epochs and train_freq / gradient_steps, which control a compromise between data collection and gradient updates. Again, if the time spent doing gradient updates is much greater than the time spent collecting data, it does not really make sense to add more envs.
One main contributor may be fewer calls to the agent’s predict function, which is used to get actions during rollouts (we query actions for all environments with one predict call, regardless of the number of environments). With more envs you better utilize the parallel nature of networks (bigger batches), where you have roughly the same runtime for predicting actions for a single environment or for eight. You will also end up with bigger batches of data before doing training, so there is less overhead between preparing rollout buffers and whatnot.
Ah sorry, I should have been more clear: I thought we had more documentation on this, but it turns out we do not 😃