[Question] RL Zoo uses dummy vec env by default?
Question
I’m trying to understand how parallelism is implemented in stable-baselines.
By default, as in train.py in RL Zoo, it seems like DummyVecEnv is used, because:
- https://github.com/DLR-RM/rl-baselines3-zoo/blob/64d4625599d3244308f643810fd57d240aeeac58/train.py#L57
- From SB3’s docs for PPO, to reproduce the results in RL Zoo:
  python train.py --algo ppo --env $ENV_ID --eval-episodes 10 --eval-freq 10000
  (which doesn’t change the default value for vec-env in the previous link)
However, by my understanding of the documentation, DummyVecEnv calls each env in sequence, and hence we shouldn’t expect any speedup from it.
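For reference, a minimal sketch of how the two vec env types can be selected in SB3 (my own example with an arbitrary env id, not the zoo’s exact setup):

```python
# Minimal sketch (illustrative, not the zoo's setup): choosing the vec env type in SB3
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv

if __name__ == "__main__":  # guard required for SubprocVecEnv on spawn-based start methods
    # DummyVecEnv: all envs live in the main process and are stepped sequentially (zoo default)
    dummy_env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=DummyVecEnv)

    # SubprocVecEnv: one process per env, stepped in parallel (communication overhead applies)
    subproc_env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)

    model = PPO("MlpPolicy", subproc_env, verbose=0)
    model.learn(total_timesteps=10_000)
```

If I read the linked train.py correctly, the zoo also exposes this choice through its --vec-env option, so it can be switched from the command line without editing the hyperparameter files.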
Therefore, I have three quick questions:
- Is there a reason why SubprocVecEnv is not used here? I mean, isn’t it faster?
- If DummyVecEnv was used, how long did training take (approximately) for 1 million timesteps on, let’s say, HalfCheetah-v2? Was the training duration reasonable?
- Is VecNormalize used by default in both cases? (See the sketch below.)
I have read through the documentation but couldn’t find answers to these questions. Thanks in advance for the help.
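On the third question, this is roughly how VecNormalize is wrapped around a vec env in plain SB3 (a sketch of the public API only; whether the zoo applies it seems to depend on the normalize entry in the per-env hyperparameter files, as far as I can tell):

```python
# Sketch (SB3 API, illustrative env id): wrapping a vec env with VecNormalize
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

venv = make_vec_env("HalfCheetah-v2", n_envs=1)  # requires a MuJoCo install
# Keeps running estimates of observation/reward statistics and normalizes on the fly
venv = VecNormalize(venv, norm_obs=True, norm_reward=True, clip_obs=10.0)

model = PPO("MlpPolicy", venv)
model.learn(total_timesteps=10_000)

# The running statistics must be saved alongside the model and reloaded for evaluation
venv.save("vecnormalize.pkl")
```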
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Top GitHub Comments
It depends on where the bottleneck is. It is common that reset() is the most costly operation (it may take several seconds in robotic environments); in that case it is recommended to use subprocesses to do the resets in parallel.
The problem with subprocesses is that they come with a communication overhead (as shown in the colab linked in the documentation and by @Miffyli), so for envs that do not require heavy computation, like Pendulum/CartPole, it does not make sense to use subprocesses. However, you may observe a good speedup with Atari games, for instance (given that you have a good CPU).
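As a quick way to see where the bottleneck is on a given machine, one can time raw environment stepping with both vec env types; a rough sketch (not a rigorous benchmark, env id and counts are arbitrary):

```python
# Sketch: comparing raw stepping throughput of DummyVecEnv vs SubprocVecEnv
import time

import numpy as np
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv


def time_stepping(vec_env_cls, n_envs=8, n_steps=1_000):
    env = make_vec_env("CartPole-v1", n_envs=n_envs, vec_env_cls=vec_env_cls)
    env.reset()
    # One (random) action per env; VecEnv auto-resets finished episodes
    actions = np.array([env.action_space.sample() for _ in range(n_envs)])
    start = time.perf_counter()
    for _ in range(n_steps):
        env.step(actions)
    elapsed = time.perf_counter() - start
    env.close()
    return n_envs * n_steps / elapsed  # frames per second


if __name__ == "__main__":
    print("DummyVecEnv fps:  ", time_stepping(DummyVecEnv))
    print("SubprocVecEnv fps:", time_stepping(SubprocVecEnv))
```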
Although, at some point, it also does not make sense to add more envs, for two reasons (also discussed in the colab).
Regarding the hyperparameters, I’m thinking of n_epochs and train_freq / gradient_steps, which control a compromise between data collection and gradient updates. Again, if the time spent doing gradient updates is much greater than the time spent collecting data, it does not really make sense to add more envs.
One main contributor may be fewer calls to the agent’s predict function, which is used to get actions during rollouts (we query actions for all environments with one predict call, regardless of the number of environments). With more envs you better utilize the parallel nature of networks (bigger batches), where you have roughly the same runtime for predicting actions for a single environment or for eight. You will also end up with bigger batches of data before doing training, so there is less overhead between preparing rollout buffers and whatnot.
Ah sorry, I should have been more clear: I thought we had more documentation on this, but it turns out we do not 😃