[Question] Huge performance difference with different n_envs
Question
I’m running A2C with default parameters on BreakoutNoFrameskip-v4 under two training scenarios whose only difference is that one uses `n_envs=16` (orange) while the other uses `n_envs=40` (blue). However, the performance difference is huge. Is there a particular reason for this behavior? I thought `n_envs` was mostly a parallelization parameter, which shouldn’t have such a large impact on performance.
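For scale, in SB3’s A2C each gradient update consumes one rollout of `n_steps * n_envs` transitions, so `n_envs` also changes the effective batch size and the total number of updates for a fixed step budget. A rough sketch, assuming the default `n_steps=5` for A2C:

```python
# A2C performs one gradient update per rollout of n_steps * n_envs
# transitions (SB3 default for A2C: n_steps=5).
n_steps = 5
total_timesteps = 10_000_000

for n_envs in (16, 40):
    batch = n_steps * n_envs
    updates = total_timesteps // batch
    print(f"n_envs={n_envs}: {batch} transitions/update, ~{updates:,} updates")

# n_envs=16: 80 transitions/update, ~125,000 updates
# n_envs=40: 200 transitions/update, ~50,000 updates
```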
Additional context
config.yml:
```yaml
!!python/object/apply:collections.OrderedDict
- - - ent_coef
    - 0.01
  - - env_wrapper
    - - stable_baselines3.common.atari_wrappers.AtariWrapper
  - - frame_stack
    - 4
  - - n_envs
    - 40  # (or 16)
  - - n_timesteps
    - 10000000.0
  - - policy
    - CnnPolicy
  - - policy_kwargs
    - dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5))
  - - vf_coef
    - 0.25
```
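For readers not using the zoo, here is a minimal sketch of roughly what this config expands to in plain stable-baselines3 code (the hyperparameters and the 10M-step budget are taken from the YAML above; everything else is left at SB3 defaults):

```python
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike
from stable_baselines3.common.vec_env import VecFrameStack

# make_atari_env applies AtariWrapper; frame_stack=4 via VecFrameStack.
env = make_atari_env("BreakoutNoFrameskip-v4", n_envs=16, seed=0)  # or n_envs=40
env = VecFrameStack(env, n_stack=4)

model = A2C(
    "CnnPolicy",
    env,
    ent_coef=0.01,
    vf_coef=0.25,
    policy_kwargs=dict(
        optimizer_class=RMSpropTFLike,
        optimizer_kwargs=dict(eps=1e-5),
    ),
    verbose=1,
)
model.learn(total_timesteps=10_000_000)
```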
args.yml:
```yaml
!!python/object/apply:collections.OrderedDict
- - - algo
    - a2c
  - - env
    - BreakoutNoFrameskip-v4
  - - env_kwargs
    - null
  - - eval_episodes
    - 5
  - - eval_freq
    - 10000
  - - gym_packages
    - []
  - - hyperparams
    - null
  - - log_folder
    - logs
  - - log_interval
    - -1
  - - n_eval_envs
    - 1
  - - n_evaluations
    - 20
  - - n_jobs
    - 1
  - - n_startup_trials
    - 10
  - - n_timesteps
    - -1
  - - n_trials
    - 10
  - - no_optim_plots
    - false
  - - num_threads
    - -1
  - - optimization_log_path
    - null
  - - optimize_hyperparameters
    - false
  - - pruner
    - median
  - - sampler
    - tpe
  - - save_freq
    - -1
  - - save_replay_buffer
    - false
  - - seed
    - 0
  - - storage
    - null
  - - study_name
    - null
  - - tensorboard_log
    - ''
  - - trained_agent
    - ''
  - - truncate_last_trajectory
    - true
  - - uuid
    - false
  - - vec_env
    - dummy
  - - verbose
    - 1
```
Top GitHub Comments
As far as I know, you are indeed reporting a real bug: varying the number of environments should have only a limited impact on the result. For what it’s worth, I reproduced the same curves as you, which seems to confirm that the gap is outside the error bands.
Have you updated `gradient_steps` properly?
See related issue: https://github.com/DLR-RM/stable-baselines3/issues/699
And the documentation: https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#multiprocessing-with-off-policy-algorithms
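Note that `gradient_steps` applies to off-policy algorithms (SAC, TD3, DQN), not to A2C; the linked documentation page shows how to keep the number of gradient updates in proportion when adding environments. A minimal sketch along those lines (the environment and values here are illustrative, not from this issue):

```python
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env

# With n_envs=4, each call to env.step() collects 4 transitions, so
# gradient_steps should be scaled accordingly; gradient_steps=-1 performs
# as many gradient steps as transitions collected during the rollout.
vec_env = make_vec_env("Pendulum-v1", n_envs=4, seed=0)
model = SAC("MlpPolicy", vec_env, train_freq=1, gradient_steps=-1, verbose=1)
model.learn(total_timesteps=10_000)
```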