[Question] Huge performance difference with different n_envs
Question
I’m running A2C with default parameters on BreakoutNoFrameskip-v4 under two training scenarios whose only difference is that one uses `n_envs=16` (orange) while the other uses `n_envs=40` (blue). However, the performance difference is huge. Is there a particular reason for this behavior? I thought `n_envs` was mostly a parallelization parameter, which shouldn’t have such a large impact on performance.
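For scale, in SB3’s A2C each gradient update consumes one rollout of `n_steps * n_envs` transitions, so `n_envs` also changes the effective batch size and the total number of updates for a fixed step budget. A rough sketch, assuming the default `n_steps=5` for A2C:

```python
# A2C performs one gradient update per rollout of n_steps * n_envs
# transitions (SB3 default for A2C: n_steps=5).
n_steps = 5
total_timesteps = 10_000_000

for n_envs in (16, 40):
    batch = n_steps * n_envs
    updates = total_timesteps // batch
    print(f"n_envs={n_envs}: {batch} transitions/update, ~{updates:,} updates")

# n_envs=16: 80 transitions/update, ~125,000 updates
# n_envs=40: 200 transitions/update, ~50,000 updates
```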
Additional context
config.yml:
```yaml
!!python/object/apply:collections.OrderedDict
- - - ent_coef
    - 0.01
  - - env_wrapper
    - - stable_baselines3.common.atari_wrappers.AtariWrapper
  - - frame_stack
    - 4
  - - n_envs
    - 40  # (or 16)
  - - n_timesteps
    - 10000000.0
  - - policy
    - CnnPolicy
  - - policy_kwargs
    - dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5))
  - - vf_coef
    - 0.25
```
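For readers not using the zoo, here is a minimal sketch of roughly what this config expands to in plain stable-baselines3 code (the hyperparameters and the 10M-step budget are taken from the YAML above; everything else is left at SB3 defaults):

```python
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike
from stable_baselines3.common.vec_env import VecFrameStack

# make_atari_env applies AtariWrapper; frame_stack=4 via VecFrameStack.
env = make_atari_env("BreakoutNoFrameskip-v4", n_envs=16, seed=0)  # or n_envs=40
env = VecFrameStack(env, n_stack=4)

model = A2C(
    "CnnPolicy",
    env,
    ent_coef=0.01,
    vf_coef=0.25,
    policy_kwargs=dict(
        optimizer_class=RMSpropTFLike,
        optimizer_kwargs=dict(eps=1e-5),
    ),
    verbose=1,
)
model.learn(total_timesteps=10_000_000)
```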
args.yml:
```yaml
!!python/object/apply:collections.OrderedDict
- - - algo
    - a2c
  - - env
    - BreakoutNoFrameskip-v4
  - - env_kwargs
    - null
  - - eval_episodes
    - 5
  - - eval_freq
    - 10000
  - - gym_packages
    - []
  - - hyperparams
    - null
  - - log_folder
    - logs
  - - log_interval
    - -1
  - - n_eval_envs
    - 1
  - - n_evaluations
    - 20
  - - n_jobs
    - 1
  - - n_startup_trials
    - 10
  - - n_timesteps
    - -1
  - - n_trials
    - 10
  - - no_optim_plots
    - false
  - - num_threads
    - -1
  - - optimization_log_path
    - null
  - - optimize_hyperparameters
    - false
  - - pruner
    - median
  - - sampler
    - tpe
  - - save_freq
    - -1
  - - save_replay_buffer
    - false
  - - seed
    - 0
  - - storage
    - null
  - - study_name
    - null
  - - tensorboard_log
    - ''
  - - trained_agent
    - ''
  - - truncate_last_trajectory
    - true
  - - uuid
    - false
  - - vec_env
    - dummy
  - - verbose
    - 1
```
Top GitHub Comments
As far as I know, you are indeed reporting a real bug: varying the number of environments should have only a limited impact on the result. For what it’s worth, I reproduced the same curves as you, which seems to confirm that the gap is outside the error bands.
Have you updated `gradient_steps` properly?
See related issue: https://github.com/DLR-RM/stable-baselines3/issues/699
And the documentation: https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#multiprocessing-with-off-policy-algorithms
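Note that `gradient_steps` applies to off-policy algorithms (SAC, TD3, DQN), not to A2C; the linked documentation page shows how to keep the number of gradient updates in proportion when adding environments. A minimal sketch along those lines (the environment and values here are illustrative, not from this issue):

```python
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env

# With n_envs=4, each call to env.step() collects 4 transitions, so
# gradient_steps should be scaled accordingly; gradient_steps=-1 performs
# as many gradient steps as transitions collected during the rollout.
vec_env = make_vec_env("Pendulum-v1", n_envs=4, seed=0)
model = SAC("MlpPolicy", vec_env, train_freq=1, gradient_steps=-1, verbose=1)
model.learn(total_timesteps=10_000)
```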