[Question] Single Env vs SubprocVecEnv
See original GitHub issue.

Important Note: We do not do technical support nor consulting, and we don't answer personal questions by email. Please post your question on the RL Discord, Reddit, or Stack Overflow in that case.
Question
Hi,
I am training with PPO, default hyperparameters, on a single environment, with `total_timesteps` set to 40_000 steps. If I instead use, for example, 8 envs simultaneously to collect experience, then in order to make the learning process equivalent to running a single env for 40k steps, should I keep `total_timesteps` at 40k (as in the single-env case) or set it to 40_000 / 8 = 5_000 steps?
Assuming the same seed for the weight initialisation and for the env in both the single- and multi-env cases, should I obtain equivalent results?
The default hyperparameters use `n_epochs=10`. In my experience with supervised learning, the number of epochs is usually much larger, e.g. 200-1000. What is the reason for choosing such a small number?
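On the epochs question, the key difference from supervised learning is that PPO is on-policy: the "dataset" is a single rollout that is regenerated after every update phase, and `n_epochs` only controls how many times that one rollout is reused before being thrown away. The arithmetic, using SB3's default PPO hyperparameters (`n_steps=2048`, `batch_size=64`, `n_epochs=10`), can be sketched in plain Python:

```python
# How many gradient updates PPO performs per rollout,
# using stable-baselines3's default PPO hyperparameters.
n_steps, n_envs, batch_size, n_epochs = 2048, 8, 64, 10

rollout_size = n_steps * n_envs                    # samples collected per rollout
minibatches_per_epoch = rollout_size // batch_size # minibatch updates per pass
gradient_updates = minibatches_per_epoch * n_epochs

print(rollout_size, minibatches_per_epoch, gradient_updates)
# Each sample is reused only n_epochs (=10) times before being discarded;
# with hundreds of epochs, the updated policy would drift far from the
# policy that collected the data, making the data badly off-policy.
```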
Additional context
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Issue Analytics
- Created: 2 years ago
- Comments: 9
Top GitHub Comments
I see, so if I'm understanding this correctly, the results I got were because the agent had 150×3 steps, which led to more batches per rollout and therefore more updates.
Thank you very much. I thought there was something inherently off in my environment or in gym (much less likely).
With a single env you will have highly correlated samples in your rollout, since they (likely) come from a single episode, which is a poor representative of the dynamics of the full environment. Doing updates on this data will bias the network in the one direction represented by that data.
With multiple envs you have a chance of having data from different episodes and different areas of the environment -> a better representative of the full environment -> network updates are less biased and, generally, training is more stable.
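The correlation effect described above can be illustrated with a toy experiment (pure Python, no RL library; the 1-D random walk stands in for the states of one episode, and the interleaving mimics how a vec env writes one step from each of its envs into the rollout buffer in turn):

```python
import random

random.seed(0)

def episode(n):
    """Toy stand-in for one episode's states: a 1-D random walk."""
    x, out = 0.0, []
    for _ in range(n):
        x += random.gauss(0, 1)
        out.append(x)
    return out

def lag1_corr(xs):
    """Lag-1 autocorrelation: how similar consecutive buffer entries are."""
    mx = sum(xs) / len(xs)
    num = sum((xs[i] - mx) * (xs[i + 1] - mx) for i in range(len(xs) - 1))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

single = episode(512)                      # one long episode, single env
walks = [episode(64) for _ in range(8)]    # 8 independent episodes
# vec-env buffer order: step t of env 0, step t of env 1, ..., then t+1
interleaved = [w[t] for t in range(64) for w in walks]

print(lag1_corr(single), lag1_corr(interleaved))
```

Consecutive samples from the single long episode are nearly identical (autocorrelation close to 1), while interleaving independent episodes places unrelated states next to each other in the buffer, which is the decorrelation benefit described in the comment above.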