[Question] Single Env vs SubprocVecEnv
See original GitHub issue.

Important Note: We do not do technical support nor consulting, and we don't answer personal questions by email. Please post your question on the RL Discord, Reddit, or Stack Overflow in that case.
Question
Hi,
I am training with PPO, default hyperparameters, on a single environment, with `total_timesteps` set to 40_000 steps. If I instead use, for example, 8 envs simultaneously to collect experience, then in order to make the learning process equivalent to running a single env for 40k steps, should I keep `total_timesteps` at 40k (as in the single-env case) or set it to 40_000 / 8 = 5_000 steps?
Assuming the same seed for the weight initialisation and for the env in both the single- and multi-env cases, should I obtain equivalent results?
The default hyperparameters use `n_epochs=10`. In my experience with supervised learning, the number of epochs is usually much larger, e.g. 200-1000. What is the reason for choosing such a small number?
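On the epochs question, the key difference from supervised learning is that PPO is on-policy: the "dataset" is a single rollout that is regenerated after every update phase, and `n_epochs` only controls how many times that one rollout is reused before being thrown away. The arithmetic, using SB3's default PPO hyperparameters (`n_steps=2048`, `batch_size=64`, `n_epochs=10`), can be sketched in plain Python:

```python
# How many gradient updates PPO performs per rollout,
# using stable-baselines3's default PPO hyperparameters.
n_steps, n_envs, batch_size, n_epochs = 2048, 8, 64, 10

rollout_size = n_steps * n_envs                    # samples collected per rollout
minibatches_per_epoch = rollout_size // batch_size # minibatch updates per pass
gradient_updates = minibatches_per_epoch * n_epochs

print(rollout_size, minibatches_per_epoch, gradient_updates)
# Each sample is reused only n_epochs (=10) times before being discarded;
# with hundreds of epochs, the updated policy would drift far from the
# policy that collected the data, making the data badly off-policy.
```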
Additional context
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Issue Analytics
- Created: 2 years ago
- Comments: 9
Top GitHub Comments
I see, so if I'm understanding this correctly, the results I got were because the agent had 150×3 steps, which led to more batches per rollout and therefore more updates.
Thank you very much. I thought there was something inherently off in my environment or in gym (much less likely).
With a single env you will have highly correlated samples in your rollout, since they (likely) come from a single episode, which is a poor representative of the dynamics of the full environment. Doing updates on this data will bias the network in the one direction represented by that data.
With multiple envs you have a chance of having data from different episodes and different areas of the environment -> a better representative of the full environment -> network updates are less biased and, generally, training is more stable.
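The correlation effect described above can be illustrated with a toy experiment (pure Python, no RL library; the 1-D random walk stands in for the states of one episode, and the interleaving mimics how a vec env writes one step from each of its envs into the rollout buffer in turn):

```python
import random

random.seed(0)

def episode(n):
    """Toy stand-in for one episode's states: a 1-D random walk."""
    x, out = 0.0, []
    for _ in range(n):
        x += random.gauss(0, 1)
        out.append(x)
    return out

def lag1_corr(xs):
    """Lag-1 autocorrelation: how similar consecutive buffer entries are."""
    mx = sum(xs) / len(xs)
    num = sum((xs[i] - mx) * (xs[i + 1] - mx) for i in range(len(xs) - 1))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

single = episode(512)                      # one long episode, single env
walks = [episode(64) for _ in range(8)]    # 8 independent episodes
# vec-env buffer order: step t of env 0, step t of env 1, ..., then t+1
interleaved = [w[t] for t in range(64) for w in walks]

print(lag1_corr(single), lag1_corr(interleaved))
```

Consecutive samples from the single long episode are nearly identical (autocorrelation close to 1), while interleaving independent episodes places unrelated states next to each other in the buffer, which is the decorrelation benefit described in the comment above.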