Episode length is strangely always divided by the number of environments when multiprocessing
Important note: we do not do technical support or consulting, and we do not answer personal questions by email. Please post such questions on the RL Discord, Reddit, or Stack Overflow instead.
Question
Sorry to bother you. I am using PPO for our task, but I still have an issue with multiprocessing. To wrap a vectorized environment, I use `make_vec_env` with `n_envs=1`, and I set the episode length to a fixed 45 steps. When `n_envs=2`, the average episode length is also 45 steps, as expected. However, when I set the episode length to 50 steps with `n_envs=2`, the average episode length during training strangely became 25.

So I tested a few more cases and found that the average episode length is always equal to `episode_length / n_envs`, and the trained performance is also worse when `n_envs` is larger than 1. From my perspective, the episode length should be independent of `n_envs`, and I did not find any bugs after checking my code. Yet the episode length is divided by the number of processes during rollout. May I have your suggestions? Did I overlook anything in the settings?
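For reference, here is a minimal sketch of the setup being described. `FixedLengthEnv` is a hypothetical stand-in for the actual task (which is not shown in the issue), using the older Gym `step`/`reset` API that stable-baselines3 supported at the time:

```python
import gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

class FixedLengthEnv(gym.Env):
    """Toy environment that always terminates after `max_steps` steps."""

    def __init__(self, max_steps=50):
        super().__init__()
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.observation_space.sample()

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.max_steps
        return self.observation_space.sample(), 0.0, done, {}

# make_vec_env wraps each env in a Monitor, which is where the
# rollout/ep_len_mean value in the training logs comes from.
env = make_vec_env(FixedLengthEnv, n_envs=2)  # DummyVecEnv by default
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```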
Checklist
- [x] I have read the documentation (required)
- [x] I have checked that there is no similar issue in the repo (required)
Issue Analytics
- Created a year ago
- Comments: 11 (2 by maintainers)
Top GitHub Comments
Hmm hmm, I do not see how that would “fix” the problem: your environments might be reading/writing from/to a shared variable, which is causing the problems with `DummyVecEnv` (all environments live in the same process). In `SubprocVecEnv`, each environment lives in a separate process, so you probably do not have multiple envs using the same variables.

But to answer your question: no, no need to change a thing! Different implementations of `VecEnv` should act identically when it comes to training, apart from what they do underneath.
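For illustration, here is a sketch of the kind of shared-state bug the maintainer is describing; `BuggyEnv` and its module-level counter are hypothetical, not the issue author's actual code. With `DummyVecEnv`, both env instances live in one process and increment the same counter, so episodes terminate after roughly `max_steps / n_envs` steps per env; with `SubprocVecEnv`, each process gets its own copy of the counter, which masks the bug rather than fixing it:

```python
import gym
import numpy as np
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

STEP_COUNT = 0  # module-level state, shared by every env in the same process

class BuggyEnv(gym.Env):
    """Terminates based on the shared counter instead of per-instance state."""

    def __init__(self, max_steps=50):
        super().__init__()
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self.max_steps = max_steps

    def reset(self):
        global STEP_COUNT
        STEP_COUNT = 0
        return self.observation_space.sample()

    def step(self, action):
        global STEP_COUNT
        STEP_COUNT += 1  # with DummyVecEnv, all n_envs envs bump this counter
        done = STEP_COUNT >= self.max_steps
        return self.observation_space.sample(), 0.0, done, {}

if __name__ == "__main__":
    # DummyVecEnv (the default): both instances share STEP_COUNT, so an
    # episode ends after roughly max_steps / n_envs steps -- the symptom above.
    dummy = make_vec_env(BuggyEnv, n_envs=2)

    # SubprocVecEnv: each env runs in its own process with its own copy of
    # STEP_COUNT, so the reported episode length comes out as expected.
    subproc = make_vec_env(BuggyEnv, n_envs=2, vec_env_cls=SubprocVecEnv)
```

The real fix in such a case is to keep the counter as per-instance state (e.g. `self.steps`), so the episode length no longer depends on which `VecEnv` backend is used.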
Got it. Really appreciate the great help!