SubprocVecEnv performance compared to gym.vector.async_vector_env
Hi,
I'm trying to use SubprocVecEnv to create a vectorized environment and use it in my own PPO implementation. I have a couple of questions about the performance of this vectorization and its hyperparameters.
I have a 28-core CPU and an RTX 2080 Ti GPU. When I use gym.vector.async_vector_env to create vectorized envs, it is 3 to 6 times faster than SubprocVecEnv from stable_baselines3.
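For reference, here is a rough timing sketch of the kind of comparison I ran. CartPole-v1 is only a placeholder for my actual environment, and the step counts are arbitrary:

```python
import time

import gym
import numpy as np
from gym.vector import AsyncVectorEnv
from stable_baselines3.common.vec_env import SubprocVecEnv

N_ENVS = 100      # number of parallel environments, as in my setup
N_STEPS = 1_000   # steps to time; adjust for your machine


def make_env():
    # Placeholder environment; substitute your own
    return gym.make("CartPole-v1")


def time_gym_async():
    envs = AsyncVectorEnv([make_env for _ in range(N_ENVS)])
    envs.reset()
    start = time.perf_counter()
    for _ in range(N_STEPS):
        # gym.vector exposes a batched action space, so one sample() covers all envs
        envs.step(envs.action_space.sample())
    elapsed = time.perf_counter() - start
    envs.close()
    return N_ENVS * N_STEPS / elapsed  # frames per second


def time_sb3_subproc():
    envs = SubprocVecEnv([make_env for _ in range(N_ENVS)])
    envs.reset()
    start = time.perf_counter()
    for _ in range(N_STEPS):
        # SB3 exposes the single-env action space, so sample once per env
        actions = np.array([envs.action_space.sample() for _ in range(N_ENVS)])
        envs.step(actions)
    elapsed = time.perf_counter() - start
    envs.close()
    return N_ENVS * N_STEPS / elapsed


if __name__ == "__main__":  # guard required: both backends spawn worker processes
    print(f"gym AsyncVectorEnv: {time_gym_async():.0f} FPS")
    print(f"SB3 SubprocVecEnv:  {time_sb3_subproc():.0f} FPS")
```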
In SubProcVecEnv
, when I set the number of threads using torch.set_num_threads(28)
all the cores are involved but again it is almost two times slower than using torch.set_num_threads(10)
.
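My guess at why a lower thread count helps (the value 10 is just what worked best empirically on my machine):

```python
import torch

# Each SubprocVecEnv worker is its own process. If PyTorch also spawns 28
# intra-op threads in the main process, those threads and the env workers
# compete for the same 28 cores (oversubscription). Capping the thread
# count leaves cores free for the environment workers.
torch.set_num_threads(10)
```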
I ran all the comparisons with 100 parallel envs. I'm not sure how I should choose the number of envs and the torch thread count; I suspect the slower performance compared to gym.vector.async_vector_env comes from poorly chosen hyperparameters.
Which parameters can I tune to get the best performance, and which ones matter most?
Thank you
Issue Analytics
- Created 3 years ago
- Comments: 9 (2 by maintainers)
Top GitHub Comments
We have a tutorial about that 😉 See notebook 3: https://github.com/araffin/rl-tutorial-jnrr19#content
Yes, using more environments (with the same n_steps -> more samples per update) is expected to result in more stable and/or faster learning, sometimes even in terms of env steps. I tried to find a paper with experiments on this very topic but cannot locate it for the life of me. The closest thing I can share is the OpenAI Dota 2 paper, where Figure 5 compares different batch sizes.
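A minimal SB3 sketch of that relationship (the environment id and the specific numbers are placeholders, not recommendations):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

# With n_steps fixed, each rollout collects n_envs * n_steps transitions,
# so doubling n_envs doubles the samples per update.
vec_env = make_vec_env("CartPole-v1", n_envs=8, vec_env_cls=SubprocVecEnv)
model = PPO("MlpPolicy", vec_env, n_steps=128, verbose=1)
model.learn(total_timesteps=100_000)
```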
These differences in timing should be minuscule and only have a real effect on training speed if your environment can reach thousands of FPS (e.g. basic control tasks).