[Feature request] Adding multiprocessing support for off policy algorithms
I am in the process of adding multiprocessing (vectorized envs) support for off-policy algorithms (TD3, SAC, DDPG, etc.). I've added support for sampling multiple actions and for updating timesteps according to the number of vectorized environments. The modified code runs without throwing an error, but the algorithms no longer converge.
I tried OpenAI Gym's Pendulum-v0: a single-instance env made with make_vec_env('Pendulum-v0', n_envs=1, vec_env_cls=DummyVecEnv)
trains fine. If I specify multiple instances, such as make_vec_env('Pendulum-v0', n_envs=2, vec_env_cls=DummyVecEnv)
or make_vec_env('Pendulum-v0', n_envs=2, vec_env_cls=SubprocVecEnv)
, then the algorithms don't converge at all.
Here’s a warning message that I get, which I suspect is closely related to the non-convergence.
/home/me/code/stable-baselines3/stable_baselines3/sac/sac.py:237: UserWarning: Using a target size (torch.Size([256, 2])) that is different
to the input size (torch.Size([256, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. critic_loss = 0.5 * sum([F.mse_loss(current_q, q_backup) for current_q in current_q_estimates])
It appears to me that the replay buffer wasn't retrieving n_envs
samples, so the loss target had to rely on broadcasting.
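The shape mismatch in the warning can be reproduced in isolation. Below is a small NumPy sketch (the shapes are taken from the warning above; the variable names mirror the SAC code but this is not the stable-baselines3 implementation) showing how a (256, 1) estimate silently broadcasts against a (256, 2) target:

```python
import numpy as np

# Shapes from the warning: the critic outputs a (256, 1) Q-estimate,
# while the target built from a 2-env rollout has shape (256, 2).
rng = np.random.default_rng(0)
current_q = rng.normal(size=(256, 1))  # "input size" in the warning
q_backup = rng.normal(size=(256, 2))   # "target size" in the warning

# NumPy (like torch) broadcasts (256, 1) against (256, 2) to (256, 2):
# every Q-estimate gets compared against BOTH envs' targets, which is
# why the MSE is computed over the wrong pairs.
diff = current_q - q_backup
assert diff.shape == (256, 2)
```

The fix is to make sure the sampled batch and the critic output have identical shapes, e.g. both (256, 1).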
Some pointers on modifying the replay buffer so it supports multiprocessing would be much appreciated! If the authors would like, I can create a PR:
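One possible direction, sketched below, is to store the transitions of all n_envs per step but sample them as flat rows, so the batch never carries a trailing n_envs dimension. This is a minimal illustration with hypothetical names (VecReplayBuffer), not the stable-baselines3 implementation:

```python
import numpy as np

class VecReplayBuffer:
    """Sketch: buffer arrays are (size, n_envs, dim); sampling picks a
    (time, env) pair per row so returned batches are flat (batch, dim)."""

    def __init__(self, size, n_envs, obs_dim, act_dim):
        self.size, self.n_envs = size, n_envs
        self.pos, self.full = 0, False
        self.obs = np.zeros((size, n_envs, obs_dim), dtype=np.float32)
        self.actions = np.zeros((size, n_envs, act_dim), dtype=np.float32)
        self.rewards = np.zeros((size, n_envs), dtype=np.float32)
        self.next_obs = np.zeros((size, n_envs, obs_dim), dtype=np.float32)
        self.dones = np.zeros((size, n_envs), dtype=np.float32)

    def add(self, obs, action, reward, next_obs, done):
        # obs/next_obs: (n_envs, obs_dim), reward/done: (n_envs,)
        self.obs[self.pos] = obs
        self.actions[self.pos] = action
        self.rewards[self.pos] = reward
        self.next_obs[self.pos] = next_obs
        self.dones[self.pos] = done
        self.pos = (self.pos + 1) % self.size
        self.full = self.full or self.pos == 0

    def sample(self, batch_size, rng=np.random):
        upper = self.size if self.full else self.pos
        # One (time, env) index pair per batch row -> flat batches.
        t = rng.randint(0, upper, size=batch_size)
        e = rng.randint(0, self.n_envs, size=batch_size)
        return (self.obs[t, e], self.actions[t, e],
                self.rewards[t, e, None],   # (batch, 1), matches critic output
                self.next_obs[t, e],
                self.dones[t, e, None])
```

Returning rewards and dones as (batch, 1) keeps the TD target the same shape as the critic output, which would avoid the broadcasting warning above.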
https://github.com/yonkshi/stable-baselines3/commit/157971357a28be8435d09cfccec1d4258b220a6a
Issue Analytics
- State:
- Created 3 years ago
- Comments: 13 (7 by maintainers)
Top GitHub Comments
I’ll be working on that in the coming weeks (I need to implement it for a personal project)
Please read the documentation: you are using
train_freq=(1, "episode")
(episodic training); to use multiple envs, you must use "step" as the unit (or train_freq=1
for short). We recommend using TD3/SAC anyway (improved versions of DDPG). You need to install the master version (cf. doc), as this is not yet released.