
[Feature request] Adding multiprocessing support for off policy algorithms


I am in the process of adding multiprocessing (vectorized envs) support to the off-policy algorithms (TD3, SAC, DDPG, etc.). I've added support for sampling multiple actions, and I update the timestep count according to the number of vectorized environments. The modified code runs without throwing an error, but the algorithms no longer converge. I tried OpenAI Gym's Pendulum-v0: a single-instance env made with make_vec_env('Pendulum-v0', n_envs=1, vec_env_cls=DummyVecEnv) trains fine, but if I specify multiple instances, e.g. make_vec_env('Pendulum-v0', n_envs=2, vec_env_cls=DummyVecEnv) or make_vec_env('Pendulum-v0', n_envs=2, vec_env_cls=SubprocVecEnv), the algorithms don't converge at all. Here's a warning message that I get, which I suspect is closely related to the non-convergence:

/home/me/code/stable-baselines3/stable_baselines3/sac/sac.py:237: UserWarning: Using a target size (torch.Size([256, 2])) that is different to the input size (torch.Size([256, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  critic_loss = 0.5 * sum([F.mse_loss(current_q, q_backup) for current_q in current_q_estimates])

It appears to me that the replay buffer wasn't retrieving n_envs samples, so the loss target had to rely on broadcasting.
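
The shape mismatch in the warning can be reproduced outside of stable-baselines3. Here is a minimal NumPy sketch (shapes only; the tensor values are made up) of how a (256, 1) critic output silently broadcasts against a (256, 2) target:

```python
import numpy as np

batch, n_envs = 256, 2

# Critic estimate with shape (batch, 1), as in the warning ...
current_q = np.zeros((batch, 1))
# ... and a target accidentally carrying a per-env axis, (batch, n_envs).
q_backup = np.ones((batch, n_envs))

# Broadcasting silently expands current_q across the n_envs columns, so
# the squared error is averaged over batch * n_envs pairs that were never
# meant to be compared -- the loss is wrong, not just noisy.
diff = current_q - q_backup
assert diff.shape == (batch, n_envs)

# One possible fix, in the spirit of the issue: fold the env axis into
# the batch axis before computing the loss, so both tensors are column
# vectors of the same length.
q_backup_flat = q_backup.reshape(-1, 1)
current_q_flat = np.zeros_like(q_backup_flat)
assert current_q_flat.shape == q_backup_flat.shape == (batch * n_envs, 1)
```

PyTorch follows the same broadcasting rules as NumPy here, which is why F.mse_loss only warns instead of raising.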

Some pointers on modifying the replay buffer so that it supports multiprocessing would be much appreciated! If the authors would like, I can create a PR.

https://github.com/yonkshi/stable-baselines3/commit/157971357a28be8435d09cfccec1d4258b220a6a

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

4 reactions
araffin commented, May 17, 2021

I’ll be working on that in the coming weeks (I need to implement it for a personal project).

3 reactions
araffin commented, Mar 3, 2022

“AssertionError: You must use only one env when doing episodic training.”

Please read the documentation. You are using train_freq=(1, "episode") (episodic training); to use multiple envs, you must use "step" as the unit (or train_freq=1 for short). We recommend using TD3/SAC anyway (improved versions of DDPG).
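
The reason episodic training is restricted to a single env can be seen with a toy sketch (plain Python, no stable-baselines3 required; MockVecEnv and the episode lengths are made up) that mimics a vectorized env's auto-resetting done flags:

```python
# Minimal mock of a vectorized env (assumed semantics, loosely mirroring
# DummyVecEnv): each step() returns one done flag per sub-env, and
# sub-envs reset independently when their episode ends.
class MockVecEnv:
    def __init__(self, episode_lengths):
        self.lengths = episode_lengths        # fixed episode length per env
        self.t = [0] * len(episode_lengths)   # per-env step counters

    def step(self):
        dones = []
        for i in range(len(self.t)):
            self.t[i] += 1
            done = self.t[i] >= self.lengths[i]
            if done:
                self.t[i] = 0                 # auto-reset, like a VecEnv
            dones.append(done)
        return dones

venv = MockVecEnv([3, 5])   # two envs with different episode lengths
history = [venv.step() for _ in range(5)]
# Env 0 finishes at step 3, env 1 at step 5: the episodes end at
# different times, so "train after every episode" has no single,
# well-defined trigger point with n_envs > 1 -- whereas "train every
# N steps" is unambiguous regardless of the number of envs.
```

This is why the assertion fires only for episodic train_freq, not for step-based train_freq.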

Also, I found that new features such as EvalCallback with StopTrainingOnNoModelImprovement don’t exist in the library installed in my environment, even though I could find the code in the repo.

You need to install the master version (cf. doc), as it is not yet released.
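
For reference, installing the in-development master branch with pip typically looks like the following (the URL is the upstream stable-baselines3 repository; check the library's installation docs for the current command):

```shell
pip install git+https://github.com/DLR-RM/stable-baselines3
```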
