[Bug] Large drop in mean reward using multiprocessing with make_vec_env and SAC
### 🐛 Bug
Large drop in mean reward using multiprocessing with make_vec_env and SAC
### To Reproduce
See the code below (modified from the Colab example, using the latest SB3):
```python
# Install the latest stable-baselines3 version
!pip install git+https://github.com/DLR-RM/stable-baselines3#egg=stable-baselines3[extra]

import time

import gym

from stable_baselines3 import SAC, PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy


def compare_multi_process(model_choice, env_id, n_timesteps, num_cpu=6, policy_type="MlpPolicy"):
    eval_env = gym.make(env_id)  # environment for evaluation
    vec_env = make_vec_env(env_id, n_envs=num_cpu)  # num_cpu = number of processes to use

    # Multiprocessed RL training
    multi_model = model_choice(policy_type, vec_env, verbose=0)
    model_name = type(multi_model).__name__  # save model name as a string for printing
    start_time = time.time()
    multi_model.learn(n_timesteps)
    total_time_multi = time.time() - start_time
    print(f"\nTook {total_time_multi:.2f}s for {model_name}_multi - {n_timesteps / total_time_multi:.2f} FPS")

    # Single-process RL training
    non_multi_model = model_choice(policy_type, env_id, verbose=0)
    start_time = time.time()
    non_multi_model.learn(n_timesteps)
    total_time_single = time.time() - start_time
    print(f"Took {total_time_single:.2f}s for {model_name}_non_multi - {n_timesteps / total_time_single:.2f} FPS")
    print(f"Multiprocessed training is {total_time_single / total_time_multi:.2f}x faster!")

    # Evaluate the trained agents
    mean_reward, std_reward = evaluate_policy(multi_model, eval_env, n_eval_episodes=10)
    print(f"Mean reward: {model_name}_multi, {mean_reward} +/- {std_reward:.2f}")
    mean_reward, std_reward = evaluate_policy(non_multi_model, eval_env, n_eval_episodes=10)
    print(f"Mean reward: {model_name}_non_multi, {mean_reward} +/- {std_reward:.2f}")


n_timesteps = 10000
env_id = "Pendulum-v0"
# env_id = "MountainCarContinuous-v0"

compare_multi_process(SAC, env_id, n_timesteps)
compare_multi_process(PPO, env_id, n_timesteps)
```
### Traceback

```
Took 18.65s for SAC_multi - 536.27 FPS
Took 110.15s for SAC_non_multi - 90.79 FPS
Multiprocessed training is 5.91x faster!
Mean reward: SAC_multi, -1205.6901091783773 +/- 106.64
Mean reward: SAC_non_multi, -190.3052033510613 +/- 55.82
Took 11.11s for PPO_multi - 899.92 FPS
Took 18.54s for PPO_non_multi - 539.47 FPS
Multiprocessed training is 1.67x faster!
Mean reward: PPO_multi, -1111.555600091256 +/- 332.89
Mean reward: PPO_non_multi, -1040.340094899712 +/- 249.71
```
### Expected behavior
We notice a small decrease in mean reward (~10%) for PPO, which is expected. SAC, however, shows a huge drop in mean reward (600%+, from roughly -190 to -1206), which is quite extreme. I have tested other envs such as “MountainCarContinuous-v0” with similar results.
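A plausible explanation, consistent with the docs warning the maintainers link below: SAC's defaults (`train_freq=1`, `gradient_steps=1`) perform one gradient step per call to `env.step()`, but with a vectorized env each such call collects `num_cpu` transitions, so a fixed timestep budget yields proportionally fewer updates. A rough sketch of the arithmetic, assuming those defaults:

```python
n_timesteps = 10_000
num_cpu = 6

# Each env.step() on the VecEnv consumes num_cpu timesteps from the budget,
# so the training loop (and hence the number of gradient updates with
# gradient_steps=1) shrinks by a factor of num_cpu.
single_env_updates = n_timesteps             # ~10,000 gradient steps
multi_env_updates = n_timesteps // num_cpu   # ~1,666 gradient steps
print(single_env_updates, multi_env_updates)
```

PPO is less affected because its rollout size scales with the number of envs and it always trains on everything it collected, so its update-to-data ratio stays roughly constant.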
### System Info
Colab with GPU. Latest Chrome.
### Checklist
- I have checked that there is no similar issue in the repo (required)
- I have read the documentation (required)
- I have provided a minimal working example to reproduce the bug (required)
### Top GitHub Comments
I really appreciate your help! Sure, if I could find anything, I’d be happy to leave the issue (though I think it’s rather hard, as SB3 is such perfect work!)
Probably, similar to https://github.com/DLR-RM/stable-baselines3/pull/654#issuecomment-997366746, the answer is in the docs; see the warning in https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#multiprocessing-with-off-policy-algorithms
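For completeness, a minimal sketch of what that docs warning recommends (the hyperparameters here are illustrative, not a tuned fix): scale the number of gradient steps with the number of environments so the update-to-data ratio matches the single-env run.

```python
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("Pendulum-v0", n_envs=6)

# train_freq=1: train after every call to env.step() (which here collects
# 6 transitions, one per env).
# gradient_steps=-1: perform as many gradient steps as transitions were
# collected, keeping the update-to-data ratio comparable to a single env.
model = SAC("MlpPolicy", vec_env, train_freq=1, gradient_steps=-1, verbose=0)
model.learn(total_timesteps=10_000)
```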