SAC implementation is 2x slower than in stable-baselines
Hello, First of all, thanks for working on this awesome project! I’ve tried to use the SAC implementation and noticed that it runs much slower than the TF1 version from stable-baselines. Here is the code for a minimal stable-baselines3 example:
import os
import gym
import torch
from stable_baselines3 import SAC
from stable_baselines3.sac.policies import MlpPolicy
os.environ['CUDA_VISIBLE_DEVICES'] = ''
torch.set_num_threads(2)
env = gym.make('Pendulum-v0')
model = SAC(MlpPolicy, env, verbose=1,
            buffer_size=int(1e6),
            batch_size=256,
            policy_kwargs={'net_arch': [256, 256],
                           'activation_fn': torch.nn.ReLU})
model.learn(total_timesteps=1000000, log_interval=10)
Here is the corresponding stable-baselines (TF1) example:
import os
import gym
import tensorflow as tf
from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy
os.environ['CUDA_VISIBLE_DEVICES'] = ''
env = gym.make('Pendulum-v0')
model = SAC(MlpPolicy, env, verbose=1,
            buffer_size=int(1e6),
            batch_size=256,
            policy_kwargs={'layers': [256, 256], 'act_fun': tf.nn.relu},
            n_cpu_tf_sess=2)
model.learn(total_timesteps=1000000, log_interval=10)
I set the same architecture, number of updates, and batch size, so all relevant settings should match. However, I get ~45 FPS with the PyTorch version and ~90 FPS with the TF1 version.
System Info: Libraries are installed from pip. I have the newest stable-baselines and stable-baselines3, PyTorch 1.5.1, and TensorFlow 1.15.0. I run on CPU. This was run on a MacBook Pro, and I got similar results on a Linux machine. Note that I also tried varying the number of CPU cores, but even the best setting for PyTorch is still 2x slower.
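The FPS figures above can be checked with a small timing harness. The following is a minimal sketch, not the script used in the issue: `measure_fps` and `sb3_sac_fps` are hypothetical helper names, the 5000-step default is an arbitrary assumption, and `sb3_sac_fps` assumes `gym` (with `Pendulum-v0`), `torch`, and `stable_baselines3` are installed.

```python
import time


def measure_fps(run_steps, n_steps):
    """Return steps per second for a callable that executes n_steps steps."""
    start = time.perf_counter()
    run_steps(n_steps)
    # Guard against a zero reading for very fast callables.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_steps / elapsed


def sb3_sac_fps(n_steps=5000, num_threads=2):
    """Time SB3 SAC on Pendulum-v0 on CPU (hypothetical helper, settings follow the issue)."""
    # Imported lazily so measure_fps stays usable without these packages.
    import gym
    import torch
    from stable_baselines3 import SAC

    torch.set_num_threads(num_threads)
    env = gym.make('Pendulum-v0')
    model = SAC('MlpPolicy', env, device='cpu',
                buffer_size=int(1e6),
                batch_size=256,
                policy_kwargs={'net_arch': [256, 256],
                               'activation_fn': torch.nn.ReLU})
    # Only learn() is timed; model construction is excluded.
    return measure_fps(lambda n: model.learn(total_timesteps=n), n_steps)
```

Calling `sb3_sac_fps(num_threads=k)` for a few values of `k` would show how the thread count affects throughput on a given machine. Short runs are noisy, so longer runs give a fairer comparison.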
Issue Analytics
- Created 3 years ago
- Comments: 11 (9 by maintainers)
Top GitHub Comments
Update: after upgrading to PyTorch 1.6, the gap seems to have closed:
SB2 is only 1.02x faster than SB3
I updated the notebook accordingly.
@Miffyli that may interest you too 😉
EDIT: apparently on CPU only
PyTorch 1.11 (with longer training for a better comparison): “SB2 is 1.07x faster than SB3” (CPU, on Colab); “SB2 is 1.52x faster than SB3” (GPU, on Colab).
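For reference, a summary line of this form can be derived from two wall-clock timings of the same number of training steps. This is a sketch under the assumption that the ratio is the SB3 time divided by the SB2 time; the notebook's exact calculation is not shown in this thread, and `format_speedup` is a hypothetical name.

```python
def format_speedup(time_sb2, time_sb3):
    """Format how much faster SB2 ran, given wall-clock seconds for equal step counts."""
    # Assumption: a ratio above 1 means SB2 finished sooner than SB3.
    ratio = time_sb3 / time_sb2
    return f"SB2 is {ratio:.2f}x faster than SB3"


# Illustrative timings only, not measurements from the thread.
print(format_speedup(100.0, 107.0))
```

With FPS figures instead of times, the same ratio is the SB2 FPS divided by the SB3 FPS, so the ~90 FPS and ~45 FPS reported in the original post correspond to a ratio of about 2.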