SAC implementation is 2x slower than in stable-baselines

Hello, first of all, thanks for working on this awesome project! I’ve tried the SAC implementation and noticed that it runs much slower than the TF1 version from stable-baselines. Here is the code for a minimal stable-baselines3 example:

import os

import gym
import torch
from stable_baselines3 import SAC
from stable_baselines3.sac.policies import MlpPolicy

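# Hide all GPUs so training runs on CPU only.
os.environ['CUDA_VISIBLE_DEVICES'] = ''

# Limit PyTorch to 2 CPU threads (matches n_cpu_tf_sess=2 in the TF1 example).
torch.set_num_threads(2)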

env = gym.make('Pendulum-v0')

model = SAC(MlpPolicy, env, verbose=1,
            buffer_size=int(1e6),
            batch_size=256,
            policy_kwargs={'net_arch': [256, 256],
                           'activation_fn': torch.nn.ReLU})
model.learn(total_timesteps=1000000, log_interval=10)

Here is the corresponding stable-baselines (TF1) example:

import os

import gym
import tensorflow as tf
from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy

# Hide all GPUs so training runs on CPU only.
os.environ['CUDA_VISIBLE_DEVICES'] = ''

env = gym.make('Pendulum-v0')

model = SAC(MlpPolicy, env, verbose=1,
            buffer_size=int(1e6),
            batch_size=256,
            policy_kwargs={'layers': [256, 256], 'act_fun': tf.nn.relu},
            n_cpu_tf_sess=2)
model.learn(total_timesteps=1000000, log_interval=10)

I set the same architecture, number of updates, and batch size, so everything relevant should be configured identically. However, the PyTorch version runs at ~45 FPS, while the TF1 version runs at ~90 FPS.
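For anyone who wants to reproduce the FPS comparison outside the libraries' own logging, a minimal sketch of a wall-clock measurement could look like the following. The helper name `measure_fps` and the stand-in workload `dummy_training` are hypothetical and are not part of either library; the stand-in is there only so the snippet runs without an RL library installed.

```python
import time


def measure_fps(run_steps, total_timesteps):
    """Time `run_steps(total_timesteps)` and return steps per second."""
    start = time.perf_counter()
    run_steps(total_timesteps)
    elapsed = time.perf_counter() - start
    return total_timesteps / elapsed


def dummy_training(n_steps):
    # Hypothetical stand-in workload (assumption): replace with real training.
    total = 0
    for i in range(n_steps):
        total += i * i
    return total


if __name__ == "__main__":
    fps = measure_fps(dummy_training, 100_000)
    print(f"{fps:.0f} steps/s")
```

With either model from the scripts above, the same helper could be called as `measure_fps(lambda n: model.learn(total_timesteps=n), 10_000)`. Both implementations collect some initial steps before gradient updates begin (the `learning_starts` parameter), so very short runs are likely to overstate FPS.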

System Info

Libraries are installed from pip. I have the newest stable-baselines and stable-baselines3, PyTorch 1.5.1, and TensorFlow 1.15.0. I run on CPU. This was run on a MacBook Pro, and I got similar results on a separate Linux machine. Note that I also tried varying the number of CPU cores, but even the best setting for PyTorch is still 2x slower.
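The core-count experiment mentioned here can be sketched as a small sweep. This is only an illustration under assumptions: `sweep_thread_counts` is a hypothetical helper, and the demo below uses a trivial workload and records the requested thread counts in a list instead of touching a real framework.

```python
import time


def sweep_thread_counts(set_threads, workload, counts):
    """Return {thread_count: seconds} for running `workload` under each setting."""
    timings = {}
    for n in counts:
        set_threads(n)  # e.g. torch.set_num_threads when using PyTorch
        start = time.perf_counter()
        workload()
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    applied = []  # records requested thread counts in place of a real setter
    timings = sweep_thread_counts(
        applied.append,
        lambda: sum(range(50_000)),  # stand-in workload (assumption)
        [1, 2, 4],
    )
    fastest = min(timings, key=timings.get)
    print(timings, "fastest:", fastest)
```

For the PyTorch script, `set_threads` would be `torch.set_num_threads` and `workload` a closure that calls `model.learn` for a fixed number of timesteps on a freshly created model.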

Issue Analytics

Created 3 years ago
Comments: 11 (9 by maintainers)

Top GitHub Comments

araffin commented, Jul 30, 2020

Update: after upgrading to PyTorch 1.6, the gap seems to have closed: SB2 is only 1.02x faster than SB3.

I updated the notebook accordingly.

@Miffyli that may interest you too 😉

EDIT: apparently on CPU only

araffin commented, Mar 11, 2022

PyTorch 1.11 (with longer training for a better comparison):

“SB2 is 1.07x faster than SB3” (CPU, on Colab)
“SB2 is 1.52x faster than SB3” (GPU, on Colab)
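The "Nx faster" figures quoted in this thread are throughput ratios. The notebook that produced them is not shown here, so the following is only a sketch of the arithmetic, using the ~90 and ~45 FPS values from the original report; the function name `speedup_message` is hypothetical.

```python
def speedup_message(fps_fast, fps_slow, name_fast="SB2", name_slow="SB3"):
    """Format the throughput ratio of two implementations."""
    ratio = fps_fast / fps_slow
    return f"{name_fast} is {ratio:.2f}x faster than {name_slow}"


# FPS values from the original report: ~90 (TF1) vs ~45 (PyTorch 1.5.1).
print(speedup_message(90, 45))  # SB2 is 2.00x faster than SB3
```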