V3 new backend: PyTorch? and the future of Stable Baselines

Version 3 is now online: https://github.com/DLR-RM/stable-baselines3

This issue summarizes the discussion between the maintainers (@hill-a, @erniejunior, @AdamGleave, @Miffyli and I) about the next backend and the future of Stable Baselines.

First, we recommend that everyone read the summary of design choices in #576.

Backend Choice

This is the biggest design choice for the next major version. In any case, we will drop TensorFlow 1 for something else. The candidates are PyTorch, TensorFlow 2 and Jax.

Maintainers' opinion

The majority of the maintainers would favor PyTorch as they already work with it and the rest don’t have strong feelings as they will have to switch to a new framework anyway.

As a transition, here are the final results of the poll I created a few weeks ago on Twitter (4500 views and 319 votes, which is quite a lot!):

  • PyTorch - 69.9%
  • Tensorflow 2 - 13.8%
  • Jax - 9.4%
  • Does not matter - 6.9%

Disclaimer: doing a poll on Twitter restricts the audience but it’s a good start
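
To make the poll figures easier to compare, here is a small sketch that converts the reported percentages back into approximate vote counts and computes the gap between the first and second choice. The counts are estimates derived from the rounded percentages and the reported total of 319 votes; they are not figures published in the poll itself.

```python
# Figures as reported in the issue text.
TOTAL_VOTES = 319
SHARES = {
    "PyTorch": 69.9,
    "Tensorflow 2": 13.8,
    "Jax": 9.4,
    "Does not matter": 6.9,
}

# Estimated vote counts: share of the total, rounded to the nearest vote.
approx_votes = {name: round(TOTAL_VOTES * pct / 100) for name, pct in SHARES.items()}

# Gap between the two most popular options, in percentage points and as a ratio.
ranked = sorted(SHARES.items(), key=lambda item: item[1], reverse=True)
(first, first_pct), (second, second_pct) = ranked[0], ranked[1]
gap_points = first_pct - second_pct
ratio = first_pct / second_pct

for name, votes in approx_votes.items():
    print(f"{name}: ~{votes} votes")
print(f"{first} leads {second} by {gap_points:.1f} points ({ratio:.1f}x)")
```

Under these assumptions the estimated counts add up to the reported total, and PyTorch received roughly five times as many votes as the runner-up.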

Tensorflow 2

Pros:

  • natural continuation from tf1 (although we don’t plan to use the compat module), at least for our users
  • the eager mode is easy to use (especially numpy <-> tf conversion)
  • docs are better than tf1
  • native tensorboard support

Cons:

  • docs are better but remain messy (still three ways of writing the same thing, e.g. MSE loss)
  • tf.function can be tricky
  • early version
  • not sure that the tf1 community will follow, as it requires breaking changes anyway

Jax

Pros:

  • good design choices (e.g. to avoid side effects)
  • getting a lot of popularity recently
  • great potential
  • computation of higher-order derivatives (e.g. for meta-RL)

Cons:

  • early stage of development
  • the eco-system is not ready yet (e.g. only experimental version of neural net lib)
  • none of the maintainers has experience with it, so this would require more time

PyTorch

Pros:

  • the community/demand is growing
  • good documentation
  • good api
  • nice c++ frontend/ easy export
  • several companies switched to PyTorch (Chainer too)
  • I already have an internal (and working) pytorch version of Stable Baselines
  • the eco-system / api is now fairly stable/mature

Cons:

  • there are already a lot of RL libraries using pytorch
  • tensorboard would be an optional dependency (because it requires tf) even though pytorch now supports it
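
One common way to treat tensorboard as an optional dependency is to guard the import and make logging a no-op when it is missing. The sketch below is only an illustration of that pattern, not the code used in Stable Baselines; `make_writer` and `log_scalar` are hypothetical helper names, while `torch.utils.tensorboard.SummaryWriter` and its `add_scalar` method are part of PyTorch.

```python
from typing import Any, Optional

try:
    # Needs both torch and the separate tensorboard package to be installed.
    from torch.utils.tensorboard import SummaryWriter
except ImportError:
    SummaryWriter = None  # logging is disabled when the import fails


def make_writer(log_dir: Optional[str]) -> Optional[Any]:
    """Return a SummaryWriter, or None when logging is off or unavailable.

    Hypothetical helper, introduced here for illustration only.
    """
    if log_dir is None or SummaryWriter is None:
        return None
    return SummaryWriter(log_dir=log_dir)


def log_scalar(writer: Optional[Any], tag: str, value: float, step: int) -> bool:
    """Log one scalar if a writer exists; return whether anything was written."""
    if writer is None:
        return False
    writer.add_scalar(tag, value, step)
    return True
```

With this structure, training code can call `log_scalar` unconditionally, and users who do not install tensorboard simply get no logs instead of an import error.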

Side note: although the twitter poll is biased, the gap between first and second choice is striking.

Summary

In summary, the first choice for the backend would be PyTorch, for two main reasons:

  • community (most people use or want pytorch now)
  • 2+ maintainers would favor it vs the rest being neutral

A second choice would be Jax because:

  • potential impact and growing community
  • almost equal popularity currently vs tf2

It seems that TensorFlow 2 does not convince many people: it is effectively a new framework compared to tf1 (even if it shares the name), and compared to PyTorch it seems to offer the same features with less maturity.

Future of Stable-Baselines

PyTorch version

I currently have an internal PyTorch version of Stable Baselines, codename “Torchy Baselines” (and its zoo), that I use for my research (RL for robotics). It already has a working version of A2C, PPO, SAC and TD3.

I dropped python 3.5 support in order to use f-strings, more typing and have no issues with dicts. Python 3.5 end of life is coming soon anyway.
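
As a rough illustration of what dropping Python 3.5 allows, the sketch below uses f-strings, variable annotations and ordered dicts. It assumes that "no issues with dicts" refers to dicts preserving insertion order (an implementation detail of CPython 3.6, guaranteed by the language from 3.7); the issue does not say so explicitly. The `describe` function and the step counts are made up for the example.

```python
from typing import Dict, List

ALGOS: List[str] = ["A2C", "PPO", "SAC", "TD3"]  # algorithms named in the issue


def describe(algo: str, n_steps: int) -> str:
    """Format a status line with an f-string (PEP 498, Python 3.6+)."""
    return f"{algo}: trained for {n_steps:,} steps"


# Variable annotation (PEP 526, Python 3.6+); the step counts are made up.
steps: Dict[str, int] = {algo: 10_000 * (i + 1) for i, algo in enumerate(ALGOS)}

# Assumption: this is the dict behaviour the issue alludes to.
# Keys come back in insertion order, so the list matches ALGOS.
assert list(steps) == ALGOS

for algo, n in steps.items():
    print(describe(algo, n))
```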

We agree with the other maintainers that this will be a good starting point but with some conditions:

  • I will remove all “research-specific” code (it will be in a separate branch)
  • the license should be permissive (MIT if possible)

Release date

The plan is to release an early version (and its zoo) as soon as possible (in the next two months, so before the end of April).

New name

Because of the big changes, and also because it will be released under the DLR-RM team, we will update the name of the library: Stable-Baselines3 will be its new name (so we keep the Stable Baselines name while having a different package to show the huge internal change).

V2 support

The plan (as soon as the V3 is released) would be to do only bug fixes for v2 for 6 months. We will give more details on that later.

Top GitHub Comments

crobarcro commented, Apr 28, 2020 (1 reaction)

@araffin thanks, understood. In the meantime, the discussion @Miffyli pointed me to, with your pytorch conversion, has actually got me a long way towards ONNX export via pytorch, which was my ultimate goal. I will report back with an example script once I have it streamlined.
