Roadmap to Stable-Baselines3 V1.0
Note: this issue is meant to be updated, as the list of changes is not exhaustive.
Dear all,
Stable-Baselines3 beta is now out 🎉! This issue is meant to reference what is implemented and what is missing before a first major version.
As mentioned in the README, before v1.0, breaking changes may occur. I would like to encourage contributors (especially the maintainers) to make comments on how to improve the library before v1.0 (and maybe make some internal changes).
I will try to review the features mentioned in https://github.com/hill-a/stable-baselines/issues/576 (and https://github.com/hill-a/stable-baselines/issues/733) and I will create issues soon to reference what is missing.
What is implemented?
- basic features (training/saving/loading/predict; see the minimal sketch after this list)
- basic set of algorithms (A2C/PPO/SAC/TD3)
- basic pre-processing (Box and Discrete observation/action spaces are handled)
- callback support
- complete benchmark for the continuous action case
- basic RL zoo for training/evaluating/plotting (https://github.com/DLR-RM/rl-baselines3-zoo)
- consistent API
- basic tests and most type hints
- continuous integration (I’m in discussion with the organization admins for that)
- handle more observation/action spaces #4 and #5 (thanks @rolandgvc)
- tensorboard integration #9 (thanks @rolandgvc)
- basic documentation and notebooks
- automatic build of the documentation
- Vanilla DQN #6 (thanks @Artemis-Skade)
- Refactor off-policy critics to reduce code duplication #3 (see #78)
- DDPG #3
- do a complete benchmark for the discrete case #49 (thanks @Miffyli !)
- performance check for continuous actions #48 (results even better than in the gSDE paper)
- get/set parameters for the base class (#138)
- clean up type hints in the docs #10 (they were cumbersome to read)
- document the migration from SB to SB3 #11
- finish typing some methods #175
- HER #8 (thanks @megan-klaiber)
- finish updating and cleaning the documentation #166 (help is wanted)
- finish updating the notebooks and the tutorial #7 (I will do that; only the HER notebook is missing)
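As a reference for the basic features above, here is a minimal train/save/load/predict sketch (the environment and hyperparameter values are illustrative, not taken from this issue):

```python
import gym

from stable_baselines3 import PPO

# Train a PPO agent on a simple environment
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Save the trained model, then reload it
model.save("ppo_cartpole")
model = PPO.load("ppo_cartpole")

# Use the loaded model for prediction
obs = env.reset()
action, _ = model.predict(obs, deterministic=True)
```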
What are the new features?
- much cleaner base code (and no more warnings =D )
- independent saving/loading/predict for policies
- State-Dependent Exploration (SDE) for using RL directly on real robots (this is a unique feature; it was the starting point of SB3, and I published a paper on it: https://arxiv.org/abs/2005.05719)
- proper evaluation (using a separate env) is included in the base class (using `EvalCallback`; see the sketch after this list)
- all environments are `VecEnv`
- better saving/loading (now can include the replay buffer and the optimizers)
- any number of critics is now allowed for SAC/TD3
- custom actor/critic net arch for off-policy algos (#113)
- QR-DQN in SB3-Contrib
- Truncated Quantile Critics (TQC) (see #83 ) in SB3-Contrib
- @Miffyli suggested a “contrib” repo for experimental features (it is here: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)
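For illustration, here is a sketch combining several of these features (gSDE, a custom number of critics, a custom net arch, `EvalCallback`, and replay-buffer saving); the environment and the values below are arbitrary examples, not prescriptions:

```python
import gym

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

# gSDE, the number of critics and the network architecture
# are all configured on the algorithm / its policy
model = SAC(
    "MlpPolicy",
    gym.make("Pendulum-v0"),
    use_sde=True,  # State-Dependent Exploration
    policy_kwargs=dict(n_critics=3, net_arch=[256, 256]),
    verbose=1,
)

# Proper evaluation on a separate env via EvalCallback
eval_callback = EvalCallback(
    gym.make("Pendulum-v0"),
    eval_freq=5_000,
    n_eval_episodes=5,
)
model.learn(total_timesteps=50_000, callback=eval_callback)

# Saving can now also include the replay buffer
model.save("sac_pendulum")
model.save_replay_buffer("sac_pendulum_buffer")
```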
What is missing?
- syncing some files with Stable-Baselines to remain consistent (we may be good now, but this needs to be checked)
- finish code review of existing code #17
Checklist for v1.0 release
- Update Readme
- Prepare blog post
- Update doc: add links to the stable-baselines3 contrib
- Update docker image to use newer Ubuntu version
- Populate RL zoo
What is next? (for V1.1+)
- basic dict/tuple support for observations (#243)
- simple recurrent policies? (https://github.com/DLR-RM/stable-baselines3/issues/18)
- DQN extensions (double, PER, IQN) (https://github.com/DLR-RM/stable-baselines3/issues/622)
- Implement TRPO (https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/40)
- multi-worker training for all algorithms (#179)
- n-step returns for off-policy algorithms #47 (@PartiallyTyped)
- SAC discrete #157 (needs to be discussed; benefit vs. DQN+extensions?)
- Energy-Based Prioritisation? (@RyanRizzo96)
- implement `action_proba` in the base class?
- test the doc snippets #14 (help is welcome)
- noisy networks (https://arxiv.org/abs/1706.10295) @PartiallyTyped ? exploration in parameter space? (https://github.com/DLR-RM/stable-baselines3/issues/622)
- Munchausen Reinforcement Learning (MDQN) (probably in the contrib first, e.g. https://github.com/pfnet/pfrl/pull/74)
Side note: should we change the default `start_method` to `fork`? (now that we don’t have TF anymore)
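For context, `start_method` is an argument of `SubprocVecEnv`; a minimal sketch (the environment and worker count are arbitrary):

```python
import gym

from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # start_method can already be set explicitly; the question
    # above is whether "fork" should become the default
    vec_env = SubprocVecEnv(
        [lambda: gym.make("CartPole-v1") for _ in range(4)],
        start_method="fork",
    )
```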
Top GitHub Comments
Perhaps an official shorthand for `stable-baselines` and `stable-baselines3`, e.g. `sb` and `sb3`?
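For context, the suggestion amounts to something like an import alias (illustrative only):

```python
# The proposed shorthand, used as an import alias
import stable_baselines3 as sb3

model = sb3.PPO("MlpPolicy", "CartPole-v1")
```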
Correct me if I’m wrong, but W&B does not work offline, no? This is really important, as you don’t want your results to be published when you do private work.
This could also be implemented either as a callback (cf. the docs) or as a new output for the logger, but it sounds more like a “contrib” module to me.
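As a sketch of the callback option (the `WandbCallback` name is hypothetical, and the W&B usage is only indicative, assuming `wandb` is installed and initialized):

```python
from stable_baselines3.common.callbacks import BaseCallback


class WandbCallback(BaseCallback):
    """Hypothetical callback forwarding training metrics to W&B."""

    def _on_step(self) -> bool:
        import wandb  # assumed available; also requires wandb.init()

        # A real implementation would pull values from self.logger;
        # here we only log the timestep counter as a placeholder
        wandb.log({"num_timesteps": self.num_timesteps})
        return True  # returning False would stop training early
```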