Roadmap to Stable-Baselines3 V1.0

This issue will be updated over time, as the list of changes is not exhaustive.

Dear all,

Stable-Baselines3 beta is now out 🎉 ! This issue is meant to reference what is implemented and what is missing before a first major version.

As mentioned in the README, before v1.0, breaking changes may occur. I would like to encourage contributors (especially the maintainers) to make comments on how to improve the library before v1.0 (and maybe make some internal changes).

I will try to review the features mentioned in https://github.com/hill-a/stable-baselines/issues/576 (and https://github.com/hill-a/stable-baselines/issues/733) and I will create issues soon to reference what is missing.

What is implemented?

  • basic features (training/saving/loading/predict)
  • basic set of algorithms (A2C/PPO/SAC/TD3)
  • basic pre-processing (Box and Discrete observation/action spaces are handled)
  • callback support
  • complete benchmark for the continuous action case
  • basic rl zoo for training/evaluating plotting (https://github.com/DLR-RM/rl-baselines3-zoo)
  • consistent api
  • basic tests and most type hints
  • continuous integration (I’m in discussion with the organization admins for that)
  • handle more observation/action spaces #4 and #5 (thanks @rolandgvc)
  • tensorboard integration #9 (thanks @rolandgvc)
  • basic documentation and notebooks
  • automatic build of the documentation
  • Vanilla DQN #6 (thanks @Artemis-Skade)
  • Refactor off-policy critics to reduce code duplication #3 (see #78 )
  • DDPG #3
  • do a complete benchmark for the discrete case #49 (thanks @Miffyli !)
  • performance check for continuous actions #48 (even better than gSDE paper)
  • get/set parameters for the base class (#138 )
  • clean up type-hints in docs #10 (cumbersome to read)
  • documenting the migration between SB and SB3 #11
  • finish typing some methods #175
  • HER #8 (thanks @megan-klaiber)
  • finish updating and cleaning up the doc #166 (help is wanted)
  • finish updating the notebooks and the tutorial #7 (I will do that; only the HER notebook is missing)

What are the new features?

  • much cleaner base code (and no more warnings =D )
  • independent saving/loading/predict for policies
  • State-Dependent Exploration (SDE) for using RL directly on real robots (this is a unique feature, it was the starting point of SB3, I published a paper on that: https://arxiv.org/abs/2005.05719)
  • proper evaluation (using separate env) is included in the base class (using EvalCallback)
  • all environments are VecEnv
  • better saving/loading (now can include the replay buffer and the optimizers)
  • any number of critics is allowed for SAC/TD3
  • custom actor/critic net arch for off-policy algos (#113 )
  • QR-DQN in SB3-Contrib
  • Truncated Quantile Critics (TQC) (see #83 ) in SB3-Contrib
  • @Miffyli suggested a “contrib” repo for experimental features (it is here)

What is missing?

  • syncing some files with Stable-Baselines to remain consistent (we may be good now, but this needs to be checked)
  • finish code review of existing code #17

Checklist for v1.0 release

  • Update Readme
  • Prepare blog post
  • Update doc: add links to the stable-baselines3 contrib
  • Update docker image to use newer Ubuntu version
  • Populate RL zoo

What is next? (for V1.1+)

Side note: should we change the default start_method to fork, now that we no longer depend on TensorFlow?
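
For context on the side note: `SubprocVecEnv` takes a `start_method` argument that is passed on to Python's `multiprocessing`. The standard-library sketch below only shows how one might check which start methods a platform offers and fall back safely; `pick_start_method` is a hypothetical helper, not part of SB3.

```python
import multiprocessing as mp


def pick_start_method(preferred: str = "forkserver") -> str:
    """Hypothetical helper: return `preferred` if supported here, else "spawn"."""
    available = mp.get_all_start_methods()
    # "spawn" exists on every platform, so it is a safe fallback.
    return preferred if preferred in available else "spawn"


if __name__ == "__main__":
    print(mp.get_all_start_methods())
    print(pick_start_method("fork"))
```

"fork" is not available on Windows, which is one reason a default of fork would need a fallback like this.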

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 27
  • Comments: 46 (29 by maintainers)

Top GitHub Comments

PartiallyTyped commented, May 11, 2020 (4 reactions)

Perhaps an official shorthand for stable-baselines and stable-baselines3 e.g. sb and sb3?


import stable_baselines3 as sb3

araffin commented, May 11, 2020 (4 reactions)

“for visualization, probably using something like weights & biases (https://www.wandb.com/) is an option?”

Correct me if I’m wrong, but W&B does not work offline, no? This is really important, as you don’t want your results to be published when you do private work.

This could also be implemented either as a callback (cf. doc) or as a new output for the logger. But it sounds more like a “contrib” module to me.
