Roadmap to Stable-Baselines3 V1.0
Note: this issue is meant to be updated, as the list of changes is not exhaustive.
Dear all,
Stable-Baselines3 beta is now out 🎉! This issue is meant to reference what is implemented and what is missing before a first major version.
As mentioned in the README, before v1.0, breaking changes may occur. I would like to encourage contributors (especially the maintainers) to make comments on how to improve the library before v1.0 (and maybe make some internal changes).
I will try to review the features mentioned in https://github.com/hill-a/stable-baselines/issues/576 (and https://github.com/hill-a/stable-baselines/issues/733) and I will create issues soon to reference what is missing.
What is implemented?
- basic features (training/saving/loading/predict; see the minimal sketch after this list)
- basic set of algorithms (A2C/PPO/SAC/TD3)
- basic pre-processing (Box and Discrete observation/action spaces are handled)
- callback support
- complete benchmark for the continuous action case
- basic RL zoo for training/evaluating/plotting (https://github.com/DLR-RM/rl-baselines3-zoo)
- consistent API
- basic tests and most type hints
- continuous integration (I’m in discussion with the organization admins for that)
- handle more observation/action spaces #4 and #5 (thanks @rolandgvc)
- tensorboard integration #9 (thanks @rolandgvc)
- basic documentation and notebooks
- automatic build of the documentation
- Vanilla DQN #6 (thanks @Artemis-Skade)
- Refactor off-policy critics to reduce code duplication #3 (see #78)
- DDPG #3
- do a complete benchmark for the discrete case #49 (thanks @Miffyli !)
- performance check for continuous actions #48 (results even better than in the gSDE paper)
- get/set parameters for the base class (#138)
- clean up type hints in the docs #10 (they were cumbersome to read)
- document the migration from SB to SB3 #11
- finish typing some methods #175
- HER #8 (thanks @megan-klaiber)
- finish updating and cleaning the documentation #166 (help is wanted)
- finish updating the notebooks and the tutorial #7 (I will do that; only the HER notebook is missing)
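As a reference for the basic features above, here is a minimal train/save/load/predict sketch (the environment and hyperparameter values are illustrative, not taken from this issue):

```python
import gym

from stable_baselines3 import PPO

# Train a PPO agent on a simple environment
env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Save the trained model, then reload it
model.save("ppo_cartpole")
model = PPO.load("ppo_cartpole")

# Use the loaded model for prediction
obs = env.reset()
action, _ = model.predict(obs, deterministic=True)
```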
What are the new features?
- much cleaner base code (and no more warnings =D )
- independent saving/loading/predict for policies
- State-Dependent Exploration (SDE) for using RL directly on real robots (this is a unique feature; it was the starting point of SB3, and I published a paper on it: https://arxiv.org/abs/2005.05719)
- proper evaluation (using a separate env) is included in the base class (using `EvalCallback`; see the sketch after this list)
- all environments are `VecEnv`
- better saving/loading (now can include the replay buffer and the optimizers)
- any number of critics is now allowed for SAC/TD3
- custom actor/critic net arch for off-policy algos (#113)
- QR-DQN in SB3-Contrib
- Truncated Quantile Critics (TQC) (see #83 ) in SB3-Contrib
- @Miffyli suggested a “contrib” repo for experimental features (it is here: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib)
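For illustration, here is a sketch combining several of these features (gSDE, a custom number of critics, a custom net arch, `EvalCallback`, and replay-buffer saving); the environment and the values below are arbitrary examples, not prescriptions:

```python
import gym

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

# gSDE, the number of critics and the network architecture
# are all configured on the algorithm / its policy
model = SAC(
    "MlpPolicy",
    gym.make("Pendulum-v0"),
    use_sde=True,  # State-Dependent Exploration
    policy_kwargs=dict(n_critics=3, net_arch=[256, 256]),
    verbose=1,
)

# Proper evaluation on a separate env via EvalCallback
eval_callback = EvalCallback(
    gym.make("Pendulum-v0"),
    eval_freq=5_000,
    n_eval_episodes=5,
)
model.learn(total_timesteps=50_000, callback=eval_callback)

# Saving can now also include the replay buffer
model.save("sac_pendulum")
model.save_replay_buffer("sac_pendulum_buffer")
```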
What is missing?
- syncing some files with Stable-Baselines to remain consistent (we may be good now, but this needs to be checked)
- finish code review of existing code #17
Checklist for v1.0 release
- Update Readme
- Prepare blog post
- Update doc: add links to the stable-baselines3 contrib
- Update docker image to use newer Ubuntu version
- Populate RL zoo
What is next? (for V1.1+)
- basic dict/tuple support for observations (#243)
- simple recurrent policies? (https://github.com/DLR-RM/stable-baselines3/issues/18)
- DQN extensions (double, PER, IQN) (https://github.com/DLR-RM/stable-baselines3/issues/622)
- Implement TRPO (https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/40)
- multi-worker training for all algorithms (#179)
- n-step returns for off-policy algorithms #47 (@PartiallyTyped)
- SAC discrete #157 (needs to be discussed; benefit vs. DQN+extensions?)
- Energy-Based Prioritisation? (@RyanRizzo96)
- implement `action_proba` in the base class?
- test the doc snippets #14 (help is welcome)
- noisy networks (https://arxiv.org/abs/1706.10295) @PartiallyTyped ? exploration in parameter space? (https://github.com/DLR-RM/stable-baselines3/issues/622)
- Munchausen Reinforcement Learning (MDQN) (probably in the contrib first, e.g. https://github.com/pfnet/pfrl/pull/74)
Side note: should we change the default `start_method` to `fork`? (now that we don’t have TF anymore)
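For context, `start_method` is an argument of `SubprocVecEnv`; a minimal sketch (the environment and worker count are arbitrary):

```python
import gym

from stable_baselines3.common.vec_env import SubprocVecEnv

if __name__ == "__main__":
    # start_method can already be set explicitly; the question
    # above is whether "fork" should become the default
    vec_env = SubprocVecEnv(
        [lambda: gym.make("CartPole-v1") for _ in range(4)],
        start_method="fork",
    )
```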
Top GitHub Comments
Perhaps an official shorthand for `stable-baselines` and `stable-baselines3`, e.g. `sb` and `sb3`?
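For context, the suggestion amounts to something like an import alias (illustrative only):

```python
# The proposed shorthand, used as an import alias
import stable_baselines3 as sb3

model = sb3.PPO("MlpPolicy", "CartPole-v1")
```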
Correct me if I’m wrong, but W&B does not work offline, no? This is really important, as you don’t want your results to be published when you do private work.
This could also be implemented either as a callback (cf. the docs) or as a new output for the logger, but it sounds more like a “contrib” module to me.
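As a sketch of the callback option (the `WandbCallback` name is hypothetical, and the W&B usage is only indicative, assuming `wandb` is installed and initialized):

```python
from stable_baselines3.common.callbacks import BaseCallback


class WandbCallback(BaseCallback):
    """Hypothetical callback forwarding training metrics to W&B."""

    def _on_step(self) -> bool:
        import wandb  # assumed available; also requires wandb.init()

        # A real implementation would pull values from self.logger;
        # here we only log the timestep counter as a placeholder
        wandb.log({"num_timesteps": self.num_timesteps})
        return True  # returning False would stop training early
```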