
Performance check (Continuous Actions)


Check that the algorithms reach the expected performance. This was already done prior to v0.5 for the gSDE paper, but since we made big changes afterwards, it is good to check again.

SB2 vs SB3 (TensorFlow Stable-Baselines vs PyTorch Stable-Baselines3)

  • A2C (6 seeds)

Attached learning curves: a2c.pdf, a2c_ant.pdf, a2c_half.pdf, a2c_hopper.pdf, a2c_walker.pdf

  • PPO (6 seeds)

Attached learning curves: ppo.pdf, ant_ppo.pdf, half_ppo.pdf, hopper_ppo.pdf, ppo_walker.pdf

  • SAC (3 seeds)

Attached learning curves: sac.pdf, sac_ant.pdf, sac_half.pdf, sac_hopper.pdf, sac_walker.pdf

  • TD3 (3 seeds)

Attached learning curves: td3.pdf, td3_ant.pdf, td3_half.pdf, td3_hopper.pdf, td3_walker.pdf

See https://paperswithcode.com/paper/generalized-state-dependent-exploration-for for the scores that should be reached within 1M steps (off-policy) or 2M steps (on-policy).

Test envs: PyBullet envs

Tested with version 0.8.0 (feat/perf-check branch in the two zoos)

SB3 commit hash: cceffd5ab2c855c6369ca88f70f9d3df11128b5b
rl-zoo commit hash: 99f7dd0321c5beea1a0d775ad6bc043d41f3e2db

Environments   A2C (SB2)      A2C (SB3)      PPO (SB2)      PPO (SB3)      SAC (SB2)      SAC (SB3)      TD3 (SB2)      TD3 (SB3)
HalfCheetah    1859 +/- 161   2003 +/- 54    2186 +/- 260   1976 +/- 479   2833 +/- 21    2757 +/- 53    2530 +/- 141   2774 +/- 35
Ant            2155 +/- 237   2286 +/- 72    2383 +/- 284   2364 +/- 120   3349 +/- 60    3146 +/- 35    3368 +/- 125   3305 +/- 43
Hopper         1457 +/- 75    1627 +/- 158   1166 +/- 287   1567 +/- 339   2391 +/- 238   2422 +/- 168   2542 +/- 79    2429 +/- 126
Walker2D       689 +/- 59     577 +/- 65     1117 +/- 121   1230 +/- 147   2202 +/- 45    2184 +/- 54    1686 +/- 584   2063 +/- 185
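
For orientation, here is a minimal sketch of what a single run behind one table cell looks like with the plain SB3 API. The actual runs went through the RL Zoo at the commit hashes above, which also applies tuned hyperparameters from its hyperparams/*.yml files, so the defaults below will not reproduce the table exactly (assumes gym and pybullet are installed):

```python
# Minimal sketch of one benchmark run; not the zoo's exact tuned setup.
import gym
import pybullet_envs  # noqa: F401  # registers the *BulletEnv-v0 environments
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("HalfCheetahBulletEnv-v0")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=2_000_000)  # 2M steps for the on-policy runs above

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"HalfCheetah: {mean_reward:.0f} +/- {std_reward:.0f}")
```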

Generalized State-Dependent Exploration (gSDE)

See https://paperswithcode.com/paper/generalized-state-dependent-exploration-for for the scores that should be reached within 1M steps (off-policy) or 2M steps (on-policy).

  • on-policy (2M steps, 6 seeds):

Attached learning curves: gsde_onpolicy.pdf, gsde_onpolicy_ant.pdf, gsde_onpolicy_half.pdf, gsde_onpolicy_hopper.pdf, gsde_onpolicy_walker.pdf

  • off-policy (1M steps, 3 seeds):

Attached learning curves: gsde_off_policy.pdf, gsde_offpolicy_ant.pdf, gsde_offpolicy_half.pdf, gsde_offpolicy_hopper.pdf, gsde_offpolicy_walker.pdf

SB3 commit hash: b948b7fd5c3ff18bf52d3945111c304e6205c64f
rl-zoo commit hash: b56c1470c9a958c196f60e814de893050e2469f0

Environments   A2C (Gaussian)  A2C (gSDE)     PPO (Gaussian)  PPO (gSDE)    SAC (Gaussian)  SAC (gSDE)     TD3 (Gaussian)  TD3 (gSDE)
HalfCheetah    2003 +/- 54     2032 +/- 122   1976 +/- 479    2826 +/- 45   2757 +/- 53     2984 +/- 202   2774 +/- 35     2592 +/- 84
Ant            2286 +/- 72     2443 +/- 89    2364 +/- 120    2782 +/- 76   3146 +/- 35     3102 +/- 37    3305 +/- 43     3345 +/- 39
Hopper         1627 +/- 158    1561 +/- 220   1567 +/- 339    2512 +/- 21   2422 +/- 168    2262 +/- 1     2429 +/- 126    2515 +/- 67
Walker2D       577 +/- 65      839 +/- 56     1230 +/- 147    2019 +/- 64   2184 +/- 54     2136 +/- 67    2063 +/- 185    1814 +/- 395
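
In SB3, switching between the two exploration schemes compared above is a constructor flag. A sketch (illustrative values, not the zoo-tuned hyperparameters behind the table):

```python
# gSDE vs. Gaussian exploration is toggled with use_sde at construction time.
import pybullet_envs  # noqa: F401  # registers the PyBullet envs (assumed installed)
from stable_baselines3 import PPO, SAC

ppo_gsde = PPO("MlpPolicy", "HalfCheetahBulletEnv-v0",
               use_sde=True, sde_sample_freq=4)  # resample the noise matrix every 4 steps
sac_gsde = SAC("MlpPolicy", "HalfCheetahBulletEnv-v0",
               use_sde=True)  # state-dependent exploration instead of unstructured noise
```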

DDPG

Using the TD3 hyperparameters as a base, with some minor adjustments (learning rate, batch_size) for stability.

6 seeds, 1M steps.

Environments   DDPG (Gaussian)
HalfCheetah    2272 +/- 69
Ant            1651 +/- 407
Hopper         1201 +/- 211
Walker2D       882 +/- 186
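
A sketch of that DDPG setup; the concrete learning-rate and batch-size values below are placeholders, the tuned ones live in the RL Zoo's hyperparams/ddpg.yml:

```python
# DDPG configured like the TD3 runs, with lr/batch_size tweaked for stability.
import gym
import pybullet_envs  # noqa: F401
from stable_baselines3 import DDPG

env = gym.make("HalfCheetahBulletEnv-v0")
model = DDPG(
    "MlpPolicy",
    env,
    learning_rate=1e-3,  # placeholder value, adjusted vs. the TD3 run
    batch_size=100,      # placeholder value, adjusted vs. the TD3 run
)
model.learn(total_timesteps=1_000_000)  # 1M steps, matching the table above
```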

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 18 (8 by maintainers)

Top GitHub Comments

2 reactions
araffin commented, Jun 7, 2021

> I assume that for DDPG, TD3 and SAC, you are using the default parameters given in the documentation / paper.

Actually, slightly different ones, as I'm training on PyBullet envs (which differ from the MuJoCo envs used in the paper).

You have instructions in the doc 😉. I'm using the RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo.

Instructions: https://stable-baselines3.readthedocs.io/en/master/modules/sac.html#how-to-replicate-the-results

> Personally, I find these hyper-parameter differences unjustified from an algorithmic standpoint. Although I'm aware that this is an effort to match the respective original publications, these are pretty similar algorithms.

You are completely right. In fact, the original code of TD3 now shares the SAC hyperparams (https://github.com/sfujim/TD3), and you can easily do that in the zoo.
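
A sketch of what sharing hyperparameters looks like with the SB3 API directly; both algorithms accept the same common off-policy arguments, so one dict can configure both (illustrative values, not a tuned configuration):

```python
# One shared config for SAC and TD3; values are illustrative only.
from stable_baselines3 import SAC, TD3

shared = dict(
    learning_rate=3e-4,
    buffer_size=1_000_000,
    batch_size=256,
    gamma=0.99,
    train_freq=1,
    gradient_steps=1,
)
sac = SAC("MlpPolicy", "Pendulum-v1", **shared)  # Pendulum-v0 on older gym
td3 = TD3("MlpPolicy", "Pendulum-v1", **shared)
```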

> Is it possible to test them with shared hyper-parameters? Also, just to double check, where can I find the hyper-parameters you used for this exact replication (gSDE paper or the current SB3 doc)?

Yes, you can (but you need to deactivate gSDE for SAC, as gSDE for TD3 is no longer supported).

> Also, just to double check, where can I find the hyper-parameters you used for this exact replication (gSDE paper or the current SB3 doc)?

In the RL Zoo. You can even check the learning curves from the saved logs: https://github.com/DLR-RM/rl-trained-agents
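
A sketch of evaluating one of those saved agents; the path below is hypothetical and only mirrors the rl-trained-agents layout approximately, and some zoo agents additionally need their saved VecNormalize statistics to reach the reported scores:

```python
# Load a pretrained zoo agent and evaluate it (hypothetical path).
import gym
import pybullet_envs  # noqa: F401
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

model = SAC.load("rl-trained-agents/sac/HalfCheetahBulletEnv-v0_1/HalfCheetahBulletEnv-v0.zip")
env = gym.make("HalfCheetahBulletEnv-v0")
print(evaluate_policy(model, env, n_eval_episodes=10))
```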

1 reaction
araffin commented, Jul 22, 2021

> @axkoenig I believe these are (still) the correct instructions and settings to replicate things: https://stable-baselines3.readthedocs.io/en/master/modules/sac.html#how-to-replicate-the-results (@araffin please correct me if I am wrong).

> Ok thanks, I was just wondering whether they were posted somewhere s.t. I don't need to train the model myself until the end.

Yes, the RL Zoo is the place to go to replicate results. I saved the training/evaluation reward and the trained agent, but not the rest of the metrics (although you can easily reproduce the run normally). Your issue is probably related to https://discord.com/channels/765294874832273419/767403892446593055/866702257499668492
