
steps_per_epoch in DDPG.

See original GitHub issue

Hi, I saw in OpenAI Spinning Up

spinup.ddpg_tf1(..., steps_per_epoch=4000, epochs=100, ...)

which specifies the number of steps in each episode/epoch. Is there a similar setting in stable_baselines? Thanks!
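For context, a rough sketch of the difference (an illustration, not taken from the thread): Spinning Up fixes the run length as epochs * steps_per_epoch, while stable-baselines 2.x takes a single overall step budget in learn() and has no epoch argument. The environment and step counts below are placeholders.

import gym
from stable_baselines import DDPG

env = gym.make("Pendulum-v0")  # any continuous-action env

# Spinning Up style (for comparison):
# spinup.ddpg_tf1(env_fn=lambda: gym.make("Pendulum-v0"),
#                 steps_per_epoch=4000, epochs=100)

# stable-baselines style: one budget of environment steps, no epoch argument.
model = DDPG("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=4000 * 100)  # same overall budget as 4000 steps x 100 epochs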

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8

Top GitHub Comments

1 reaction
PartiallyTyped commented, Mar 31, 2020

T usually, and in this case, signifies the end of the episode. The action selection, storing, network optimisation and target update occur once per environment step, and when the episode has finished, the noise and the environment are reset. This is done here:

https://github.com/hill-a/stable-baselines/blob/950c2a5bf95a9fa908be26fd5db11aa60cfa2b2a/stable_baselines/ddpg/ddpg.py#L831-L847

and here:

https://github.com/hill-a/stable-baselines/blob/950c2a5bf95a9fa908be26fd5db11aa60cfa2b2a/stable_baselines/ddpg/ddpg.py#L934-L951
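A simplified sketch of the loop those lines implement (an illustration, not the stable-baselines source; select_action and train_step are made-up stand-ins for the noisy actor and for one optimisation plus target-update step, and the old gym step/reset API of that era is assumed):

import gym

env = gym.make("Pendulum-v0")
obs = env.reset()
replay_buffer = []  # stand-in for the real replay buffer

def select_action(obs):
    # stand-in for actor(obs) + exploration noise
    return env.action_space.sample()

def train_step(buffer):
    # stand-in for one critic/actor optimisation step and a soft target update
    pass

for step in range(10000):
    action = select_action(obs)                                  # action selection
    next_obs, reward, done, _ = env.step(action)
    replay_buffer.append((obs, action, reward, next_obs, done))  # storing
    train_step(replay_buffer)                                    # optimisation + target update
    obs = next_obs
    if done:
        # episode finished: reset the noise process and the environment
        obs = env.reset()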

0 reactions
blurLake commented, Apr 1, 2020

Alright, thanks a lot @Solliet @Miffyli @araffin. I will try with TD3 and SAC.
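For completeness, a minimal sketch of the suggested alternatives (assuming stable-baselines 2.x and a continuous-action environment such as Pendulum-v0; the step budget is illustrative):

import gym
from stable_baselines import SAC, TD3

env = gym.make("Pendulum-v0")

model = TD3("MlpPolicy", env, verbose=1)  # or: SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000)       # again a single step budget, no epoch argument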

Read more comments on GitHub >

Top Results From Across the Web

garage.tf.algos.ddpg module - Read the Docs
Deep Deterministic Policy Gradient (DDPG) implementation in TensorFlow. class DDPG(env_spec, policy, qf, replay_buffer, *, steps_per_epoch=20, ...

Deep Deterministic Policy Gradient - Spinning Up in Deep RL
Deep Deterministic Policy Gradient (DDPG) is an algorithm which concurrently learns a Q-function and a policy. It uses off-policy data and the Bellman ...

deeprl.agents.DDPG Example - Program Talk
Learn how to use the Python API deeprl.agents.DDPG. ... def ddpg(output_dir, seed, env_name='Swimmer-v2', hidden_sizes=(400, 300), steps_per_epoch=5000, ...

Proximal Policy Optimization - Keras
Hyperparameters of the PPO algorithm: steps_per_epoch = 4000, epochs = 30, gamma = 0.99, clip_ratio = 0.2, policy_learning_rate = 3e-4, ...

Implementing Spinningup Pytorch DDPG for Cartpole-v0 ...
I am trying to implement DDPG for the cartpole problem from here: ... MLPActorCritic, ac_kwargs=dict(), seed=0, steps_per_epoch=4000, ...
