
steps_per_epoch in DDPG.

See original GitHub issue

Hi, I saw in OpenAI Spinning Up

spinup.ddpg_tf1(..., steps_per_epoch=4000, epochs=100, ...)

which specifies the number of steps in each episode/epoch. Is there a similar setting in stable_baselines? Thanks!
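For context, a rough sketch of the difference (an illustration, not taken from the thread): Spinning Up fixes the run length as epochs * steps_per_epoch, while stable-baselines 2.x takes a single overall step budget in learn() and has no epoch argument. The environment and step counts below are placeholders.

import gym
from stable_baselines import DDPG

env = gym.make("Pendulum-v0")  # any continuous-action env

# Spinning Up style (for comparison):
# spinup.ddpg_tf1(env_fn=lambda: gym.make("Pendulum-v0"),
#                 steps_per_epoch=4000, epochs=100)

# stable-baselines style: one budget of environment steps, no epoch argument.
model = DDPG("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=4000 * 100)  # same overall budget as 4000 steps x 100 epochs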

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8

Top GitHub Comments

1 reaction
PartiallyTyped commented, Mar 31, 2020

T usually, and in this case, signifies the end of the episode. The action selection, storing, network optimisation and target update occur once per environment step, and when the episode has finished, the noise and the environment are reset. This is done here:

https://github.com/hill-a/stable-baselines/blob/950c2a5bf95a9fa908be26fd5db11aa60cfa2b2a/stable_baselines/ddpg/ddpg.py#L831-L847

and here:

https://github.com/hill-a/stable-baselines/blob/950c2a5bf95a9fa908be26fd5db11aa60cfa2b2a/stable_baselines/ddpg/ddpg.py#L934-L951
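A simplified sketch of the loop those lines implement (an illustration, not the stable-baselines source; select_action and train_step are made-up stand-ins for the noisy actor and for one optimisation plus target-update step, and the old gym step/reset API of that era is assumed):

import gym

env = gym.make("Pendulum-v0")
obs = env.reset()
replay_buffer = []  # stand-in for the real replay buffer

def select_action(obs):
    # stand-in for actor(obs) + exploration noise
    return env.action_space.sample()

def train_step(buffer):
    # stand-in for one critic/actor optimisation step and a soft target update
    pass

for step in range(10000):
    action = select_action(obs)                                  # action selection
    next_obs, reward, done, _ = env.step(action)
    replay_buffer.append((obs, action, reward, next_obs, done))  # storing
    train_step(replay_buffer)                                    # optimisation + target update
    obs = next_obs
    if done:
        # episode finished: reset the noise process and the environment
        obs = env.reset()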

0 reactions
blurLake commented, Apr 1, 2020

Alright, thanks a lot @Solliet @Miffyli @araffin. I will try with TD3 and SAC.
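For completeness, a minimal sketch of the suggested alternatives (assuming stable-baselines 2.x and a continuous-action environment such as Pendulum-v0; the step budget is illustrative):

import gym
from stable_baselines import SAC, TD3

env = gym.make("Pendulum-v0")

model = TD3("MlpPolicy", env, verbose=1)  # or: SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100000)       # again a single step budget, no epoch argument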

Read more comments on GitHub >

Top Results From Across the Web

garage.tf.algos.ddpg module - Read the Docs
Deep Deterministic Policy Gradient (DDPG) implementation in TensorFlow. class DDPG(env_spec, policy, qf, replay_buffer, *, steps_per_epoch=20, ...

Deep Deterministic Policy Gradient - Spinning Up in Deep RL
Deep Deterministic Policy Gradient (DDPG) is an algorithm which concurrently learns a Q-function and a policy. It uses off-policy data and the Bellman ...

deeprl.agents.DDPG Example - Program Talk
Learn how to use the Python API deeprl.agents.DDPG. ... def ddpg(output_dir, seed, env_name='Swimmer-v2', hidden_sizes=(400, 300), steps_per_epoch=5000, ...

Proximal Policy Optimization - Keras
Hyperparameters of the PPO algorithm: steps_per_epoch = 4000, epochs = 30, gamma = 0.99, clip_ratio = 0.2, policy_learning_rate = 3e-4, ...

Implementing Spinningup Pytorch DDPG for Cartpole-v0 ...
I am trying to implement DDPG for the cartpole problem from here: ... MLPActorCritic, ac_kwargs=dict(), seed=0, steps_per_epoch=4000, ...
