[question] [Proposal] Maximum Iterations Per Episode

In this issue, I identify a common use case that is not currently addressed, propose a solution (adding a “maximum timesteps per episode” argument to learn), and offer to implement the change if the community is open to it.

Sometimes we have environments where the agent can get stuck, and the episode does not end. We want to abort the episode prematurely, reset, and continue training. When we train, one way to account for this is to have a maximum number of timesteps per episode in training. When this maximum is hit, the environment is reset even if the episode is not done.

It’s possible that this functionality already exists in the repository and I just missed it, but I looked through the documentation and the code itself and did not find it.

I propose adding a max_timesteps_per_episode argument to the learn methods.

.learn(total_timesteps, max_timesteps_per_episode=None, ....)
.learn(total_timesteps, max_timesteps_per_episode=2000, ....)

When max_timesteps_per_episode is set to None (default), behavior is as it currently is.

When max_timesteps_per_episode is set to a positive integer, each environment is reset again once it has run for that many timesteps since its last reset, even if the episode is not done.

In the case of multiple environments, max timesteps per episode is of course per-environment.
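To make the proposed behavior concrete, here is a minimal sketch of a rollout loop with one step counter per environment. It is not stable-baselines code: `StuckEnv`, `collect`, and `n_steps` are hypothetical names for illustration, and the old 4-tuple `step` return is an assumption.

```python
from typing import List, Optional, Tuple


class StuckEnv:
    """Toy environment (hypothetical) whose episodes never end on their own."""

    def __init__(self) -> None:
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.state += 1
        return self.state, 0.0, False, {}  # done is always False


def collect(envs: List[StuckEnv], n_steps: int,
            max_timesteps_per_episode: Optional[int] = None) -> List[int]:
    """Step every env n_steps times and return the forced/natural reset count per env."""
    observations = [env.reset() for env in envs]
    steps_in_episode = [0] * len(envs)  # one counter per environment
    resets = [0] * len(envs)
    for _ in range(n_steps):
        for i, env in enumerate(envs):
            obs, _reward, done, _info = env.step(0)  # placeholder action
            steps_in_episode[i] += 1
            hit_limit = (max_timesteps_per_episode is not None
                         and steps_in_episode[i] >= max_timesteps_per_episode)
            if done or hit_limit:
                # Reset even though the episode is not done.
                obs = env.reset()
                steps_in_episode[i] = 0
                resets[i] += 1
            observations[i] = obs
    return resets
```

With `max_timesteps_per_episode=None` the loop never forces a reset, which matches the "behavior is as it currently is" default described above.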

If others like this proposal, I’m happy to implement and submit a PR for it. (At least for models that use Runner.)

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
Miffyli commented, Mar 20, 2022

@CeBrendel Yup, you are right! The TimeLimit wrapper adds a boolean that tells whether the episode was truncated, and the next iteration of stable-baselines handles this info.

Aaaand ninja’d by @araffin 😃. See his link. We also recommend moving to stable-baselines3.
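For readers who have not seen the wrapper approach, here is a rough, dependency-free sketch of what a time-limit wrapper does. `StepLimit` and `CountingEnv` are hypothetical stand-ins written for this example, not the Gym implementation; the `"TimeLimit.truncated"` info key is meant to mirror what older Gym versions report, so treat it as an assumption. In Gym itself the wrapper is `gym.wrappers.TimeLimit(env, max_episode_steps=...)`.

```python
class CountingEnv:
    """Toy environment (hypothetical) that never signals done by itself."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, False, {}


class StepLimit:
    """Hypothetical stand-in for a time-limit wrapper (old 4-tuple step API)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            # True only if the limit, not the task itself, ended the episode.
            info["TimeLimit.truncated"] = not done
            done = True
        return obs, reward, done, info


env = StepLimit(CountingEnv(), max_episode_steps=3)
env.reset()
results = [env.step(0) for _ in range(3)]
```

Because the limit lives in the wrapper, each environment copy keeps its own counter, and the training code only has to read the flag from `info`.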

1 reaction
araffin commented, Mar 20, 2022