[question] [Proposal] Maximum Iterations Per Episode
In this issue, I identify a common use-case that is not currently addressed, propose a solution (adding a "maximum timesteps per episode" argument to `learn`), and offer to implement the change if the community is open to it.
Sometimes we have environments where the agent can get stuck and the episode never ends. We want to abort the episode prematurely, reset, and continue training. One way to account for this during training is to enforce a maximum number of timesteps per episode: when this maximum is hit, the environment is reset even if the episode is not done.
It’s possible that this functionality already exists in the repository and I just missed it, but I looked through the documentation and the code itself and did not find it.
I propose adding a `max_timesteps_per_episode` argument to the `learn` methods:
`.learn(total_timesteps, max_timesteps_per_episode=None, ...)`
`.learn(total_timesteps, max_timesteps_per_episode=2000, ...)`
When `max_timesteps_per_episode` is `None` (the default), behavior is unchanged from today.
When `max_timesteps_per_episode` is a positive integer, the environment is reset once that many timesteps have elapsed since its last reset, even if the episode is not done.
With multiple environments, the maximum timesteps per episode is of course tracked per environment; a minimal sketch of the idea follows below.
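To make the proposed semantics concrete, here is a minimal sketch of how a vectorized runner could apply the limit. This is illustrative only: `step_with_limit`, the `episode_steps` array, and the `VecEnv`-style return values are assumptions for this sketch, not an existing stable-baselines API.

```python
import numpy as np

def step_with_limit(envs, actions, episode_steps, max_timesteps_per_episode):
    """Hypothetical helper: step a batch of envs, forcing a reset once the
    per-environment step limit is reached. `episode_steps` is an int array
    tracking the steps since each env's last reset."""
    obs, rewards, dones, infos = envs.step(actions)  # VecEnv-style batched step
    episode_steps += 1
    if max_timesteps_per_episode is not None:
        truncated = (episode_steps >= max_timesteps_per_episode) & ~dones
        for i in np.where(truncated)[0]:
            # Distinguish a forced time-limit reset from a true terminal state
            infos[i]["TimeLimit.truncated"] = True
        dones = dones | truncated  # the runner resets every env marked done
    episode_steps[dones] = 0  # restart the count for each env that reset
    return obs, rewards, dones, infos, episode_steps
```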
If others like this proposal, I’m happy to implement and submit a PR for it. (At least for models that use Runner.)
Top GitHub Comments
@CeBrendel Yup, you are right! The `TimeLimit` wrapper adds a boolean that indicates the episode was truncated, and the next iteration of stable-baselines handles this info… Aaaand ninja'd by @araffin 😃. See his link. We also recommend moving to stable-baselines3.
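For reference, a minimal sketch of the wrapper-based approach this comment describes, assuming Gym and stable-baselines3 are installed (the environment id, algorithm, and `max_episode_steps=2000` are illustrative choices, not prescribed by the thread):

```python
import gym
from gym.wrappers import TimeLimit
from stable_baselines3 import PPO

# Strip the limit registered by gym.make, then cap episodes at 2000 steps.
# On truncation the wrapper ends the episode and sets
# info["TimeLimit.truncated"] = True, so the algorithm can tell a
# time-limit reset apart from a true terminal state.
env = TimeLimit(gym.make("Pendulum-v1").unwrapped, max_episode_steps=2000)

model = PPO("MlpPolicy", env)
model.learn(total_timesteps=10_000)
```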
@CeBrendel you should probably take a look at https://github.com/DLR-RM/stable-baselines3/issues/633