[feature request] Add total episodes parameter to model learn method
Hi,
TL;DR: I would like the option to pass either a total_episodes parameter or a total_timesteps parameter to the model.learn() method.
Now, for my reasoning. Currently, we can only define total_timesteps when training an agent, as follows.
model = A2C('MlpLstmPolicy', env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000)
However, for some scenarios (e.g., stock trading), it is quite common to have a fixed number of timesteps per episode, given by the number of available time-series data points. It can also be valuable to scan all data points an equal number of times and to control the number of passes over the data, which is the number of episodes.
Thus, to train for a given number of episodes with a fixed number of timesteps per episode, I have to compute the total_timesteps value before passing it to model.learn(), as follows:
desired_total_episodes = 100
n_points = train_df.shape[0]  # get the number of data points
total_timesteps = desired_total_episodes * n_points
Even so, this answer on StackOverflow says that
Where the episode length is known, set it to the desired number of episode you would like to train. However, it might be less because the agent might not (probably wont) reach max steps every time.
I must admit I do not know how accurate this answer is, but it worries me that my model may not scan all the data equally.
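To make the concern concrete, here is a small library-free sketch. The `episodes_within_budget` helper and the episode lengths are made up for illustration and are not part of any library; the sketch only shows that a timestep budget sized for full-length episodes completes a different number of episodes once some of them end early.

```python
def episodes_within_budget(total_timesteps, episode_lengths):
    """Count how many whole episodes fit into a timestep budget.

    Hypothetical helper: episode_lengths is cycled for as long as
    the budget allows another complete episode.
    """
    steps = 0
    episodes = 0
    while True:
        length = episode_lengths[episodes % len(episode_lengths)]
        if steps + length > total_timesteps:
            break
        steps += length
        episodes += 1
    return episodes


n_points = 250                     # assumed number of data points
desired_total_episodes = 100
total_timesteps = desired_total_episodes * n_points

# Every episode runs through all data points.
full = episodes_within_budget(total_timesteps, [n_points])
# Every second episode ends early after 200 steps (made-up value).
mixed = episodes_within_budget(total_timesteps, [n_points, 200])

print(full, mixed)
```

With fixed-length episodes the budget maps exactly onto the desired episode count; with early terminations it does not, so the number of passes over the data is no longer under direct control.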
Another option, as discussed in this issue from the previous Stable Baselines repo, is to use a callback function. Still, for this callback approach, I would have to pass a total_timesteps value that is high enough to reach the desired number of episodes. Hence, the callback approach seems like a roundabout workaround.
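The callback idea can be sketched in plain Python, without relying on any particular library version. The `EpisodeLimit` class and the `run_training` loop are hypothetical stand-ins for a real callback and for model.learn(); they assume the training loop calls the callback once per step and stops when it returns False.

```python
class EpisodeLimit:
    """Hypothetical callback that asks the loop to stop after max_episodes."""

    def __init__(self, max_episodes):
        self.max_episodes = max_episodes
        self.n_episodes = 0

    def on_step(self, done):
        # Count an episode each time the environment reports termination.
        if done:
            self.n_episodes += 1
        # Returning False signals the training loop to stop.
        return self.n_episodes < self.max_episodes


def run_training(total_timesteps, episode_length, callback):
    """Toy stand-in for model.learn(); returns the timesteps actually used."""
    steps = 0
    while steps < total_timesteps:
        steps += 1
        done = steps % episode_length == 0  # fixed-length episodes
        if not callback.on_step(done):
            break
    return steps


callback = EpisodeLimit(max_episodes=100)
# total_timesteps only serves as an upper bound that is "high enough".
steps_used = run_training(
    total_timesteps=10**9, episode_length=250, callback=callback
)
print(steps_used, callback.n_episodes)
```

The sketch shows why this feels indirect: the episode limit does the real work, while total_timesteps is reduced to an arbitrary large ceiling.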
In conclusion, I believe that adding the option to pass a total_episodes parameter would be a simple and effective change that broadens the range of use cases served by this project.
Thank you for your attention!
Top GitHub Comments
#115 is now merged with master 😉
I altered my code and now it encompasses the case of multiple envs using the approach discussed here. I also included a test case for this particular situation. Everything is good to go and I followed all instructions on contributing and the PR template.
Still, I will probably wait until #115 is merged before opening the PR with the changes from my fork.