Learn with number of episodes rather than total_timesteps
Hi,
I would like a way for the .learn method in PPO1 (and I guess other agents) to stop after a given number of episodes, e.g. .learn(nr_episodes), rather than after an explicitly defined number of steps. This could be useful in situations where different episodes have different lengths that cannot be determined exactly beforehand.
As a quick hack, I made some changes in pposgd_simple.py. I added a new default argument:

```python
.learn(..., total_episodes=None)
```
and then replaced

```python
if total_timesteps and timesteps_so_far >= total_timesteps:
    break
```

with

```python
if total_episodes and episodes_so_far >= total_episodes:
    break
```
Finally, I was planning to just call

```python
.learn(None, total_episodes=nr_episodes)
```

but then noticed this line:

```python
elif self.schedule == 'linear':
    cur_lrmult = max(1.0 - float(timesteps_so_far) / total_timesteps, 0)
```
So I'll probably make a rough estimate of the timesteps and set total_timesteps accordingly, or alternatively change the schedule line to:

```python
elif self.schedule == 'linear':
    cur_lrmult = max(1.0 - float(episodes_so_far) / total_episodes, 0)
```
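To avoid dividing by None when calling .learn(None, total_episodes=...), the schedule logic could also branch on whichever budget was given. A minimal self-contained sketch (the function name and structure are mine, not the actual pposgd_simple.py code):

```python
def lr_multiplier(schedule, episodes_so_far, total_episodes,
                  timesteps_so_far, total_timesteps):
    """Anneal on whichever budget (episodes or timesteps) was given,
    so .learn(None, total_episodes=...) does not divide by None."""
    if schedule == 'constant':
        return 1.0
    elif schedule == 'linear':
        if total_episodes is not None:
            return max(1.0 - float(episodes_so_far) / total_episodes, 0)
        return max(1.0 - float(timesteps_so_far) / total_timesteps, 0)
    raise NotImplementedError(schedule)

# e.g. halfway through a 100-episode budget:
assert lr_multiplier('linear', 50, 100, None, None) == 0.5
```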
But I was wondering, do any of these modifications have consequences I’m not considering?
Top GitHub Comments
Hi @Miffyli,
What I understand from the mentioned answer is quite the opposite of wasted computation: I think it will miss scanning some data points an equal number of times.
About the async part, OK, that makes sense.
Still, for this callback approach, I would have to pass a total_timesteps value that is high enough to reach the desired number of episodes. This callback approach seems like an out-of-the-way workaround.
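For concreteness, here is roughly what I mean (a minimal sketch, assuming the functional callback interface where returning False stops training, plus a hypothetical episode-counting wrapper):

```python
import gym
from stable_baselines import PPO1

class EpisodeCounter(gym.Wrapper):
    """Hypothetical wrapper that counts finished episodes."""

    def __init__(self, env):
        super().__init__(env)
        self.episode_count = 0

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            self.episode_count += 1
        return obs, reward, done, info

env = EpisodeCounter(gym.make("CartPole-v1"))
model = PPO1("MlpPolicy", env)

MAX_EPISODES = 100  # the episode budget I actually care about

def stop_after_episodes(locals_, globals_):
    # A callback returning False stops training.
    return env.episode_count < MAX_EPISODES

# total_timesteps has to be "high enough", which is the awkward part.
model.learn(total_timesteps=int(1e9), callback=stop_after_episodes)
```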
As I see that you are also a contributor to V3, can I expect that passing a number of episodes instead of total_timesteps to model.learn() will be implemented at some point, instead of having to rely on a callback? Should I open an issue in that repo if that is a feature I would like to see?
Thank you
Hi @araffin,
Even though this is closed, and maybe there is something I am not getting, I would like to make my case for this issue.
The particular case where the number of timesteps per episode is known and fixed is quite common for stock trading envs. Also, for stock trading scenarios, it can be quite valuable to scan all data points thoroughly, an equal number of times.
I do not think this is that similar to issue #62, and I am also not sure about the impact of using callbacks to identify the end of episodes.
Also, we do not necessarily want to monitor anything. In particular, it is simply more convenient and less error-prone to use an episode count instead of a timestep count.
For now, I am counting the number of data points in my price time series and multiplying it by the number of episodes I want my model to experience during learning.
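In code, this bookkeeping is trivial (price_series and model are hypothetical names for my data and agent here):

```python
n_timesteps_per_episode = len(price_series)  # one episode sweeps the series once
n_episodes = 50                              # desired number of full passes
model.learn(total_timesteps=n_timesteps_per_episode * n_episodes)
```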
Alternatively, I am also considering the SubprocVecEnv approach, where the num_envs variable could correspond to my number of episodes, with total_timesteps set according to the number of time points in my training sample.
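Something roughly like this (a sketch with a hypothetical MyTradingEnv; I use PPO2 since, as far as I know, PPO1 does not support multiple parallel envs, and I am assuming total_timesteps counts steps summed over all envs):

```python
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

n_episodes = 8          # one parallel env per desired episode
episode_len = 10000     # hypothetical length of the price series

def make_env():
    return MyTradingEnv(price_series)  # hypothetical trading env

env = SubprocVecEnv([make_env for _ in range(n_episodes)])
model = PPO2("MlpPolicy", env)

# If total_timesteps is counted over all envs together, each env
# sees roughly episode_len steps, i.e. one full episode.
model.learn(total_timesteps=n_episodes * episode_len)
```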
All things considered, I think it would be quite useful to have an option to set a specific number of episodes when calling the learn() function. If there is something wrong with my reasoning, or if you have any suggestions, please feel welcome to point it out.
Thanks in advance for your time. =)