
[Question] Problem understanding the effect of the parameter "reset_num_timesteps"


Hi all,

I just started to use stable_baselines3 and I have difficulties understanding the effect of the parameter reset_num_timesteps in the model's learn() call. Here is the code that I use:

from stable_baselines3 import A2C
import os
from datetime import datetime

#DSM_BT1_Env is a custom OpenAI gym environment
env = DSM_BT1_Env()
models_dir = "models/A2C"
logdir = "logs/RL_BT1"


if not os.path.exists(models_dir):
    os.makedirs(models_dir)
    
if not os.path.exists(logdir):
    os.makedirs(logdir)
    

model = A2C('MlpPolicy', env, verbose=1, tensorboard_log=logdir)

timesteps = 10000


#train and save the model
numberOfEpisodes = 5 
for i in range(numberOfEpisodes):
    timeStarted = datetime.now().strftime("%d-%m-%Y--%H-%M-%S")
    model.learn(total_timesteps=timesteps, reset_num_timesteps=False, tb_log_name=f"A2C_Started_{timeStarted}_Episode_{i+1}_Timesteps_{timesteps}")
    model.save(f"{models_dir}/Started_{timeStarted}_Episode_{i+1}_Timesteps_{timesteps}")

I run 5 different episodes of the A2C algorithm, and each of them should train for timesteps = 10000. I tried it once with reset_num_timesteps=True and once with reset_num_timesteps=False. Here you can see the outcomes of the mean reward and the episode lengths as screenshots from Tensorboard: Screenshots_Tensorboard

When reset_num_timesteps=False, it seems that each of the 5 episodes is just a continuation of the previous one, so you could regard them as one single episode because the different episodes are not independent from each other. Whereas when reset_num_timesteps=True, it looks like there are 5 independent episodes and the training always starts from the beginning. Is this correct? Or how else could you interpret these results?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
Miffyli commented, Mar 15, 2022

Yes, that is expected. reset_num_timesteps=True only resets the timesteps counter (i.e. how long the agent has been trained), but the model parameters are left as is. If you want to start training from scratch, create a new agent model.
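To make this distinction concrete, here is a toy sketch in plain Python (ToyTrainer and its fields are invented for illustration; this is not stable_baselines3 code) of a trainer whose step counter can be reset while its learned parameters persist across learn() calls:

```python
class ToyTrainer:
    """Minimal stand-in for an RL model: tracks trained timesteps
    and a fake 'parameter' that only improves with more training."""

    def __init__(self):
        self.num_timesteps = 0   # logging counter
        self.skill = 0.0         # stands in for the network weights

    def learn(self, total_timesteps, reset_num_timesteps=True):
        if reset_num_timesteps:
            self.num_timesteps = 0          # restart the logged x-axis
        for _ in range(total_timesteps):
            self.num_timesteps += 1
            self.skill += 0.001             # parameters improve either way

trainer = ToyTrainer()
for run in range(5):
    trainer.learn(10_000, reset_num_timesteps=True)

print(trainer.num_timesteps)   # 10000 -> counter restarted each run
print(round(trainer.skill, 1)) # 50.0  -> learning accumulated across runs
```

Resetting only restarts the counter used for logging; the accumulated "skill" (standing in for the network weights) keeps growing. To truly start over, you would construct a brand-new model object instead.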

2 reactions
Miffyli commented, Feb 23, 2022

Yup, that is the correct and expected behaviour 😃. reset_num_timesteps sets the number of timesteps trained to zero, which is useful for logging (depending on your situation) or if you want to set a new learning rate. Using it plots all learning episodes as overlapping, separate lines like in your first image (which is correct/expected). Btw I like the smoothness and how well lines connect in the second image 😃.
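The overlapping-versus-continuing curves can be reproduced without TensorBoard by simulating the global-step values each run would log (logged_steps is a hypothetical helper written for illustration, not part of any library):

```python
def logged_steps(runs, steps_per_run, reset_num_timesteps):
    """Return the x-axis (global step) values each training run would log."""
    counter = 0
    all_runs = []
    for _ in range(runs):
        if reset_num_timesteps:
            counter = 0                 # each run restarts at step 0
        xs = []
        for _ in range(steps_per_run):
            counter += 1
            xs.append(counter)
        all_runs.append(xs)
    return all_runs

reset = logged_steps(2, 3, reset_num_timesteps=True)
cont = logged_steps(2, 3, reset_num_timesteps=False)
print(reset)  # [[1, 2, 3], [1, 2, 3]] -> runs overlap as separate curves
print(cont)   # [[1, 2, 3], [4, 5, 6]] -> runs connect into one long curve
```

With resetting, every run re-uses the same x-range, so TensorBoard draws overlapping lines; without it, each run picks up where the previous one stopped, which is why the second image's lines connect so smoothly.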

