[Question] Understanding the effect of the parameter "reset_num_timesteps"
Hi all,
I just started to use stable_baselines3 and I have difficulties understanding the effect of the parameter reset_num_timesteps
in the learning statement of the model. Here is the code that I use:
```python
from stable_baselines3 import A2C
import os
from datetime import datetime

# DSM_BT1_Env is a custom OpenAI Gym environment
env = DSM_BT1_Env()

models_dir = "models/A2C"
logdir = "logs/RL_BT1"

os.makedirs(models_dir, exist_ok=True)
os.makedirs(logdir, exist_ok=True)

model = A2C('MlpPolicy', env, verbose=1, tensorboard_log=logdir)
timesteps = 10000

# train and save the model
numberOfEpisodes = 5
for i in range(numberOfEpisodes):
    timeStarted = datetime.now().strftime("%d-%m-%Y--%H-%M-%S")
    model.learn(total_timesteps=timesteps, reset_num_timesteps=False,
                tb_log_name=f"A2C_Started_{timeStarted}_Episode_{i+1}_Timesteps_{timesteps}")
    model.save(f"{models_dir}/Started_{timeStarted}_Episode_{i+1}_Timesteps_{timesteps}")
```
I run 5 different episodes of the A2C algorithm, and each of them should train for `timesteps = 10000`. I tried it once with `reset_num_timesteps=True` and once with `reset_num_timesteps=False`. Here you can see the outcomes of the mean reward and the episode lengths as screenshots from TensorBoard:

When `reset_num_timesteps=False`, it seems that each of the 5 episodes is just a continuation of the previous one, so you could regard them as one single episode because the different episodes are not independent of each other, whereas with `reset_num_timesteps=True` it looks like there are 5 independent episodes and the training always starts from the beginning. Is this correct? Or how else could you interpret these results?
Issue Analytics
- Created 2 years ago
- Comments: 7 (2 by maintainers)

Top GitHub Comments
Yes, that is expected. `reset_num_timesteps=True` only resets the `timesteps` counter (i.e. how long the agent has been trained); the model parameters are left as is. If you want to start training from scratch, create a new agent model.

Yup, that is the correct and expected behaviour 😃. `reset_num_timesteps` sets the number of timesteps trained back to zero, which is useful for logging (depending on your situation) or if you want to set a new learning rate. Using it plots all learning episodes as overlapping, separate lines like in your first image (which is correct/expected). Btw, I like the smoothness and how well the lines connect in the second image 😃.
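To make the counter behaviour concrete, here is a minimal toy sketch in plain Python. This is *not* the stable_baselines3 implementation, and `TinyModel` is a made-up class; it only illustrates the idea that the flag resets the step counter while the (hypothetical) model parameters would be kept:

```python
# Toy stand-in (NOT stable-baselines3) illustrating how the
# reset_num_timesteps flag affects the internal step counter.
class TinyModel:
    def __init__(self):
        self.num_timesteps = 0  # environment steps trained so far

    def learn(self, total_timesteps, reset_num_timesteps=True):
        if reset_num_timesteps:
            # restart counting (and hence logging) from zero;
            # model parameters would NOT be touched here
            self.num_timesteps = 0
        # train for total_timesteps further environment steps
        self.num_timesteps += total_timesteps
        return self

model = TinyModel()
model.learn(10000, reset_num_timesteps=False)
model.learn(10000, reset_num_timesteps=False)
print(model.num_timesteps)  # 20000: the runs continue each other

model.learn(10000, reset_num_timesteps=True)
print(model.num_timesteps)  # 10000: counter restarted, "weights" kept
```

With `reset_num_timesteps=False` the x-axis in TensorBoard keeps growing across calls to `learn`, which is why the five curves connect end to end; with `True`, each run's curve starts at step 0 again.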