[Question] Understanding the effect of the parameter "reset_num_timesteps"
Hi all,
I just started to use stable_baselines3 and I have difficulties understanding the effect of the parameter reset_num_timesteps
in the learning statement of the model. Here is the code that I use:
```python
from stable_baselines3 import A2C
import os
from datetime import datetime

# DSM_BT1_Env is a custom OpenAI Gym environment
env = DSM_BT1_Env()

models_dir = "models/A2C"
logdir = "logs/RL_BT1"

os.makedirs(models_dir, exist_ok=True)
os.makedirs(logdir, exist_ok=True)

model = A2C('MlpPolicy', env, verbose=1, tensorboard_log=logdir)
timesteps = 10000

# train and save the model
numberOfEpisodes = 5
for i in range(numberOfEpisodes):
    timeStarted = datetime.now().strftime("%d-%m-%Y--%H-%M-%S")
    model.learn(total_timesteps=timesteps, reset_num_timesteps=False,
                tb_log_name=f"A2C_Started_{timeStarted}_Episode_{i+1}_Timesteps_{timesteps}")
    model.save(f"{models_dir}/Started_{timeStarted}_Episode_{i+1}_Timesteps_{timesteps}")
```
I run 5 different episodes of the A2C algorithm, and each of them should train for `timesteps = 10000`. I tried it once with `reset_num_timesteps=True` and once with `reset_num_timesteps=False`. Here you can see the outcomes of the mean reward and the episode lengths as screenshots from TensorBoard:

When `reset_num_timesteps=False`, it seems that each of the 5 episodes is just a continuation of the previous one, so you could regard them as one single episode because the different episodes are not independent of each other, whereas with `reset_num_timesteps=True` it looks like there are 5 independent episodes and the training always starts from the beginning. Is this correct? Or how else could you interpret these results?
Issue Analytics
- Created 2 years ago
- Comments: 7 (2 by maintainers)

Top GitHub Comments
Yes, that is expected. `reset_num_timesteps=True` only resets the `timesteps` counter (i.e. how long the agent has been trained); the model parameters are left as is. If you want to start training from scratch, create a new agent model.

Yup, that is the correct and expected behaviour 😃. `reset_num_timesteps` sets the number of timesteps trained back to zero, which is useful for logging (depending on your situation) or if you want to set a new learning rate. Using it plots all learning episodes as overlapping, separate lines like in your first image (which is correct/expected). Btw, I like the smoothness and how well the lines connect in the second image 😃.
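To make the counter behaviour concrete, here is a minimal toy sketch in plain Python. This is *not* the stable_baselines3 implementation, and `TinyModel` is a made-up class; it only illustrates the idea that the flag resets the step counter while the (hypothetical) model parameters would be kept:

```python
# Toy stand-in (NOT stable-baselines3) illustrating how the
# reset_num_timesteps flag affects the internal step counter.
class TinyModel:
    def __init__(self):
        self.num_timesteps = 0  # environment steps trained so far

    def learn(self, total_timesteps, reset_num_timesteps=True):
        if reset_num_timesteps:
            # restart counting (and hence logging) from zero;
            # model parameters would NOT be touched here
            self.num_timesteps = 0
        # train for total_timesteps further environment steps
        self.num_timesteps += total_timesteps
        return self

model = TinyModel()
model.learn(10000, reset_num_timesteps=False)
model.learn(10000, reset_num_timesteps=False)
print(model.num_timesteps)  # 20000: the runs continue each other

model.learn(10000, reset_num_timesteps=True)
print(model.num_timesteps)  # 10000: counter restarted, "weights" kept
```

With `reset_num_timesteps=False` the x-axis in TensorBoard keeps growing across calls to `learn`, which is why the five curves connect end to end; with `True`, each run's curve starts at step 0 again.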