While using EvalCallback with DummyVecEnv, the env's reset() method is called twice
🐛 Bug && Reproduce
Using EvalCallback with DummyVecEnv, for example:

```python
import gym

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

eval_env = gym.make('Pendulum-v0')
eval_callback = EvalCallback(eval_env, best_model_save_path='./logs/',
                             log_path='./logs/', eval_freq=500,
                             deterministic=True, render=False)
model = SAC('MlpPolicy', 'Pendulum-v0')
model.learn(5000, callback=eval_callback)
```
When the env emits a done=True signal, I found that the env's reset() method is called twice:

- the first time, when the episode is done: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/dummy_vec_env.py#L49
- the second time, when evaluate_policy calls it again: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/evaluation.py#L82

So the problem is: if an evaluation runs more than one episode, the last episode ends with a reset(), so obs is already set to the start of the next episode. But when the next EvalCallback fires, evaluate_policy calls reset() again, discarding that observation. The result is that every time the EvalCallback is triggered, one episode is thrown away.
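To make the double reset visible, here is a minimal sketch that wraps the evaluation env and logs every reset() call. The `ResetCounter` wrapper and the print statement are hypothetical helpers added for illustration; they are not part of the original report or the library:

```python
import gym

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback


class ResetCounter(gym.Wrapper):
    """Hypothetical helper: count and log every call to reset()."""

    def __init__(self, env):
        super().__init__(env)
        self.n_resets = 0

    def reset(self, **kwargs):
        self.n_resets += 1
        print(f"reset() call #{self.n_resets}")
        return self.env.reset(**kwargs)


eval_env = ResetCounter(gym.make('Pendulum-v0'))
eval_callback = EvalCallback(eval_env, eval_freq=500)
model = SAC('MlpPolicy', 'Pendulum-v0')
# With the evaluate_policy version linked above, the log should show
# back-to-back reset() calls: the DummyVecEnv auto-reset at episode end,
# then the explicit reset() at the start of the next evaluation.
model.learn(5000, callback=eval_callback)
```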
Expected behavior
I think the right behavior is for reset() to be called only once each time the EvalCallback is triggered.
Issue Analytics
- Created: 2 years ago
- Comments: 6 (1 by maintainers)
Top GitHub Comments
You are not looking at the master branch with that code above. Here is the up-to-date version with different logic: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/evaluation.py

We cannot remove reset() in evaluate_policy(), because evaluation is based on episodic reward, which requires running complete episodes.

Hmm, I'm having a bit of a hard time understanding the issue here. For evaluation, we need to call reset before the runs to ensure we start from fresh episodes, and inside VecEnv we need this kind of automatic resetting to ensure all envs are always steppable. Could you try to rephrase the issue and/or demonstrate a concrete fix?
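For reference, the master version linked above avoids losing episodes by resetting the VecEnv only once and then counting completed episodes from the done flags, relying on the VecEnv's auto-reset to start each new episode. The following is a rough sketch of that pattern under the assumption of a single-env VecEnv (hence the `[0]` indexing); it is simplified and not the library's exact code:

```python
import numpy as np


def evaluate_policy_sketch(model, vec_env, n_eval_episodes=10):
    # Single explicit reset at the start of the evaluation run.
    obs = vec_env.reset()
    episode_rewards, current_reward, n_done = [], 0.0, 0
    while n_done < n_eval_episodes:
        action, _ = model.predict(obs, deterministic=True)
        obs, rewards, dones, infos = vec_env.step(action)
        current_reward += rewards[0]
        if dones[0]:
            # The VecEnv has already auto-reset internally; we only
            # record the finished episode instead of resetting again.
            episode_rewards.append(current_reward)
            current_reward = 0.0
            n_done += 1
    return np.mean(episode_rewards), np.std(episode_rewards)
```

With this pattern, each EvalCallback trigger pays exactly one explicit reset() up front, and the auto-reset observation left over from the previous evaluation is simply overwritten once rather than a whole episode being discarded on every trigger.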