While using EvalCallback with DummyVecEnv, the env's reset() method is called twice
🐛 Bug && Reproduce
Using EvalCallback with DummyVecEnv, for example:

```python
import gym

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

eval_env = gym.make('Pendulum-v0')
eval_callback = EvalCallback(eval_env, best_model_save_path='./logs/',
                             log_path='./logs/', eval_freq=500,
                             deterministic=True, render=False)
model = SAC('MlpPolicy', 'Pendulum-v0')
model.learn(5000, callback=eval_callback)
```
When the env emits a done=True signal, I found that the env's reset() method is called twice:

- the first time, when the episode is done: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/dummy_vec_env.py#L49
- the second time, when evaluate_policy calls it again: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/evaluation.py#L82

So the problem is: if an evaluation runs more than one episode, the last episode ends with a reset(), so obs is already set to the start of the next episode. But when the next EvalCallback fires, evaluate_policy calls reset() again, discarding that observation. The result is that every time the EvalCallback is triggered, one episode is thrown away.
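To make the double reset visible, here is a minimal sketch that wraps the evaluation env and logs every reset() call. The `ResetCounter` wrapper and the print statement are hypothetical helpers added for illustration; they are not part of the original report or the library:

```python
import gym

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback


class ResetCounter(gym.Wrapper):
    """Hypothetical helper: count and log every call to reset()."""

    def __init__(self, env):
        super().__init__(env)
        self.n_resets = 0

    def reset(self, **kwargs):
        self.n_resets += 1
        print(f"reset() call #{self.n_resets}")
        return self.env.reset(**kwargs)


eval_env = ResetCounter(gym.make('Pendulum-v0'))
eval_callback = EvalCallback(eval_env, eval_freq=500)
model = SAC('MlpPolicy', 'Pendulum-v0')
# With the evaluate_policy version linked above, the log should show
# back-to-back reset() calls: the DummyVecEnv auto-reset at episode end,
# then the explicit reset() at the start of the next evaluation.
model.learn(5000, callback=eval_callback)
```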
Expected behavior
I think the right behavior is for reset() to be called only once each time the EvalCallback is triggered.
Issue Analytics
- Created: 2 years ago
- Comments: 6 (1 by maintainers)
Top GitHub Comments
You are not looking at the master branch with that code above. Here is the up-to-date version with different logic: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/evaluation.py

We cannot remove reset() in evaluate_policy(), because evaluation is based on episodic reward, which requires running complete episodes.

Hmm, I'm having a bit of a hard time understanding the issue here. For evaluation, we need to call reset before the runs to ensure we start from fresh episodes, and inside VecEnv we need this kind of automatic resetting to ensure all envs are always steppable. Could you try to rephrase the issue and/or demonstrate a concrete fix?
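For reference, the master version linked above avoids losing episodes by resetting the VecEnv only once and then counting completed episodes from the done flags, relying on the VecEnv's auto-reset to start each new episode. The following is a rough sketch of that pattern under the assumption of a single-env VecEnv (hence the `[0]` indexing); it is simplified and not the library's exact code:

```python
import numpy as np


def evaluate_policy_sketch(model, vec_env, n_eval_episodes=10):
    # Single explicit reset at the start of the evaluation run.
    obs = vec_env.reset()
    episode_rewards, current_reward, n_done = [], 0.0, 0
    while n_done < n_eval_episodes:
        action, _ = model.predict(obs, deterministic=True)
        obs, rewards, dones, infos = vec_env.step(action)
        current_reward += rewards[0]
        if dones[0]:
            # The VecEnv has already auto-reset internally; we only
            # record the finished episode instead of resetting again.
            episode_rewards.append(current_reward)
            current_reward = 0.0
            n_done += 1
    return np.mean(episode_rewards), np.std(episode_rewards)
```

With this pattern, each EvalCallback trigger pays exactly one explicit reset() up front, and the auto-reset observation left over from the previous evaluation is simply overwritten once rather than a whole episode being discarded on every trigger.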