
While using EvalCallback with DummyVecEnv, the env's reset() method gets called twice

See original GitHub issue

🐛 Bug & How to Reproduce

Using EvalCallback with DummyVecEnv, for example:

import gym

from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import EvalCallback

# Separate environment used for periodic evaluation
# (SB3 automatically wraps it in a DummyVecEnv)
eval_env = gym.make('Pendulum-v0')
eval_callback = EvalCallback(eval_env, best_model_save_path='./logs/',
                             log_path='./logs/', eval_freq=500,
                             deterministic=True, render=False)

model = SAC('MlpPolicy', 'Pendulum-v0')
model.learn(5000, callback=eval_callback)

When the env emits a done=True signal, I found that the env's reset() method gets called twice:

The first time, when the episode finishes inside DummyVecEnv (https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/dummy_vec_env.py#L49)

The second time, when evaluate_policy calls reset() again (https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/evaluation.py#L82)

So the problem is: when the last episode of an evaluation ends, the auto-reset already sets obs to the first observation of the next episode, but when EvalCallback fires again, evaluate_policy calls reset() once more and that freshly started episode is discarded. The result is that every time EvalCallback is triggered, one episode is missed.
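To make the double reset visible, here is a minimal sketch (not from the original report; the ResetCounter wrapper is a hypothetical helper) that counts reset() calls on the underlying env:

import gym

from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv

class ResetCounter(gym.Wrapper):
    """Hypothetical wrapper that counts how often reset() is called."""

    def __init__(self, env):
        super().__init__(env)
        self.reset_calls = 0

    def reset(self, **kwargs):
        self.reset_calls += 1
        return self.env.reset(**kwargs)

eval_env = DummyVecEnv([lambda: ResetCounter(gym.make('Pendulum-v0'))])
model = SAC('MlpPolicy', 'Pendulum-v0')
evaluate_policy(model, eval_env, n_eval_episodes=2)

# More reset() calls than episodes: one from evaluate_policy itself
# plus one auto-reset inside DummyVecEnv for every finished episode.
print(eval_env.envs[0].reset_calls)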

Expected behavior

I think the right behavior is for reset() to be called only once each time EvalCallback is triggered.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
Miffyli commented, Jun 24, 2021

You are not looking at the master branch with the code linked above. Here is the up-to-date version, which uses different logic: https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/evaluation.py

We cannot remove reset() in evaluate_policy(), because evaluation is based on episodic reward, which requires running complete episodes.
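For context, a simplified sketch of the episodic evaluation loop that evaluate_policy roughly performs (a paraphrase, not the library's exact code) shows why the reset() is needed: episodic reward only makes sense when measured from the start of a fresh episode.

# Simplified paraphrase of an episodic evaluation loop, assuming a plain
# (non-vectorized) gym env; the real evaluate_policy also handles VecEnvs.
def evaluate(model, env, n_eval_episodes=10):
    episode_rewards = []
    for _ in range(n_eval_episodes):
        obs = env.reset()  # fresh episode, so the measured return is unbiased
        done, total_reward = False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, _info = env.step(action)
            total_reward += reward
        episode_rewards.append(total_reward)
    return sum(episode_rewards) / len(episode_rewards)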

1 reaction
Miffyli commented, Jun 24, 2021

Hmm, I'm having a bit of a hard time understanding the issue here. For evaluation, we need to call reset() before the runs to ensure we start from fresh episodes, and inside VecEnv we need this kind of automatic resetting to ensure all envs are always steppable.

Could you try to rephrase the issue and/or demonstrate a concrete fix?
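To illustrate the automatic resetting Miffyli describes, here is a rough paraphrase of DummyVecEnv's auto-reset behavior (simplified; TinyDummyVecEnv is a hypothetical name, not the library's code): when an episode ends, the terminal observation is stashed in the info dict and the env is immediately reset so it can always be stepped again.

import numpy as np

# Rough paraphrase of the auto-reset inside DummyVecEnv's step logic
# (simplified, not the library's exact code):
class TinyDummyVecEnv:
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        obs_list, rews, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            obs, rew, done, info = env.step(action)
            if done:
                # keep the true terminal observation, then auto-reset so
                # the env can always be stepped again
                info["terminal_observation"] = obs
                obs = env.reset()
            obs_list.append(obs)
            rews.append(rew)
            dones.append(done)
            infos.append(info)
        return np.stack(obs_list), np.array(rews), np.array(dones), infos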

