Environment is reset twice per episode when evaluating policy on DummyVecEnv
The evaluate_policy helper function resets the environment at the start of each episode. But DummyVecEnv automatically resets the environment when step returns done = True. This causes the environment to be reset twice per episode when evaluating the policy.
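The double reset can be reproduced without the library. The sketch below is only an illustration: CountingEnv, AutoResetVecEnv and naive_evaluate are hypothetical stand-ins written for this example, not the actual Stable Baselines code. They assume a single wrapped environment, an auto-reset on done inside step, and an evaluation loop that also calls reset at the start of every episode.

```python
class CountingEnv:
    """Hypothetical toy env: fixed-length episodes, counts reset() calls."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.reset_calls = 0
        self.t = 0

    def reset(self):
        self.reset_calls += 1
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, 1.0, done, {}


class AutoResetVecEnv:
    """Simplified stand-in for a single-env DummyVecEnv: resets when done."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return [self.env.reset()]

    def step(self, actions):
        obs, reward, done, info = self.env.step(actions[0])
        if done:
            obs = self.env.reset()  # automatic reset at episode end
        return [obs], [reward], [done], [info]


def naive_evaluate(vec_env, n_episodes):
    """Evaluation loop that explicitly resets before every episode."""
    returns = []
    for _ in range(n_episodes):
        vec_env.reset()  # explicit reset, on top of the automatic one
        done, total = False, 0.0
        while not done:
            _, rewards, dones, _ = vec_env.step([0])
            total += rewards[0]
            done = dones[0]
        returns.append(total)
    return returns


env = CountingEnv(episode_length=3)
returns = naive_evaluate(AutoResetVecEnv(env), n_episodes=5)
print(env.reset_calls)  # 10: two resets for each of the 5 episodes
```

In this toy setup, resetting once before the loop and relying on the automatic reset afterwards would bring the count down to one reset per episode plus the initial one.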
Issue Analytics
- State:
- Created 3 years ago
- Comments: 10 (5 by maintainers)
Top GitHub Comments
It shall reset all envs; you have env_method() for something more granular. We have to keep in mind that this feature will be used in special cases only and that the current behavior works in most cases, so I would avoid overcomplicating things.

I don't like changing the API of step() 😕 (which should mimic the gym API), even though I understand your point.
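As a rough sketch of what "more granular" could look like, the example below resets a single sub-environment through an env_method-style call. ToyEnv and ToyVecEnv are hypothetical stand-ins written for this illustration. The env_method signature (a method name plus an optional indices argument) is assumed to mirror the vectorized-environment interface and should be checked against the library version in use.

```python
class ToyEnv:
    """Hypothetical env that only counts how often it is reset."""

    def __init__(self):
        self.reset_calls = 0

    def reset(self):
        self.reset_calls += 1
        return 0  # dummy observation


class ToyVecEnv:
    """Hypothetical stand-in for a vectorized env holding several envs."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        # Resets every sub-environment, as described in the comment above.
        return [env.reset() for env in self.envs]

    def env_method(self, method_name, *method_args, indices=None, **method_kwargs):
        # Call a method on the selected sub-environments only.
        if indices is None:
            indices = range(len(self.envs))
        elif isinstance(indices, int):
            indices = [indices]
        return [
            getattr(self.envs[i], method_name)(*method_args, **method_kwargs)
            for i in indices
        ]


vec_env = ToyVecEnv([ToyEnv() for _ in range(3)])
vec_env.reset()                            # resets all three envs
vec_env.env_method("reset", indices=[1])   # resets only the env at index 1
counts = [env.reset_calls for env in vec_env.envs]
print(counts)  # [1, 2, 1]
```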