[Proposal] Remove autoreset logic from VectorEnv and rely on AutoResetWrapper instead
Proposal
At the moment, both SyncVectorEnv and AsyncVectorEnv have the autoreset logic hardcoded in their step function, and it can’t be disabled. This means I cannot leverage VectorEnv to perform parallel evaluations. I don’t want the environments to autoreset, since I want to roll out a single episode for each sub-environment.
In TextWorld, I opted to simply carry over the last state until all sub-envs terminate. The same can be done with a simple wrapper (e.g., IgnoreDoneEnv below) if the autoreset logic is moved outside *VectorEnv and the appropriate AutoResetWrapper is used instead.
import gymnasium as gym  # assumed import; the issue is filed against Gymnasium

class IgnoreDoneEnv(gym.Wrapper):
    def reset(self, **kwargs):
        observation, info = self.env.reset(**kwargs)
        self._last_state = None
        self._is_done = False
        return observation, info

    def step(self, action):
        # Once the episode has ended, keep returning the final transition.
        if self._is_done:
            return self._last_state

        observation, reward, terminated, truncated, info = self.env.step(action)
        self._is_done = terminated or truncated
        self._last_state = (observation, reward, terminated, truncated, info)
        return observation, reward, terminated, truncated, info
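As a rough illustration of the evaluation pattern described above, the sketch below rolls out exactly one episode per sub-environment and carries the last transition forward once a sub-environment finishes. It deliberately avoids gymnasium so that it runs standalone: CountdownEnv, IgnoreDone, and evaluate_once are hypothetical names that only mimic the reset/step signatures used in the wrapper, and the plain Python loop stands in for a vector env whose autoreset has been disabled.

```python
class CountdownEnv:
    """Hypothetical toy env: terminates after `length` steps, reward 1.0 per step."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


class IgnoreDone:
    """Same carry-over logic as IgnoreDoneEnv, without the gym.Wrapper base."""

    def __init__(self, env):
        self.env = env
        self._last_state = None
        self._is_done = False

    def reset(self, **kwargs):
        self._last_state = None
        self._is_done = False
        return self.env.reset(**kwargs)

    def step(self, action):
        if self._is_done:
            return self._last_state  # repeat the final transition
        result = self.env.step(action)
        self._is_done = result[2] or result[3]
        self._last_state = result
        return result


def evaluate_once(envs, policy):
    """Roll out one episode per env; stop once every env has finished."""
    observations = [env.reset()[0] for env in envs]
    returns = [0.0] * len(envs)
    done = [False] * len(envs)
    while not all(done):
        for i, env in enumerate(envs):
            obs, reward, terminated, truncated, _ = env.step(policy(observations[i]))
            if not done[i]:  # ignore the repeated reward of a finished env
                returns[i] += reward
            done[i] = done[i] or terminated or truncated
            observations[i] = obs
    return returns


envs = [IgnoreDone(CountdownEnv(n)) for n in (2, 5, 3)]
episode_returns = evaluate_once(envs, policy=lambda obs: 0)
print(episode_returns)
```

Note that the carried-over transition repeats the final reward as well, so the caller has to mask rewards from finished sub-environments, as the `done` list does here.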
Motivation
No response
Pitch
No response
Alternatives
No response
Additional context
No response
Checklist
- I have checked that there is no similar issue in the repo
Issue Analytics
- State: closed
- Created 10 months ago
- Comments: 12 (6 by maintainers)
Top GitHub Comments
There’s a good chance you can actually just do
and it will just work the way you want it to work (after changing the wrapper as you described). In fact I’d definitely hope it does work, that’s how it’s designed.
Good point, I misread it. But note that it’s very likely that learning code would interpret that as a very long chain of one-step episodes, which is not great.
https://github.com/Farama-Foundation/Gymnasium/blob/59740038696150523ab7396cf8082e4558199f30/gymnasium/envs/classic_control/cartpole.py#L430-L435
You’re assuming that it makes sense to call step after the environment terminates, before resetting it again. So notably it shouldn’t raise an exception, which would be a pretty reasonable thing to do. If we remove the autoreset, then you can have perfectly reasonable code (an env that raises an exception), use a perfectly valid vectorization on top of it (AsyncVectorEnv or something), and now everything is crashing. Yes, you’re providing a wrapper that does it for Sync/Async vector envs, but we shouldn’t require people to use that, or to implement that specific functionality.
Also, correct me if I’m wrong, but this IgnoreDoneEnv wrapper actually does exactly what you want, right? So for Sync/Async vector envs you can just do that, and other vectorization methods can’t be expected to provide this functionality anyway. So why should this be included in the core API, if a wrapper works just fine for your application?
This is the case for individual environments, where it’s much less critical: you can just do if terminated or truncated: env.reset() and you’re golden without autoreset.
With vectorized environments, you basically can’t train without autoreset, since you can only reset all the environments at the same time, and not just a proper subset of them. If you reset after one env terminates, you lose the tails from all the other envs. If you reset after all envs terminate, then you lose a whole bunch of time computing actions for environments which have been terminated anyway. That’s why I’m still convinced that autoreset should be considered a core feature of the vector API, and not something optional. And for the rare edge cases where you want to customize it, just use a wrapper.
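To make the trade-off in the last comment concrete, here is a minimal sketch of the two options using a toy synchronous loop. Nothing in it is Gymnasium code: ToyEnv, step_autoreset, and wasted_steps_without_autoreset are hypothetical names, the final_observation key is only an illustrative choice, and episode lengths are fixed so the arithmetic is easy to check.

```python
class ToyEnv:
    """Hypothetical env whose episodes last exactly `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length, False, {}


def step_autoreset(envs, actions):
    """Step every env; reset only the ones that just finished."""
    results = []
    for env, action in zip(envs, actions):
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            # Keep the final observation in info, then start a new episode.
            info = {**info, "final_observation": obs}
            obs, _ = env.reset()
        results.append((obs, reward, terminated, truncated, info))
    return results


def wasted_steps_without_autoreset(lengths):
    """Steps spent on already-finished envs when waiting for all to terminate."""
    longest = max(lengths)
    return sum(longest - n for n in lengths)


lengths = [2, 5, 3]
envs = [ToyEnv(n) for n in lengths]
for env in envs:
    env.reset()

# With per-env autoreset, every one of the 10 steps does useful work.
episodes_finished = [0] * len(envs)
for _ in range(10):
    results = step_autoreset(envs, actions=[0] * len(envs))
    for i, (_, _, terminated, truncated, _) in enumerate(results):
        episodes_finished[i] += terminated or truncated

print(episodes_finished)
print(wasted_steps_without_autoreset(lengths))
```

In this toy setup, waiting for all three envs to terminate before a joint reset spends 5 of every 15 env steps on sub-environments that have already finished, while per-env autoreset keeps each one collecting fresh episodes.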