[Proposal] Remove autoreset logic from VectorEnv and rely on AutoResetWrapper instead
At the moment, both SyncVectorEnv and AsyncVectorEnv have the autoreset logic hardcoded in their step function, and it can’t be disabled. This means I cannot leverage VectorEnv to perform parallel evaluations. I don’t want the environments to autoreset, since I want to roll out a single episode for each sub-environment.
In TextWorld, I opted to simply carry over the last state until all sub-envs terminate. The same can be done with a simple wrapper (e.g., IgnoreDoneEnv below) if the autoreset logic is moved outside *VectorEnv and the appropriate AutoResetWrapper is used instead.
    import gymnasium as gym  # assumed import; the issue only refers to "gym"

    class IgnoreDoneEnv(gym.Wrapper):
        def reset(self, **kwargs):
            observation, info = self.env.reset(**kwargs)
            self._last_state = None
            self._is_done = False
            return observation, info

        def step(self, action):
            # Once the episode has ended, keep returning the final transition.
            if self._is_done:
                return self._last_state

            observation, reward, terminated, truncated, info = self.env.step(action)
            self._is_done = terminated or truncated
            self._last_state = (observation, reward, terminated, truncated, info)
            return observation, reward, terminated, truncated, info
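To make the carry-over idea concrete, here is a minimal sketch of a one-episode-per-sub-env evaluation loop. It deliberately avoids any gym dependency: CountdownEnv, CarryOverLastState, and evaluate are hypothetical names made up for this illustration, and the lockstep loop only imitates what a vector env would do.

```python
class CountdownEnv:
    """Hypothetical toy env: ends after `length` steps, reward 1.0 per step."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


class CarryOverLastState:
    """Same idea as IgnoreDoneEnv, written without gym for illustration."""

    def __init__(self, env):
        self.env = env
        self._last_state = None
        self._is_done = False

    def reset(self):
        self._last_state = None
        self._is_done = False
        return self.env.reset()

    def step(self, action):
        if self._is_done:
            return self._last_state  # repeat the final transition
        result = self.env.step(action)
        self._is_done = result[2] or result[3]
        self._last_state = result
        return result


def evaluate(envs):
    """Roll out exactly one episode per sub-env and return per-env returns."""
    for env in envs:
        env.reset()
    returns = [0.0] * len(envs)
    finished = [False] * len(envs)
    while not all(finished):
        # Step every sub-env in lockstep, as a vector env would.
        for i, env in enumerate(envs):
            _, reward, terminated, truncated, _ = env.step(0)
            if not finished[i]:  # skip the repeated final transitions
                returns[i] += reward
                finished[i] = terminated or truncated
    return returns


envs = [CarryOverLastState(CountdownEnv(n)) for n in (2, 5, 3)]
episode_returns = evaluate(envs)
print(episode_returns)
```

Note that the evaluation loop has to mask out the repeated final transitions itself; otherwise the carried-over reward would be counted again on every extra step.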
- I have checked that there is no similar issue in the repo
- Created 10 months ago
- Comments: 12 (6 by maintainers)
Top GitHub Comments
I’ll try that.
There’s a good chance you can actually just do
gym.make_vec(..., vectorization_mode="async", wrappers=[IgnoreDoneEnv])
and it will just work the way you want it to work (after changing the wrapper as you described). In fact I’d definitely hope it does work, that’s how it’s designed.
> The code returns last_state which contains all the information returned by the last step call
Good point, I misread it. But note that it’s very likely that learning code would interpret that as a very long chain of one-step episodes, which is not great.
How’s the autoreset handled in the vectorised cartpole?
> The way I see it, I’m not making any assumption about how the Envs should deal with an action sent after the termination signal was returned.
You’re assuming that it makes sense to call step after the environment terminates, before resetting it again. So notably it shouldn’t raise an exception, which would be a pretty reasonable thing to do. If we remove the autoreset, then you can have a perfectly reasonable env (one that raises an exception), put a perfectly valid vectorization on top of it (AsyncVectorEnv or something), and now everything is crashing. Yes, you’re providing a wrapper that does it for Sync/Async vector envs, but we shouldn’t require people to use that, or to implement that specific functionality.
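The failure mode described here can be sketched with a toy example. StrictEnv, step_all, and run are hypothetical names invented for this illustration, and the autoreset branch is a simplified stand-in, not the actual logic of any vector env implementation.

```python
class StrictEnv:
    """Hypothetical env that refuses to step once an episode has ended."""

    def __init__(self, length):
        self.length = length
        self.t = 0
        self.done = False

    def reset(self):
        self.t, self.done = 0, False
        return self.t, {}

    def step(self, action):
        if self.done:
            raise RuntimeError("step() called after termination; call reset() first")
        self.t += 1
        self.done = self.t >= self.length
        return self.t, 1.0, self.done, False, {}


def step_all(envs, autoreset):
    """Step every sub-env once, in lockstep, as a vector env would."""
    results = []
    for env in envs:
        obs, reward, terminated, truncated, info = env.step(0)
        if autoreset and (terminated or truncated):
            obs, info = env.reset()  # start the next episode right away
        results.append((obs, reward, terminated, truncated))
    return results


def run(lengths, num_steps, autoreset):
    """Return "ok" if all lockstep steps succeed, "crashed" otherwise."""
    envs = [StrictEnv(n) for n in lengths]
    for env in envs:
        env.reset()
    try:
        for _ in range(num_steps):
            step_all(envs, autoreset)
    except RuntimeError:
        return "crashed"
    return "ok"


without_autoreset = run((2, 4), num_steps=4, autoreset=False)
with_autoreset = run((2, 4), num_steps=4, autoreset=True)
print(without_autoreset, with_autoreset)
```

Without autoreset, the shorter episode ends first and the next lockstep call hits the exception; with autoreset, the finished sub-env is reset before it is stepped again.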
Also, correct me if I’m wrong, but this IgnoreDoneEnv wrapper actually does exactly what you want, right? So for Sync/Async vector envs you can just do that, and other vectorization methods can’t be expected to provide this functionality anyway. So why should this be included in the core API if a wrapper works just fine for your application?
> Isn’t the purpose of having an AutoResetWrapper to have control over when I want this functionality?
This is the case for individual environments, where it’s much less critical – you can just do
if terminated or truncated: env.reset()
and you’re golden without autoreset.
With vectorized environments, you basically can’t train without autoreset, since you can only reset all the environments at the same time, and not just a proper subset of them. If you reset after one env terminates, you lose the tails from all the other envs. If you reset after all envs terminate, then you lose a whole bunch of time computing actions for environments that have already terminated. That’s why I’m still convinced that autoreset should be considered a core feature of the vector API, and not something optional. And for the rare edge cases where you want to customize it, just use a wrapper.
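The trade-off between the two reset-everything strategies can be put into numbers with a small back-of-the-envelope sketch. The episode lengths below are made up for illustration, and both helper functions are hypothetical; they only count steps and do not run any environment.

```python
def reset_on_first_done(lengths):
    """Reset all sub-envs as soon as the shortest episode ends.

    Returns (steps taken per sub-env, total episode steps never observed).
    """
    horizon = min(lengths)
    lost_tail_steps = sum(n - horizon for n in lengths)
    return horizon, lost_tail_steps


def reset_when_all_done(lengths):
    """Reset all sub-envs only once the longest episode ends.

    Returns (steps taken per sub-env, total steps spent on finished envs).
    """
    horizon = max(lengths)
    wasted_steps = sum(horizon - n for n in lengths)
    return horizon, wasted_steps


# Made-up episode lengths for three sub-environments.
lengths = [3, 10, 5]
first_done = reset_on_first_done(lengths)
all_done = reset_when_all_done(lengths)
print(first_done)  # (3, 9): 9 steps of episode tails are thrown away
print(all_done)    # (10, 12): 12 of the 30 computed actions are wasted
```

With per-sub-env autoreset, neither cost arises, since each sub-env starts a new episode as soon as its own episode ends.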