[Proposal] Remove autoreset logic from VectorEnv and rely on AutoResetWrapper instead

Proposal

At the moment, both SyncVectorEnv and AsyncVectorEnv have the autoreset logic hardcoded in their step function, and it can’t be disabled. This means I cannot leverage VectorEnv to perform parallel evaluations: I don’t want the environments to autoreset, since I want to roll out a single episode for each sub-environment.

In TextWorld, I opted to simply carry over the last state until all sub-envs terminate. The same can be done with a simple wrapper (e.g., IgnoreDoneEnv below) if the autoreset logic is moved out of SyncVectorEnv/AsyncVectorEnv and the appropriate AutoResetWrapper is used instead.

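import gymnasium as gym


class IgnoreDoneEnv(gym.Wrapper):
    """After the wrapped env terminates or truncates, keep returning its last transition."""

    def reset(self, **kwargs):
        observation, info = self.env.reset(**kwargs)
        self._last_state = None
        self._is_done = False
        return observation, info

    def step(self, action):
        # Once the episode has ended, ignore the action and replay the final transition.
        if self._is_done:
            return self._last_state

        observation, reward, terminated, truncated, info = self.env.step(action)
        self._is_done = terminated or truncated

        # Remember this transition so it can be replayed after the episode ends.
        self._last_state = (observation, reward, terminated, truncated, info)
        return observation, reward, terminated, truncated, info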
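A minimal sketch of the evaluation the proposal is after, in plain Python so it runs without Gymnasium. `CountdownEnv` and `evaluate_one_episode_each` are hypothetical names; the toy env only mimics the five-tuple `step` return of the Gymnasium API and is not the TextWorld code. Each sub-env is rolled out for exactly one episode and, once finished, is never stepped or reset again, which is the effect IgnoreDoneEnv aims for when no vector-level autoreset interferes.

```python
from typing import Any, Callable, List, Tuple


class CountdownEnv:
    """Hypothetical toy env: terminates after `length` steps, reward 1.0 per step."""

    def __init__(self, length: int):
        self.length = length
        self.t = 0

    def reset(self) -> Tuple[int, dict]:
        self.t = 0
        return self.t, {}

    def step(self, action: Any) -> Tuple[int, float, bool, bool, dict]:
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


def evaluate_one_episode_each(
    envs: List[CountdownEnv], policy: Callable[[Any], Any]
) -> Tuple[List[float], List[int]]:
    """Roll out exactly one episode per sub-env; finished sub-envs are frozen."""
    observations = [env.reset()[0] for env in envs]
    returns = [0.0] * len(envs)
    lengths = [0] * len(envs)
    done = [False] * len(envs)
    while not all(done):
        for i, env in enumerate(envs):
            if done[i]:
                continue  # never stepped or reset again (no autoreset)
            obs, reward, terminated, truncated, _ = env.step(policy(observations[i]))
            observations[i] = obs
            returns[i] += reward
            lengths[i] += 1
            done[i] = terminated or truncated
    return returns, lengths


envs = [CountdownEnv(3), CountdownEnv(5)]
returns, lengths = evaluate_one_episode_each(envs, policy=lambda obs: 0)
print(returns, lengths)  # [3.0, 5.0] [3, 5]
```

Skipping finished sub-envs in the loop and replaying their last transition (as IgnoreDoneEnv does) give the same per-episode returns here; the wrapper approach just keeps the batch shape fixed for the vector env.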

Checklist

I have checked that there is no similar issue in the repo

Issue Analytics

Created 10 months ago
Comments: 12 (6 by maintainers)

Top GitHub Comments

RedTachyon commented, Nov 9, 2022

> I’ll try that.

There’s a good chance you can actually just do:

gym.make_vec(..., vectorization_mode="async", wrappers=[IgnoreDoneEnv])

and it will just work the way you want it to (after changing the wrapper as you described). In fact, I’d definitely hope it does work; that’s how it’s designed.
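The `wrappers` argument is a list of callables, each taking an env and returning a wrapped env. The sketch below assumes (based on the Gymnasium `make_vec` documentation, not verified against every version) that each sub-environment is built separately and the wrappers are applied to it in list order, so the first entry ends up innermost. `make_wrapped_envs`, `ToyEnv`, `ToyWrapper`, `IgnoreDone`, and `RecordStats` are hypothetical stand-ins, not Gymnasium classes.

```python
from typing import Callable, List, Sequence


class ToyEnv:
    """Hypothetical stand-in for a base environment."""

    name = "ToyEnv"


class ToyWrapper:
    """Hypothetical stand-in for a wrapper: holds the env it wraps."""

    def __init__(self, env):
        self.env = env

    @property
    def name(self) -> str:
        return f"{type(self).__name__}({self.env.name})"


class IgnoreDone(ToyWrapper):
    pass


class RecordStats(ToyWrapper):
    pass


def make_wrapped_envs(
    env_fn: Callable[[], object],
    num_envs: int,
    wrappers: Sequence[Callable[[object], object]],
) -> List[object]:
    """Build num_envs independent sub-envs, applying wrappers in order (first = innermost)."""
    envs = []
    for _ in range(num_envs):
        env = env_fn()
        for wrapper in wrappers:
            env = wrapper(env)
        envs.append(env)
    return envs


envs = make_wrapped_envs(ToyEnv, num_envs=2, wrappers=[IgnoreDone, RecordStats])
print([env.name for env in envs])  # ['RecordStats(IgnoreDone(ToyEnv))', 'RecordStats(IgnoreDone(ToyEnv))']
```

Because every sub-env gets its own wrapper instance, per-episode state such as IgnoreDoneEnv’s `_is_done` flag is tracked independently for each one.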

RedTachyon commented, Nov 9, 2022

> The code returns last_state, which contains all the information returned by the last step call

Good point, I misread it. But note that learning code would very likely interpret that as a long chain of one-step episodes, since every replayed step reports the episode as ended again, which is not great.
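To see why replaying the final transition can mislead a learner, here is a sketch of a hypothetical `episode_lengths` helper that, like many rollout buffers (an assumption about typical learning code, not any specific library), starts a new episode after every step whose done flag is set. A sub-env that really finished after 3 steps, then had its last transition replayed for 4 more vector steps, shows up as one real episode followed by four spurious one-step episodes.

```python
from typing import List


def episode_lengths(dones: List[bool]) -> List[int]:
    """Split one sub-env's trajectory into episodes, ending one at each done flag."""
    lengths = []
    current = 0
    for done in dones:
        current += 1
        if done:
            lengths.append(current)
            current = 0
    if current:
        lengths.append(current)  # trailing, unfinished episode
    return lengths


# Real episode ends at step 3; the replayed final transition (done=True) repeats 4 times.
dones = [False, False, True, True, True, True, True]
print(episode_lengths(dones))  # [3, 1, 1, 1, 1]
```

The replayed reward would also be counted once per spurious episode, which further skews any return statistics.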

> How’s the autoreset handled in the vectorised cartpole?

https://github.com/Farama-Foundation/Gymnasium/blob/59740038696150523ab7396cf8082e4558199f30/gymnasium/envs/classic_control/cartpole.py#L430-L435
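The linked lines are the authoritative answer; the sketch below only illustrates one common pattern for natively vectorised envs and may differ from the CartPole code in detail. Sub-envs that terminated on the previous call are selected with a boolean NumPy mask and reset on the next call (their action is ignored), while the others keep stepping. `VectorCounter` is a hypothetical toy env whose state counts up to `horizon`.

```python
import numpy as np


class VectorCounter:
    """Hypothetical natively vectorised env: each sub-env counts up to `horizon`."""

    def __init__(self, num_envs: int, horizon: int):
        self.horizon = horizon
        self.state = np.zeros(num_envs, dtype=np.int64)
        self.prev_done = np.zeros(num_envs, dtype=bool)

    def step(self, actions: np.ndarray):
        # Sub-envs that ended on the previous call are reset now (action ignored).
        self.state[self.prev_done] = 0
        # All other sub-envs advance normally.
        running = ~self.prev_done
        self.state[running] += actions[running]
        terminated = self.state >= self.horizon
        self.prev_done = terminated
        return self.state.copy(), terminated


env = VectorCounter(num_envs=2, horizon=2)
actions = np.array([2, 1], dtype=np.int64)
history = [env.step(actions) for _ in range(3)]
for obs, terminated in history:
    print(obs.tolist(), terminated.tolist())
# [2, 1] [True, False]
# [0, 2] [False, True]
# [2, 0] [True, False]
```

Each sub-env is reset independently of the others, so no sub-env waits for the rest of the batch.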

> The way I see it, I’m not making any assumptions about how the envs should deal with actions sent after the termination signal was returned.

You’re assuming that it makes sense to call step after the environment terminates, before resetting it again. In particular, you’re assuming that it won’t raise an exception, even though raising one would be a pretty reasonable thing to do. If we remove the autoreset, then you can have perfectly reasonable code (an env that raises an exception), use a perfectly valid vectorization on top of it (AsyncVectorEnv or something), and now everything crashes. Yes, you’re providing a wrapper that does this for Sync/Async vector envs, but we shouldn’t require people to use that wrapper, or to implement that specific functionality.
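The failure mode described above can be reproduced with two hypothetical pieces: `StrictEnv`, an env that raises if `step` is called after it has terminated (a legitimate design choice), and `step_all`, a naive vector step that optionally resets each sub-env as soon as it finishes. The same-step reset used here is just one possible autoreset scheme, chosen for brevity. With autoreset, the strict env is never stepped past its end; without it, the second vector step crashes.

```python
from typing import List, Tuple


class StrictEnv:
    """Hypothetical env that raises if stepped after its episode has ended."""

    def __init__(self, length: int):
        self.length = length
        self.t = 0
        self.finished = False

    def reset(self):
        self.t = 0
        self.finished = False
        return self.t, {}

    def step(self, action):
        if self.finished:
            raise RuntimeError("step() called after termination; call reset() first")
        self.t += 1
        self.finished = self.t >= self.length
        return self.t, 1.0, self.finished, False, {}


def step_all(envs: List[StrictEnv], autoreset: bool) -> List[Tuple[int, bool]]:
    """Step every sub-env once; with autoreset, reset each one as soon as it ends."""
    results = []
    for env in envs:
        obs, reward, terminated, truncated, info = env.step(0)
        done = terminated or truncated
        if autoreset and done:
            obs, info = env.reset()  # same-step reset: one possible autoreset scheme
        results.append((obs, done))
    return results


def run_two_vector_steps(autoreset: bool) -> bool:
    """Return True if two vector steps complete without an exception."""
    envs = [StrictEnv(1), StrictEnv(3)]
    for env in envs:
        env.reset()
    try:
        step_all(envs, autoreset)
        step_all(envs, autoreset)
    except RuntimeError:
        return False
    return True


print(run_two_vector_steps(autoreset=True))   # True
print(run_two_vector_steps(autoreset=False))  # False
```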

Also, correct me if I’m wrong, but this IgnoreDoneEnv wrapper actually does exactly what you want, right? So for Sync/Async vector envs you can just use it, and other vectorization methods can’t be expected to provide this functionality anyway. So why should this be included in the core API, if a wrapper works just fine for your application?

> Isn’t the purpose of having an AutoResetWrapper to have control over when I want this functionality?

That is the case for individual environments, where it’s much less critical: you can just do if terminated or truncated: env.reset() and you’re golden without autoreset.
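For a single environment, the manual reset is just a check after each step. A sketch with a hypothetical `CountdownEnv` (not a Gymnasium class) whose episodes last 3 steps: over 10 steps the loop sees 3 complete episodes and resets after each one.

```python
class CountdownEnv:
    """Hypothetical toy env: every episode terminates after `length` steps."""

    def __init__(self, length: int):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


env = CountdownEnv(length=3)
obs, info = env.reset()
episodes = 0
for _ in range(10):
    obs, reward, terminated, truncated, info = env.step(0)
    if terminated or truncated:
        episodes += 1
        obs, info = env.reset()  # manual reset, so no autoreset is needed
print(episodes)  # 3
```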

With vectorized environments, you basically can’t train without autoreset, since you can only reset all the environments at the same time, not just a proper subset of them. If you reset after one env terminates, you lose the unfinished tails of all the other envs’ episodes. If you reset after all envs terminate, you waste a whole bunch of time computing actions for environments that have already terminated. That’s why I’m still convinced that autoreset should be considered a core feature of the vector API, and not something optional. And for the rare edge cases where you want to customize it, just use a wrapper.
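The cost of the reset-after-all strategy can be made concrete with a small simulation. `completed_episodes` is a hypothetical helper that steps sub-envs with fixed episode lengths, either resetting each one as soon as it finishes (autoreset) or letting finished sub-envs idle until all of them are done and then resetting them together. With episode lengths of 3 and 10 over 10 vector steps, autoreset completes 3 + 1 episodes with no idle steps, while the reset-all strategy completes 1 + 1 and leaves the short sub-env idle for 7 steps.

```python
from typing import List, Tuple


def completed_episodes(
    lengths: List[int], vector_steps: int, autoreset: bool
) -> Tuple[List[int], int]:
    """Count finished episodes per sub-env and total idle sub-env steps."""
    counts = [0] * len(lengths)
    t = [0] * len(lengths)
    waiting = [False] * len(lengths)
    idle = 0
    for _ in range(vector_steps):
        for i, n in enumerate(lengths):
            if waiting[i]:
                idle += 1  # finished sub-env burns a step waiting for the others
                continue
            t[i] += 1
            if t[i] == n:
                counts[i] += 1
                t[i] = 0  # with autoreset, the next episode starts right away
                if not autoreset:
                    waiting[i] = True
        if not autoreset and all(waiting):
            waiting = [False] * len(lengths)  # reset all sub-envs together
    return counts, idle


print(completed_episodes([3, 10], vector_steps=10, autoreset=True))   # ([3, 1], 0)
print(completed_episodes([3, 10], vector_steps=10, autoreset=False))  # ([1, 1], 7)
```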