[Feature Request] Can't render terminal step of episode

🚀 Feature

Examining the `step_wait` method of `DummyVecEnv`:

    def step_wait(self) -> VecEnvStepReturn:
        for env_idx in range(self.num_envs):
            obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
                self.actions[env_idx]
            )
            if self.buf_dones[env_idx]:
                # save final observation where user can get it, then reset
                self.buf_infos[env_idx]["terminal_observation"] = obs
                obs = self.envs[env_idx].reset()
            self._save_obs(env_idx, obs)
        return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), deepcopy(self.buf_infos))

You can see that the environment resets immediately when `self.buf_dones[env_idx]` is `True`, so the user never gets a chance to render the last step of the episode. The same is true for `SubprocVecEnv`. Users should have some way to render this final step.
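To make the problem concrete, here is a stripped-down, stdlib-only imitation of the loop above. It is not SB3 code: `CountdownEnv` and `mini_step_wait` are hypothetical names, and the loop keeps only the auto-reset logic. Once the step returns, the terminal observation survives in `info["terminal_observation"]`, but the environment itself already holds the reset state, so rendering it shows the first step of the next episode.

```python
from typing import Any, Dict, List, Tuple


class CountdownEnv:
    """Hypothetical toy env: the episode ends when the counter reaches 0."""

    def __init__(self, start: int = 2) -> None:
        self.start = start
        self.counter = start

    def reset(self) -> int:
        self.counter = self.start
        return self.counter

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.counter -= 1  # the action is ignored in this toy env
        done = self.counter == 0
        return self.counter, 1.0 if done else 0.0, done, {}

    def render(self) -> str:
        return f"counter={self.counter}"


def mini_step_wait(envs: List[CountdownEnv], actions: List[int]):
    """Simplified imitation of the auto-reset logic in DummyVecEnv.step_wait."""
    observations, dones, infos = [], [], []
    for env, action in zip(envs, actions):
        obs, _reward, done, info = env.step(action)
        if done:
            # Only the observation is kept; reset() overwrites the env state.
            info["terminal_observation"] = obs
            obs = env.reset()
        observations.append(obs)
        dones.append(done)
        infos.append(info)
    return observations, dones, infos


envs = [CountdownEnv(start=2)]
mini_step_wait(envs, [0])                      # counter: 2 -> 1
obs, dones, infos = mini_step_wait(envs, [0])  # counter: 1 -> 0, then reset to 2
print(dones[0], infos[0]["terminal_observation"], envs[0].render())  # True 0 counter=2
```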

Motivation

For some environments, especially custom grid-worlds, it is critical to view the last step, in order to ensure that the termination conditions of the episode are correctly implemented.

Pitch

I propose adding `render` and `render_mode` arguments to the `__init__` methods of these classes. When `render` is `True`, every step is rendered automatically, including the last time step. The change is non-breaking, and users who don't need it can simply ignore it.
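A minimal sketch of the idea, assuming a toy environment rather than the actual code in the pull request: `RenderingVecEnv`, `CountdownEnv`, and the `frames` list are hypothetical names. The key point is ordering, since the frame is rendered right after `step()` and before any auto-reset, so the terminal state is captured. The flag is stored as `render_every_step` so it does not shadow a `render()` method.

```python
from typing import Any, Dict, List, Optional, Tuple


class CountdownEnv:
    """Hypothetical toy env whose episode ends when the counter hits 0."""

    def __init__(self, start: int = 2) -> None:
        self.start = start
        self.counter = start

    def reset(self) -> int:
        self.counter = self.start
        return self.counter

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.counter -= 1
        done = self.counter == 0
        return self.counter, float(done), done, {}

    def render(self, mode: str = "human") -> Optional[str]:
        frame = f"counter={self.counter}"
        if mode == "human":
            print(frame)
            return None
        return frame  # e.g. mode="ansi" returns the frame as text


class RenderingVecEnv:
    """Hypothetical sketch of the proposal: optionally render every step, before auto-reset."""

    def __init__(self, envs: List[CountdownEnv], render: bool = False, render_mode: str = "human") -> None:
        self.envs = envs
        self.render_every_step = render
        self.render_mode = render_mode
        self.frames: List[Optional[str]] = []

    def step(self, actions: List[int]):
        observations, dones, infos = [], [], []
        for env, action in zip(self.envs, actions):
            obs, _reward, done, info = env.step(action)
            if self.render_every_step:
                # Render while the env still holds the (possibly terminal) state.
                self.frames.append(env.render(mode=self.render_mode))
            if done:
                info["terminal_observation"] = obs
                obs = env.reset()
            observations.append(obs)
            dones.append(done)
            infos.append(info)
        return observations, dones, infos


vec_env = RenderingVecEnv([CountdownEnv(start=2)], render=True, render_mode="ansi")
vec_env.step([0])
vec_env.step([0])
print(vec_env.frames)  # ['counter=1', 'counter=0'], the terminal frame is included
```

With `render=False` (the default), the loop behaves like the original auto-reset loop, which is what makes the change non-breaking.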

I have also already opened a pull request: https://github.com/DLR-RM/stable-baselines3/pull/280#issuecomment-754719673

Alternatives

Any alternative would require an awkward refactor of the `step_wait` methods to expose the underlying environment to the user after the episode completes but before the environment resets. Alternatively, the method could make a copy of the terminal environment available to the user, but that would require an expensive duplication of the environment state. I am open to suggestions.

I have checked that there is no similar issue in the repo (required)

Issue Analytics

Created 3 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions

ethanabrooks commented, Jan 5, 2021

> this sounds like something that can be solved within the custom environment, by passing a keyword argument at creation time or setting a flag to true using the `set_attr()` method.

I think this is a reasonable solution. I’ll close for now and let you know if I have any issues with that.
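The suggestion quoted above can be sketched as follows, assuming a simple 1-D grid world. `GridWorldEnv`, `render_terminal`, and `terminal_frames` are hypothetical names; only `set_attr()` comes from the comment, and it is a real method of SB3's `VecEnv` (`vec_env.set_attr("render_terminal", True)`). The environment renders itself inside `step()` when the episode ends, so the frame exists before any vectorized wrapper calls `reset()`. If the env is wrapped, check that the flag actually reaches the inner environment.

```python
from typing import Any, Dict, List, Tuple


class GridWorldEnv:
    """Hypothetical custom env that can render its own terminal step."""

    def __init__(self, size: int = 3, render_terminal: bool = False) -> None:
        self.size = size
        self.render_terminal = render_terminal  # keyword argument at creation time
        self.position = 0
        self.terminal_frames: List[str] = []

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.position = min(self.position + action, self.size - 1)
        done = self.position == self.size - 1
        if done and self.render_terminal:
            # Capture the terminal frame before any VecEnv auto-reset happens.
            self.terminal_frames.append(self.render())
        return self.position, float(done), done, {}

    def render(self) -> str:
        return "".join("A" if i == self.position else "." for i in range(self.size))


env = GridWorldEnv(size=3)
# Flip the flag after creation; with SB3 this would be vec_env.set_attr("render_terminal", True).
env.render_terminal = True

for _ in range(2):
    _obs, _reward, done, _info = env.step(1)
    if done:
        env.reset()  # mimic the auto-reset performed by the VecEnv

print(env.terminal_frames)  # ['..A']
print(env.render())         # A..
```

Keeping the logic in the environment leaves the `VecEnv` API unchanged, which matches the maintainers' preference for alternatives over new features.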

0 reactions

araffin commented, Jan 18, 2021

> I understand your hesitation to modify the API.

Yes, every new feature must be well motivated and useful to many users. If there are good alternatives (which I think is the case here), I tend to prefer those, as every feature adds a bit of complexity and makes the project slightly harder to maintain 😉