[Feature Request] Can't render terminal step of episode
🚀 Feature
Examining the step_wait method of DummyVecEnv:
def step_wait(self) -> VecEnvStepReturn:
    for env_idx in range(self.num_envs):
        obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
            self.actions[env_idx]
        )
        if self.buf_dones[env_idx]:
            # save final observation where user can get it, then reset
            self.buf_infos[env_idx]["terminal_observation"] = obs
            obs = self.envs[env_idx].reset()
        self._save_obs(env_idx, obs)
    return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones), deepcopy(self.buf_infos))
You can see that the environment resets immediately when self.buf_dones[env_idx] is True, without giving the user an opportunity to render the last step of the episode. The same is true for SubprocVecEnv. Users should have some way to render this final step.
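To make the problem concrete, here is a minimal sketch in plain Python that mimics the auto-reset pattern for a single environment. CountdownEnv and AutoResetVec are hypothetical toy classes written for this illustration, not part of stable-baselines3; only the "terminal_observation" key comes from the code above.

```python
class CountdownEnv:
    """Toy environment: the state counts down from 2 and the episode ends at 0."""

    def reset(self):
        self.state = 2
        return self.state

    def step(self, action):
        self.state -= 1
        done = self.state == 0
        return self.state, 0.0, done, {}

    def render(self):
        return f"state={self.state}"


class AutoResetVec:
    """Hypothetical single-env stand-in for the auto-reset logic in step_wait."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            # Same pattern as step_wait: stash the final observation, then reset.
            info["terminal_observation"] = obs
            obs = self.env.reset()
        return obs, reward, done, info


vec = AutoResetVec(CountdownEnv())
vec.reset()
vec.step(None)                            # state 2 -> 1, episode continues
obs, reward, done, info = vec.step(None)  # state 1 -> 0, episode ends, env resets

terminal_obs = info["terminal_observation"]  # the final observation survives in info
frame_after_step = vec.env.render()          # but render() already shows the reset state
```

By the time step returns, the only trace of the terminal step is the observation stored in info; the environment itself can no longer be rendered in its final state.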
Motivation
For some environments, especially custom grid-worlds, it is critical to view the last step, in order to ensure that the termination conditions of the episode are correctly implemented.
Pitch
I propose adding render and render_mode arguments to the __init__ methods of these classes. When render is True, each step will be rendered automatically, including the last time-step. This feature is non-breaking, and users can ignore it if they please.
Also, I’ve already implemented a pull request: https://github.com/DLR-RM/stable-baselines3/pull/280#issuecomment-754719673
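A rough sketch of what the proposed arguments could look like, using toy classes rather than the real DummyVecEnv. This is not the code from the linked pull request; CountdownEnv, RenderingVec, the render_enabled attribute, and the frames list are all assumptions made for illustration.

```python
class CountdownEnv:
    """Toy environment: the state counts down from 2 and the episode ends at 0."""

    def reset(self):
        self.state = 2
        return self.state

    def step(self, action):
        self.state -= 1
        done = self.state == 0
        return self.state, 0.0, done, {}

    def render(self, mode="human"):
        return f"[{mode}] state={self.state}"


class RenderingVec:
    """Hypothetical single-env wrapper with opt-in rendering on every step."""

    def __init__(self, env, render=False, render_mode="human"):
        self.env = env
        self.render_enabled = render  # defaults to False, so existing callers are unaffected
        self.render_mode = render_mode
        self.frames = []              # collected here only so the sketch can be inspected

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if self.render_enabled:
            # Render before any reset so the terminal step is included.
            self.frames.append(self.env.render(mode=self.render_mode))
        if done:
            info["terminal_observation"] = obs
            obs = self.env.reset()
        return obs, reward, done, info


vec = RenderingVec(CountdownEnv(), render=True, render_mode="ansi")
vec.reset()
vec.step(None)
vec.step(None)  # terminal step: rendered first, then reset
```

The key design point is ordering: rendering happens after the underlying step but before the reset, which is the only moment the terminal state is visible.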
Alternatives
Any alternative would require an awkward refactor of the step_wait methods, somehow exposing the underlying environment to the user when the episode completes but before the environment resets. Perhaps the method could make a copy of the terminal environment available to the user, but this would require some kind of expensive duplication of the environment state. I am open to suggestions.
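For comparison, the copy-based alternative might look roughly like the sketch below. The "terminal_env" info key and the SnapshotVec class are hypothetical names invented here; copy.deepcopy duplicates the whole environment object, which is where the expense mentioned above would come from for environments with large state.

```python
import copy


class CountdownEnv:
    """Toy environment: the state counts down from 2 and the episode ends at 0."""

    def reset(self):
        self.state = 2
        return self.state

    def step(self, action):
        self.state -= 1
        done = self.state == 0
        return self.state, 0.0, done, {}

    def render(self):
        return f"state={self.state}"


class SnapshotVec:
    """Hypothetical single-env wrapper that snapshots the env before resetting."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if done:
            info["terminal_observation"] = obs
            # Assumed key: a full copy of the environment in its terminal state.
            info["terminal_env"] = copy.deepcopy(self.env)
            obs = self.env.reset()
        return obs, reward, done, info


vec = SnapshotVec(CountdownEnv())
vec.reset()
vec.step(None)
obs, reward, done, info = vec.step(None)

snapshot_frame = info["terminal_env"].render()  # renders the copied terminal state
live_frame = vec.env.render()                   # the live env has already reset
```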
- I have checked that there is no similar issue in the repo (required)
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
I think this is a reasonable solution. I’ll close for now and let you know if I have any issues with that.
Yes, every new feature must be well motivated and be useful to numerous users. And if there are good alternatives (which I think is the case here), I would tend to prefer those, as every feature adds a bit of complexity and makes the project slightly harder to maintain 😉