Dealing with inconsistent conventions for env observations: (h,w,c) vs (c,h,w)
Atari environments in gym return observations in (h,w,c) format. In Stable Baselines3 (SB3), policies are implemented in PyTorch, which assumes (c,h,w) observations. As a result, SB3 transposes observations in several places: in a wrapper applied to an environment when it is passed to a learning algorithm (see lines 163 and 224 of stable_baselines3/common/base_class.py), in some policy methods (see lines 335 and 243 of stable_baselines3/common/policies.py), and possibly elsewhere. This means that when writing code, it is not always obvious whether an environment's observations have already been transposed.
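The transposition itself is only an axis permutation. A minimal NumPy sketch, assuming a raw 210x160 RGB Atari frame (the shapes are illustrative and not taken from SB3's code):

```python
import numpy as np

# A blank Atari-sized RGB frame in gym's (h, w, c) layout.
obs_hwc = np.zeros((210, 160, 3), dtype=np.uint8)

# Move the channel axis to the front: (h, w, c) -> (c, h, w), as PyTorch expects.
obs_chw = np.transpose(obs_hwc, (2, 0, 1))

# A batch from a VecEnv keeps its leading env axis: (n, h, w, c) -> (n, c, h, w).
batch_nhwc = np.zeros((8, 210, 160, 3), dtype=np.uint8)
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

print(obs_chw.shape)     # (3, 210, 160)
print(batch_nchw.shape)  # (8, 3, 210, 160)
```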
For instance, the __init__ method of AgentTrainer in preference_comparisons.py calls self.algorithm.get_env() to set the environment. Because algorithm must be a BaseAlgorithm, and a BaseAlgorithm wraps its input venv in a wrapper that transposes observations as necessary, the venv returned by get_env may already produce transposed observations.
This means that in the notebook imitation/examples/5_train_preference_comparisons.ipynb, if the environment is changed to an Atari environment, reward_net is trained on inputs in (c,h,w) format, but once placed in a RewardVecEnvWrapper it is called on inputs in (h,w,c) format. This will likely cause poor performance or, in the case of a CNN reward net, a shape-mismatch error.
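A rough illustration of the two failure modes, assuming 4 stacked 84x84 frames. Here hypothetical_cnn_reward is a made-up stand-in for a CNN reward net, not part of imitation. A flattening (MLP-style) net silently receives features in the wrong order, while anything that checks the channel axis fails loudly:

```python
import numpy as np

# Distinct values make any reordering visible: 4 stacked 84x84 frames, channels first.
obs_chw = np.arange(4 * 84 * 84).reshape(4, 84, 84)  # layout the reward net was trained on
obs_hwc = np.transpose(obs_chw, (1, 2, 0))           # layout it receives inside the wrapper

# MLP-style: flattening works on both, so nothing crashes, but feature order differs.
flat_train = obs_chw.reshape(-1)
flat_wrapped = obs_hwc.reshape(-1)
print(flat_train.size == flat_wrapped.size)      # True
print(np.array_equal(flat_train, flat_wrapped))  # False


def hypothetical_cnn_reward(batch_nchw: np.ndarray, in_channels: int = 4) -> np.ndarray:
    """Stand-in for a CNN reward net: rejects inputs whose channel axis is wrong."""
    if batch_nchw.ndim != 4 or batch_nchw.shape[1] != in_channels:
        raise ValueError(f"expected (n, {in_channels}, h, w), got {batch_nchw.shape}")
    return batch_nchw.mean(axis=(1, 2, 3))


hypothetical_cnn_reward(obs_chw[None])  # fine: input shape (1, 4, 84, 84)
# hypothetical_cnn_reward(obs_hwc[None]) would raise: input shape (1, 84, 84, 4)
```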
To deal with problems like this, the best solution is likely a wrapper around reward functions that detects when this transposition is needed and applies it before the observations reach the naive reward function.
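One way such a wrapper might look, as a sketch only: TransposingRewardFn is a hypothetical name, and its four-argument call is assumed to mirror imitation's RewardFn signature (state, action, next_state, done). It compares each observation's shape with the (c,h,w) shape the reward net was trained on and transposes only when the input matches the (h,w,c) permutation of that shape:

```python
from typing import Callable, Tuple

import numpy as np

# Assumed signature, modeled on imitation's RewardFn: (state, action, next_state, done) -> rewards.
RewardFnLike = Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray], np.ndarray]


class TransposingRewardFn:
    """Hypothetical wrapper that feeds a (c, h, w)-trained reward fn channels-first batches."""

    def __init__(self, reward_fn: RewardFnLike, chw_shape: Tuple[int, int, int]):
        self.reward_fn = reward_fn
        self.chw_shape = tuple(chw_shape)
        c, h, w = self.chw_shape
        self.hwc_shape = (h, w, c)

    def _to_chw(self, batch: np.ndarray) -> np.ndarray:
        per_obs = tuple(batch.shape[1:])  # drop the leading batch axis
        if per_obs == self.chw_shape:     # already channels-first: pass through unchanged
            return batch
        if per_obs == self.hwc_shape:     # channels-last: move channels to axis 1
            return np.transpose(batch, (0, 3, 1, 2))
        raise ValueError(f"cannot map observation shape {per_obs} onto {self.chw_shape}")

    def __call__(self, state, action, next_state, done):
        return self.reward_fn(self._to_chw(state), action, self._to_chw(next_state), done)


# Usage: a toy reward fn that only accepts (n, 4, 84, 84) inputs.
def toy_chw_reward(state, action, next_state, done):
    if state.shape[1:] != (4, 84, 84) or next_state.shape[1:] != (4, 84, 84):
        raise ValueError(f"expected (n, 4, 84, 84), got {state.shape}")
    return next_state.mean(axis=(1, 2, 3))


wrapped = TransposingRewardFn(toy_chw_reward, chw_shape=(4, 84, 84))
obs_hwc = np.zeros((2, 84, 84, 4), dtype=np.float32)  # what an untransposed VecEnv would emit
act = np.zeros((2,), dtype=np.int64)
done = np.zeros((2,), dtype=bool)
rewards = wrapped(obs_hwc, act, obs_hwc, done)
print(rewards.shape)  # (2,)
```

This heuristic is ambiguous whenever the (c,h,w) and (h,w,c) shapes coincide (for example c == h == w), and it only recognizes those two layouts; anything else, such as stacked RGB frames with an extra leading axis, is rejected rather than guessed.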

Thanks for flagging this! Using VecFrameStack seems like the natural thing to do with VecEnv, so I feel OK about only that use case working out of the box, although it is a bit unfortunate. I'm fine adding an option for transposition but would prefer to avoid auto-transposition logic.
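The explicit alternative might look like the following sketch, where with_explicit_transpose and its transpose_obs flag are hypothetical names rather than an existing imitation option. The caller states the observation layout instead of the wrapper guessing it:

```python
import numpy as np


def with_explicit_transpose(reward_fn, transpose_obs: bool):
    """Hypothetical opt-in option: the caller declares whether obs arrive as (n, h, w, c)."""

    def wrapped(state, action, next_state, done):
        if transpose_obs:
            # Trust the caller: convert channels-last batches to channels-first.
            state = np.transpose(state, (0, 3, 1, 2))
            next_state = np.transpose(next_state, (0, 3, 1, 2))
        return reward_fn(state, action, next_state, done)

    return wrapped


def needs_chw(state, action, next_state, done):
    # Toy reward fn trained on (n, 4, h, w) observations.
    if state.shape[1] != 4:
        raise ValueError(f"expected (n, 4, h, w), got {state.shape}")
    return next_state.mean(axis=(1, 2, 3))


obs_hwc = np.zeros((2, 84, 84, 4), dtype=np.float32)
act = np.zeros((2,), dtype=np.int64)
done = np.zeros((2,), dtype=bool)
rew = with_explicit_transpose(needs_chw, transpose_obs=True)(obs_hwc, act, obs_hwc, done)
print(rew.shape)  # (2,)
```

Because nothing is inferred, there are no shape-guessing edge cases; the cost is that a wrongly set flag still hands the reward net misordered inputs.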
Unfortunately, we can't necessarily assume that all environments we come across will be (h,w,c). For Atari, the recommended preprocessing involves frame stacking. If one does this with SB3's VecFrameStack, the frame-stack dimension (which here effectively acts as the channel dimension) comes last, but if one instead uses gym's FrameStack, the frame-stack dimension comes first. It's possible that we should just mandate the SB3 method, but I wanted to check whether that changed your mind re: auto-transposition.
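The two frame-stacking conventions can be mimicked with NumPy. This sketch assumes typical 84x84 grayscale preprocessing, with frames shaped (84, 84, 1) on the SB3 side and (84, 84) on the gym side; it reproduces the resulting layouts, not either library's implementation:

```python
import numpy as np

n_stack = 4

# SB3-style: frames keep a trailing channel axis (84, 84, 1) and are concatenated on it.
frames_hw1 = [np.full((84, 84, 1), i, dtype=np.uint8) for i in range(n_stack)]
sb3_style = np.concatenate(frames_hw1, axis=-1)  # (84, 84, 4): stack dim last

# gym-FrameStack-style: 2-D grayscale frames (84, 84) are stacked on a new leading axis.
frames_hw = [np.full((84, 84), i, dtype=np.uint8) for i in range(n_stack)]
gym_style = np.stack(frames_hw, axis=0)  # (4, 84, 84): stack dim first

# Same data, different layout: a single transpose reconciles them.
print(np.array_equal(np.transpose(sb3_style, (2, 0, 1)), gym_style))  # True
```

Because the stack axis plays the role of channels, gym-style stacking already looks channels-first, while SB3-style stacking looks channels-last.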