Dealing with inconsistent conventions for env observations: (h,w,c) vs (c,h,w)

Atari environments in gym return observations in (h,w,c) format. In Stable Baselines3 (SB3), policies are implemented in PyTorch, whose convolutional layers assume (c,h,w) observations. As a result, SB3 transposes observations in a few places: in a wrapper applied to an environment when it’s fed into a learning algorithm (see lines 163 and 224 of stable_baselines3/common/base_class.py), in some methods of policies (see lines 335 and 243 of stable_baselines3/common/policies.py), and possibly elsewhere. This means that when writing code, it is not always obvious whether a given environment has already had this transposition applied.
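For concreteness, here is a minimal NumPy sketch of the transposition in question. It is not SB3’s code; `to_channels_first` is a hypothetical helper name, and the 84x84x4 batch is just an assumed Atari-like shape.

```python
import numpy as np


def to_channels_first(obs: np.ndarray) -> np.ndarray:
    """Transpose a batch of images from (n, h, w, c) to (n, c, h, w)."""
    if obs.ndim != 4:
        raise ValueError(f"expected a 4-D batch (n, h, w, c), got shape {obs.shape}")
    # Keep the batch axis in place and move the channel axis in front of h and w.
    return np.transpose(obs, (0, 3, 1, 2))


# Assumed example: a batch of 2 channels-last observations, 84x84 pixels, 4 channels.
hwc_batch = np.zeros((2, 84, 84, 4), dtype=np.uint8)
chw_batch = to_channels_first(hwc_batch)
print(hwc_batch.shape, "->", chw_batch.shape)
```

Note that nothing in the array itself records which layout it is in, which is why it is easy to lose track of whether the transposition has already happened.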

For instance, the __init__ method of AgentTrainer in preference_comparisons.py calls self.algorithm.get_env() to set the environment. Because algorithm must be a BaseAlgorithm, it wraps its input venv in a wrapper that transposes observations as necessary, and get_env returns that wrapped venv rather than the original one.

This means that in the notebook imitation/examples/5_train_preference_comparisons.ipynb, if the environment is changed to be an Atari environment, reward_net is originally trained on inputs with format (c,h,w), but when placed in a RewardVecEnvWrapper, is called on inputs with format (h,w,c). This will likely cause poor performance, or in the case of a CNN reward net, a type error.

To deal with problems like this, the best solution is likely a wrapper around reward functions to detect when this transposition needs to be made and make it before it’s fed into the naive reward function.
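A rough sketch of what such a wrapper could look like, using NumPy only. Everything here is hypothetical (`wrap_reward_fn`, `naive_reward`, and the shapes are made up for illustration, not part of imitation or SB3). It assumes the caller knows the (c,h,w) shape the reward function was trained on, and it checks shapes explicitly rather than guessing which axis is the channel axis.

```python
from typing import Callable, Tuple

import numpy as np

RewardFn = Callable[[np.ndarray], np.ndarray]


def wrap_reward_fn(reward_fn: RewardFn, chw_shape: Tuple[int, int, int]) -> RewardFn:
    """Wrap `reward_fn` so it accepts (n, c, h, w) or (n, h, w, c) batches.

    `chw_shape` is the per-observation (c, h, w) shape `reward_fn` expects.
    If c == h == w the two layouts are indistinguishable by shape, and the
    input is assumed to already be channels-first.
    """
    c, h, w = chw_shape
    hwc_shape = (h, w, c)

    def wrapped(obs: np.ndarray) -> np.ndarray:
        per_obs_shape = tuple(obs.shape[1:])
        if per_obs_shape == tuple(chw_shape):
            return reward_fn(obs)  # already channels-first
        if per_obs_shape == hwc_shape:
            # Channels-last input: move channels in front of h and w.
            return reward_fn(np.transpose(obs, (0, 3, 1, 2)))
        raise ValueError(
            f"observation shape {per_obs_shape} matches neither "
            f"{tuple(chw_shape)} nor {hwc_shape}"
        )

    return wrapped


def naive_reward(obs: np.ndarray) -> np.ndarray:
    """Toy stand-in for a reward net: mean of channel 0, assuming (n, c, h, w)."""
    return obs[:, 0].mean(axis=(1, 2))


# Channels-last batch in which only channel 0 is non-zero.
hwc_batch = np.zeros((2, 84, 84, 4))
hwc_batch[..., 0] = 1.0

wrapped = wrap_reward_fn(naive_reward, chw_shape=(4, 84, 84))
print("wrapped:  ", wrapped(hwc_batch))
print("unwrapped:", naive_reward(hwc_batch))  # silently reads row 0 instead of channel 0
```

The unwrapped call does not fail on this toy function; it just computes the wrong thing, which is the "poor performance" case described above. Raising on an unrecognized shape, rather than guessing, keeps the wrapper from hiding a genuine mismatch.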

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction
AdamGleave commented, Aug 10, 2022

> Unfortunately, we can’t necessarily assume that all environments we come across will be (h,w,c). In Atari, recommended preprocessing involves frame-stacking. If one does this using SB3’s VecFrameStack, then the frame stack dimension (which in this case is basically the channel dimension) comes last, but if one instead does it using gym’s FrameStack, the frame stack dimension comes first. In this case, it’s possible that we should just mandate use of the SB3 method, but wanted to check if that changed your mind re: autotransposition.

Thanks for flagging this! Using VecFrameStack seems like the natural thing to do with VecEnv, so I feel OK about only that use case working out of the box, although it is a bit unfortunate.

I’m fine adding an option for transposition but would prefer to avoid auto-transposition logic.

0 reactions
dfilan commented, Aug 10, 2022

Unfortunately, we can’t necessarily assume that all environments we come across will be (h,w,c). In Atari, recommended preprocessing involves frame-stacking. If one does this using SB3’s VecFrameStack, then the frame stack dimension (which in this case is basically the channel dimension) comes last, but if one instead does it using gym’s FrameStack, the frame stack dimension comes first. In this case, it’s possible that we should just mandate use of the SB3 method, but wanted to check if that changed your mind re: autotransposition.
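To make the two layouts concrete, here is a small NumPy sketch that emulates them. It does not call SB3 or gym; the frame shapes (84x84 grayscale, with a trailing singleton channel in the SB3-style case) are assumptions for illustration.

```python
import numpy as np

num_stack = 4

# Assumed SB3-style frames: grayscale with an explicit trailing channel axis (h, w, 1).
frames_hwc = [np.full((84, 84, 1), i, dtype=np.uint8) for i in range(num_stack)]
# Assumed gym-style frames: grayscale without a channel axis (h, w).
frames_hw = [frame[..., 0] for frame in frames_hwc]

# Stack dimension last: frames are concatenated along the existing channel axis.
stacked_last = np.concatenate(frames_hwc, axis=-1)

# Stack dimension first: frames are stacked along a new leading axis.
stacked_first = np.stack(frames_hw, axis=0)

print("stack last: ", stacked_last.shape)   # (84, 84, 4)
print("stack first:", stacked_first.shape)  # (4, 84, 84)
```

So the same four frames end up looking like (h,w,c) in one case and (c,h,w) in the other, and any code that assumes a single convention will be wrong for one of them.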
