[question] Definition of observation space without knowing low and high values
I would like to ask what the proper way is to define the observation space if we do not know the values that the different observations could take during training.
I tried defining them with some large values that the observations are sure to fall within, but when I check the environment with check_env, I get the following warning:
warnings.warn("We recommend you to use a symmetric and normalized Box action space (range=[-1, 1])
Even if I define them to be in [-1, 1], I still have to normalize the observations before passing them to the agent, and for that I again need their minimum and maximum values.
Am I missing something?
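For concreteness, here is a minimal sketch of the kind of definition in question, assuming a gym-style custom environment (the class name, shapes, and bounds are hypothetical). Gym also accepts unbounded Box spaces, which is one way to declare an observation space whose range is unknown up front:

```python
import numpy as np
import gym
from gym import spaces


class MyEnv(gym.Env):
    """Toy environment whose true observation range is not known up front."""

    def __init__(self):
        # Option 1: guess some large bounds the observations should stay within.
        # self.observation_space = spaces.Box(low=-1e6, high=1e6, shape=(4,), dtype=np.float32)

        # Option 2: declare the bounds as unknown; gym allows infinite limits.
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32
        )
        # The check_env warning quoted above concerns the *action* space,
        # which can usually be rescaled to [-1, 1] inside step().
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

    def reset(self):
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        obs = np.zeros(4, dtype=np.float32)
        return obs, 0.0, True, {}
```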
Top GitHub Comments
Yes, if you are manually running the environments (e.g. manual evaluation), you might need to do that.
PS: I recommend migrating to stable-baselines3 if possible, as it is more actively supported.
Compared to the example code this seems to check out (note: I have not used the normalize wrappers myself). I would double-check that the observation statistics do not change wildly between creating different envs.
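To make "same statistics" concrete: if the normalization comes from stable-baselines3's VecNormalize wrapper (an assumption; the thread recommends SB3 but the original code is not shown), the running mean/std can be saved after training and reloaded for a separate evaluation env, so both use identical statistics. MyEnv and the file name are hypothetical:

```python
# Sketch assuming stable-baselines3's VecNormalize; MyEnv and the path are hypothetical.
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

train_env = VecNormalize(DummyVecEnv([lambda: MyEnv()]), norm_obs=True)
# ... training happens here ...
train_env.save("vecnormalize.pkl")  # persist the running obs statistics

# Manual evaluation: load the same statistics and freeze them.
eval_env = VecNormalize.load("vecnormalize.pkl", DummyVecEnv([lambda: MyEnv()]))
eval_env.training = False      # do not update the statistics during evaluation
eval_env.norm_reward = False   # report raw, unnormalized rewards
```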
What do the training stats tell you (verbose=2 when creating the agent)? With PPO, this gives you a good idea of the agent's performance even before a separate evaluation. Those >1 and <-1 values would be clipped if you set it to clip at 1, yes. But by default it clips to [-10, 10], so no clipping probably happens in your case.
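Putting both remarks in code: a sketch, assuming stable-baselines3's PPO and VecNormalize (MyEnv is the hypothetical env from earlier). The clip_obs parameter controls the range that normalized observations are clipped to, and its default is 10.0:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

env = VecNormalize(
    DummyVecEnv([lambda: MyEnv()]),
    norm_obs=True,
    clip_obs=10.0,  # default: normalized observations are clipped to [-10, 10]
)

# verbose=2 prints detailed training stats during learning, which indicate
# how the agent is doing before any separate evaluation run.
model = PPO("MlpPolicy", env, verbose=2)
model.learn(total_timesteps=100_000)
```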
Note that we do not have time to offer per-user tech support, so my answers are short. If you continue having bugs, I still recommend using SB3 if possible. If you spot a bug or confusing behaviour, feel free to open an issue about fixing it 😃