Some questions regarding VecNormalize
See original GitHub issue

According to the docs, when creating custom environments, we should always normalize the observation space. For this, there is the VecNormalize wrapper, which maintains a moving average of the observations and normalizes them with it.
Let's say I have 2 observations: the height (m) and weight (kg) of a person. My observation space would be something like a Box with low = [0, 0] and high = [2.5, 300]. But since I'm using VecNormalize, this isn't correct anymore, right? So should I instead change it to low = [-10, -10] and high = [10, 10]? (10 being the default clipping value of VecNormalize.)
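To make the clipping question concrete, here is a minimal numpy sketch of the running-statistics normalization that VecNormalize applies to observations. The class name and structure are hypothetical (the real implementation lives in stable-baselines); the point is that after normalization both dimensions land on a comparable scale and are clipped to [-10, 10]:

```python
import numpy as np

class RunningObsNormalizer:
    """Hypothetical stand-in for VecNormalize's observation path:
    a moving mean/variance plus clipping."""

    def __init__(self, shape, clip=10.0, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps
        self.clip = clip
        self.eps = eps

    def update(self, obs_batch):
        # Welford-style merge of batch statistics into the running ones
        batch_mean = obs_batch.mean(axis=0)
        batch_var = obs_batch.var(axis=0)
        batch_count = obs_batch.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta**2 * self.count * batch_count / total) / total
        self.count = total

    def normalize(self, obs):
        # zero-mean, unit-variance, then clip to [-clip, clip]
        return np.clip((obs - self.mean) / np.sqrt(self.var + self.eps),
                       -self.clip, self.clip)

# heights (m) and weights (kg) from the question, on very different scales
rng = np.random.default_rng(0)
batch = np.stack([rng.normal(1.7, 0.1, 256), rng.normal(70, 15, 256)], axis=1)

norm = RunningObsNormalizer(shape=2)
norm.update(batch)
out = norm.normalize(batch)
# both dimensions now lie well inside [-10, 10], regardless of the
# bounds declared on the Box space itself
```

Note that the clipping happens after normalization, so the Box bounds you declare never see the clipped values; this is why (as the answer below says) the declared bounds do not need to match the clip range.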
Another question: when should we normalize the rewards as well? (In the MuJoCo example shown in the docs, only the observations are normalized. Why?)
Finally, what is the purpose of VecNormalize's discount factor (gamma)? Should it be the same as the discount factor of whatever algorithm we're using?
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 2
- Comments: 8
Top GitHub Comments
The boundaries in the observation space do not really matter (for everything that is not images); we usually set them to [-inf, inf].
Good question, the answer is there: https://github.com/openai/baselines/issues/538 and https://github.com/openai/baselines/issues/629 additional resource: https://github.com/hill-a/stable-baselines/issues/234
yes
We should change that (we would appreciate a PR for it): it is an old example, and there is no real reason not to normalize the reward too.
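For context, reward normalization in VecNormalize does not subtract a mean; it divides each reward by the running standard deviation of a discounted return estimate, which is also where the wrapper's gamma comes in (and why it should match the algorithm's discount factor). A minimal numpy sketch, with a hypothetical helper name that is not the library API:

```python
import numpy as np

def normalize_rewards(rewards, gamma=0.99, clip=10.0, eps=1e-8):
    """Sketch of VecNormalize-style reward scaling: maintain a
    discounted return estimate and divide each raw reward by the
    running std of that return, then clip."""
    ret = 0.0                       # discounted return estimate
    mean, var, count = 0.0, 1.0, eps
    out = []
    for r in rewards:
        ret = gamma * ret + r
        # one-sample running mean/variance update for the return
        count += 1
        delta = ret - mean
        mean += delta / count
        var += (delta * (ret - mean) - var) / count
        out.append(np.clip(r / np.sqrt(var + eps), -clip, clip))
    return np.array(out)

rng = np.random.default_rng(1)
raw = rng.normal(0.0, 100.0, size=1000)   # rewards on a large scale
scaled = normalize_rewards(raw)
# scaled rewards sit on a small, stable scale instead of ~100
```

Because the scaling factor is the std of the discounted return rather than of the raw reward, a gamma that differs from the algorithm's would normalize against the "wrong" return distribution.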
Layer normalization is quite different; see the associated paper: https://arxiv.org/abs/1607.06450 It is there mostly because of the parameter-noise exploration for DDPG (cf. the docs).
@cevans3098 I can only recommend you take a look at the rl zoo; in your case, you forgot to save and load the VecNormalize statistics.

Closing this issue as the original question was answered.