[Bug] VecNormalize fails on SAC/TD3
🐛 Bug
I find the VecNormalize wrapper is unusable when training off-policy algos like SAC and TD3.
I think the issue is located in def _store_transition() in off_policy_algorithm.py:
...
if self._vec_normalize_env is not None:
    new_obs_ = self._vec_normalize_env.get_original_obs()
    reward_ = self._vec_normalize_env.get_original_reward()
else:
    # Avoid changing the original ones
    self._last_original_obs, new_obs_, reward_ = self._last_obs, new_obs, reward
# Avoid modification by reference
next_obs = deepcopy(new_obs_)
...
e.g. get_original_obs() returns the unnormalized obs, whose shape is not transposed: (1,96,96,3)
I might not understand exactly what the purpose is of storing unnormalized obs and rewards in the replay_buffer when the VecNormalize wrapper is used on purpose.
To Reproduce
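from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

env = make_vec_env('CarRacing-v0', 1)
env = VecNormalize(env, norm_obs=False)  # same for norm_obs=True, even though image input should be scaled by default.
model = SAC('CnnPolicy', env, verbose=1)  # or TD3
model.learn(total_timesteps=int(2e5))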
I get the error below. The same error can also be reproduced with the newest version of SB3 on the master branch. The error is gone when using PPO or A2C.
Traceback (most recent call last):
File "/workspace/repos_dev/stable-baselines3/stable_baselines3/td3/td3.py", line 214, in learn
reset_num_timesteps=reset_num_timesteps,
File "/workspace/repos_dev/stable-baselines3/stable_baselines3/common/off_policy_algorithm.py", line 366, in learn
log_interval=log_interval,
File "/workspace/repos_dev/stable-baselines3/stable_baselines3/common/off_policy_algorithm.py", line 616, in collect_rollouts
self._store_transition(replay_buffer, buffer_actions, new_obs, rewards, dones, infos)
File "/workspace/repos_dev/stable-baselines3/stable_baselines3/common/off_policy_algorithm.py", line 534, in _store_transition
infos,
File "/workspace/repos_dev/stable-baselines3/stable_baselines3/common/buffers.py", line 246, in add
self.observations[self.pos] = np.array(obs).copy()
ValueError: could not broadcast input array from shape (1,96,96,3) into shape (1,3,96,96)
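The ValueError at the bottom of the traceback can be reproduced with NumPy alone. The sketch below assumes a buffer slot allocated channel-first, as in the error message, and an observation that is still channel-last. The variable names are illustrative and are not SB3 internals.

```python
import numpy as np

# Buffer slot allocated for channel-first images: (n_envs, C, H, W).
buffer_slot = np.zeros((1, 3, 96, 96), dtype=np.uint8)

# Observation as returned by the unwrapped environment: (n_envs, H, W, C).
raw_obs = np.zeros((1, 96, 96, 3), dtype=np.uint8)

try:
    # Same kind of assignment as the failing line in buffers.py.
    buffer_slot[:] = raw_obs
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print(f"ValueError: {err}")

# Moving the channel axis to the front makes the two shapes agree.
channel_first_obs = np.transpose(raw_obs, (0, 3, 1, 2))
buffer_slot[:] = channel_first_obs
print(channel_first_obs.shape)
```

The assignment only succeeds once the observation has been transposed to the layout the buffer was allocated with.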
Expected behavior
I expect the VecNormalize wrapper to work with all algorithms in the ‘CarRacing’ environment; in the documentation I don’t see any constraint regarding the usage of the VecNormalize wrapper.
System Info
I’m using SB3 1.4.0, gym 0.21.0 and Python 3.7.11.
Checklist
I found a related issue, but unfortunately it doesn’t solve my issue.
- I have checked that there is no similar issue in the repo (required)
- I have read the documentation (required)
- I have provided a minimal working example to reproduce the bug (required)
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Hello,
If you use a VecTransposeImage wrapper before the VecNormalize env, this solves your issue.
Yes, the issue is similar to #693.
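The wrapper order suggested in this comment might look like the untested sketch below. It reuses the CarRacing-v0 setup from the report. The helper names build_car_racing_env and train_sac are hypothetical, and the imports sit inside the functions only so the snippet can be loaded without the simulator installed.

```python
def build_car_racing_env(env_id="CarRacing-v0"):
    """Wrap the env so images are channel-first before VecNormalize sees them."""
    # Lazy imports: keeps this sketch importable without SB3/Box2D installed.
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env import VecNormalize, VecTransposeImage

    env = make_vec_env(env_id, n_envs=1)
    # Transpose (H, W, C) -> (C, H, W) first, so that the "original"
    # observations kept by VecNormalize are already channel-first.
    env = VecTransposeImage(env)
    # norm_obs=False: only rewards are normalized, images are left as uint8.
    env = VecNormalize(env, norm_obs=False)
    return env


def train_sac(total_timesteps=int(2e5)):
    """Train SAC on the wrapped env (TD3 could be substituted here)."""
    from stable_baselines3 import SAC

    model = SAC("CnnPolicy", build_car_racing_env(), verbose=1)
    model.learn(total_timesteps=total_timesteps)
    return model
```

Calling train_sac() would start training with the same settings as the reproduction script.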
Hmm, normalizing image pixels individually might hinder performance (or not, not tested 😄), but it is definitely not something people do. For images, you should not use the VecNormalize wrapper. Images (of type uint8) are automatically normalized to [0, 1] by dividing by 255.
Which part of the docs misled you into using VecNormalize with an image environment? The docs could be updated with a note/warning that one should only use VecNormalize with non-image envs 😃
To sate your curiosity: the replay buffer stores the original samples so that when the VecNormalize statistics change (which they do, constantly), you can re-normalize the replay buffer samples and use them with the new normalization parameters.
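The reason for storing raw samples can be illustrated with a toy example. This is a minimal sketch with scalar observations and hypothetical classes (RunningStats, RawReplayBuffer); it is not SB3's implementation, only the idea of normalizing at sampling time with whatever statistics are current.

```python
import math
import random


class RunningStats:
    """Running mean/variance (Welford's algorithm) for scalar observations."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def normalize(self, x, eps=1e-8):
        var = self._m2 / self.count if self.count > 0 else 1.0
        return (x - self.mean) / math.sqrt(var + eps)


class RawReplayBuffer:
    """Toy buffer: stores raw observations, normalizes only when sampling."""

    def __init__(self, stats):
        self.stats = stats
        self.raw_obs = []

    def add(self, obs):
        self.stats.update(obs)
        self.raw_obs.append(obs)  # stored un-normalized

    def sample(self, rng):
        obs = rng.choice(self.raw_obs)
        return obs, self.stats.normalize(obs)  # uses the current statistics


stats = RunningStats()
buffer = RawReplayBuffer(stats)

for obs in [1.0, 2.0, 3.0]:
    buffer.add(obs)
early = stats.normalize(1.0)  # normalized value under the early statistics

for obs in [10.0, 20.0, 30.0]:
    buffer.add(obs)
late = stats.normalize(1.0)  # same raw sample under the updated statistics

sampled_obs, sampled_normalized = buffer.sample(random.Random(0))
print(early, late)
```

Had the buffer stored the early normalized value instead, it would be stale after the statistics moved, and the raw sample could no longer be recovered exactly.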