[Bug] loading a trained MultiInput policy returns a random agent
🐛 Bug
I am using a MultiInput policy trained with PPO, and I am running into a problem when loading a trained agent. Specifically, the loaded model predicts actions different from those of the original agent, even in deterministic mode.
To Reproduce
A minimal example is reported below:
- It trains PPO on CarRacing-v0 for a few steps, then saves and reloads the model.
- The trained model and the loaded model are used to predict actions for the same observations in deterministic mode.
- The actions are different as reported in the plot.
The code:
import gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env
from racing_rl.envs.wrappers import FrameSkip
from skimage.color import rgb2gray
class VelocityWrapper(gym.ObservationWrapper):
    """ extend the observation space of racecar with velocity """

    def __init__(self, env):
        super(VelocityWrapper, self).__init__(env)
        self._w, self._h, c = self.observation_space.shape
        self.observation_space = gym.spaces.Dict({
            'image': gym.spaces.Box(low=0, high=255, shape=(80, 80, 1), dtype=np.uint8),
            'velocity': gym.spaces.Box(low=np.NINF, high=np.PINF, shape=(2,))
        })

    def observation(self, observation):
        image = np.reshape((rgb2gray(observation) * 255).astype(np.uint8), (self._w, self._h, 1))[:80, :80, :]
        image = np.where(image <= 150, 0, 255)
        dict_observation = {
            'image': image,
            'velocity': np.array(self.env.car.hull.linearVelocity)
        }
        return dict_observation
def twin_evaluations(model1, model2, env):
    """
    deterministic evaluation of 2 models when having the same observation as input.
    the action predicted by model1 is then used to interact with the environment.
    return a dictionary with the action for each model
    """
    done = False
    obs = env.reset()
    actions1, actions2 = [], []
    while not done:
        action1, _ = model1.predict(obs, deterministic=True)
        action2, _ = model2.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action1)
        actions1.append(action1)
        actions2.append(action2)
    return {'model1': np.array(actions1), 'model2': np.array(actions2)}
env = gym.make("CarRacing-v0")
env = VelocityWrapper(env)
env = FrameSkip(env, frame_skip=8)
check_env(env)
print("[info] env ok")
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(1000)
model.save("saved_model")
model2 = PPO("MultiInputPolicy", env, verbose=1, tensorboard_log=None)
model2.load("saved_model", print_system_info=True)
actions = twin_evaluations(model, model2, env)
import matplotlib.pyplot as plt
for i, name in enumerate(["steer", "gas", "brake"]):
    plt.subplot(3, 1, i + 1)
    plt.title(name)
    plt.plot(actions["model1"][:, i], label="after training")
    plt.plot(actions["model2"][:, i], label="after load")
    plt.legend()
plt.tight_layout()
plt.show()
The output:
/home/luigi/Development/racing-rl/venv/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py:272: UserWarning: We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) cf https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html
warnings.warn(
Track generation: 1140..1429 -> 289-tiles track
[info] env ok
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Track generation: 1191..1493 -> 302-tiles track
Track generation: 1217..1525 -> 308-tiles track
Track generation: 1199..1503 -> 304-tiles track
Track generation: 926..1166 -> 240-tiles track
Track generation: 941..1186 -> 245-tiles track
Track generation: 1310..1641 -> 331-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1324..1659 -> 335-tiles track
Track generation: 1320..1654 -> 334-tiles track
Track generation: 1107..1388 -> 281-tiles track
Track generation: 1062..1331 -> 269-tiles track
Track generation: 1033..1298 -> 265-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1219..1533 -> 314-tiles track
Track generation: 1187..1487 -> 300-tiles track
Track generation: 1177..1475 -> 298-tiles track
Track generation: 1407..1763 -> 356-tiles track
Track generation: 1159..1453 -> 294-tiles track
Track generation: 1112..1398 -> 286-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1296..1624 -> 328-tiles track
Track generation: 1168..1464 -> 296-tiles track
Track generation: 1180..1479 -> 299-tiles track
---------------------------------
| rollout/ | |
| ep_len_mean | 125 |
| ep_rew_mean | -58.2 |
| time/ | |
| fps | 57 |
| iterations | 1 |
| time_elapsed | 35 |
| total_timesteps | 2048 |
---------------------------------
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
== CURRENT SYSTEM INFO ==
OS: Linux-5.11.0-38-generic-x86_64-with-glibc2.29 #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021
Python: 3.8.10
Stable-Baselines3: 1.3.0
PyTorch: 1.10.0+cu102
GPU Enabled: True
Numpy: 1.20.0
Gym: 0.19.0
== SAVED MODEL SYSTEM INFO ==
OS: Linux-5.11.0-38-generic-x86_64-with-glibc2.29 #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021
Python: 3.8.10
Stable-Baselines3: 1.3.0
PyTorch: 1.10.0+cu102
GPU Enabled: True
Numpy: 1.20.0
Gym: 0.19.0
Track generation: 1184..1484 -> 300-tiles track
Process finished with exit code 0
Plot of the actions:
Expected behavior
Since the two models receive the same input and deterministic=True, I would expect the resulting actions to be the same.
System Info
I am using a virtual environment under Ubuntu 20.04 with the following info:
- OS: Linux-5.11.0-38-generic-x86_64-with-glibc2.29 #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021
- Stable-Baselines3: 1.3.0
- Python: 3.8.10
- PyTorch: 1.10.0+cu102
- Gym: 0.19.0
- GPU: NVIDIA GeForce RTX 2070
- GPU Enabled: True
- Numpy: 1.20.0
Additional context
None.
Checklist
- I have checked that there is no similar issue in the repo (required)
- I have read the documentation (required)
- I have provided a minimal working example to reproduce the bug (required)
Issue Analytics
- State:
- Created 2 years ago
- Comments:9 (9 by maintainers)
Top GitHub Comments
duplicate of https://github.com/DLR-RM/stable-baselines3/issues/533
EDIT: for later users: as shown in the doc and in the issues, the load method is not in-place; you need to do `model = SAC.load()` (or `model = PPO.load()`).
@araffin I submitted a PR.
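To illustrate the point made in the EDIT above, here is a minimal sketch of the pitfall using a toy class. `ToyModel` and its JSON file format are hypothetical stand-ins invented for this example, not SB3 code; only the shape of the API (a `load` classmethod that returns a new object, and an in-place `set_parameters`) mirrors what the comments describe.

```python
import json
import os
import tempfile


class ToyModel:
    """Toy stand-in for an RL algorithm object (hypothetical, NOT SB3 code)."""

    def __init__(self, weights=None):
        # "Untrained" default parameters.
        self.weights = list(weights) if weights is not None else [0.0, 0.0]

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.weights, f)

    @classmethod
    def load(cls, path):
        # Builds and returns a NEW object; nothing is modified in place.
        with open(path) as f:
            return cls(json.load(f))

    def set_parameters(self, path):
        # In-place counterpart: overwrites this object's parameters.
        with open(path) as f:
            self.weights = json.load(f)


path = os.path.join(tempfile.mkdtemp(), "saved_model.json")
trained = ToyModel([0.3, -1.2])
trained.save(path)

fresh = ToyModel()
fresh.load(path)                    # return value discarded: `fresh` is unchanged
assert fresh.weights == [0.0, 0.0]

reloaded = ToyModel.load(path)      # keep the returned object instead
assert reloaded.weights == trained.weights

fresh.set_parameters(path)          # in-place alternative
assert fresh.weights == trained.weights
```

In the reproduction script from the issue, the equivalent change would presumably be to replace the `model2.load("saved_model", ...)` call with `model2 = PPO.load("saved_model", env=env)`, or to call `model2.set_parameters("saved_model")` on the already constructed model.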
I took some time to look through the implementation and understand why it does not or cannot load by-reference. It actually could, and I also figured out this is already implemented and is called `.set_parameters`.
I also thought about a warning. However, we cannot check whether the result of the call is being assigned somewhere. But what we could do is check whether `cls` is actually an instance, e.g. if `load` is called on an already instantiated model, and then warn that this will overwrite it and that the user should probably be calling `.set_parameters` instead.
E.g. something like
Let me know if you think this is appropriate.
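For readers wondering how such a warning could work at all: with a plain `@classmethod`, `cls` is always the class, even when the method is called through an instance, so the check has to happen at attribute lookup. The following is a hypothetical sketch of one way to do that with a custom descriptor. It is not the code from the PR and not part of Stable-Baselines3; `classmethod_warn_on_instance` and `ToyAlgorithm` are names made up for this illustration.

```python
import functools
import warnings


class classmethod_warn_on_instance:
    """Like @classmethod, but warns when accessed through an instance.

    Hypothetical helper for illustration; not part of Stable-Baselines3.
    """

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        if instance is not None:
            warnings.warn(
                f"{owner.__name__}.{self.func.__name__}() was accessed on an existing "
                "instance; it returns a new object and does not modify the instance "
                "in place. Use set_parameters() to update an existing model.",
                UserWarning,
                stacklevel=2,
            )
        # Bind the class (not the instance), as a regular classmethod would.
        return functools.partial(self.func, owner)


class ToyAlgorithm:
    """Toy stand-in for an algorithm class (assumption, not SB3 code)."""

    def __init__(self, params=None):
        self.params = dict(params or {})

    @classmethod_warn_on_instance
    def load(cls, params):
        return cls(params)

    def set_parameters(self, params):
        self.params = dict(params)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from_class = ToyAlgorithm.load({"w": 1.0})    # called on the class: no warning
    n_after_class_call = len(caught)
    existing = ToyAlgorithm()
    returned = existing.load({"w": 2.0})          # called on an instance: warns
    n_after_instance_call = len(caught)
```

Note that the warning fires when `load` is looked up on the instance, not when the result is discarded, so it would also trigger for code that correctly reassigns (`model = model.load(...)`). Whether that trade-off is acceptable is a design decision for the maintainers.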