[Bug] loading a trained MultiInput policy returns a random agent

🐛 Bug

I am using a MultiInput policy trained with PPO, and I am having trouble loading a trained agent. Specifically, the loaded model predicts actions that differ from those of the original agent, even in deterministic mode.

To Reproduce

A minimal example is reported below:

It trains PPO on CarRacing-v0 for a few steps, then saves and reloads the model.
The trained model and the loaded model are used to predict actions for the same observations in deterministic mode.
The actions differ, as shown in the plot.

The code:

import gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

from racing_rl.envs.wrappers import FrameSkip
from skimage.color import rgb2gray


class VelocityWrapper(gym.ObservationWrapper):
    """ extend the observation space of racecar with velocity """

    def __init__(self, env):
        super(VelocityWrapper, self).__init__(env)
        self._w, self._h, c = self.observation_space.shape
        self.observation_space = gym.spaces.Dict({
            'image': gym.spaces.Box(low=0, high=255, shape=(80, 80, 1), dtype=np.uint8),
            'velocity': gym.spaces.Box(low=np.NINF, high=np.PINF, shape=(2,))
        })

    def observation(self, observation):
        image = np.reshape((rgb2gray(observation) * 255).astype(np.uint8), (self._w, self._h, 1))[:80, :80, :]
        image = np.where(image <= 150, 0, 255)
        dict_observation = {
            'image': image,
            'velocity': np.array(self.env.car.hull.linearVelocity)
        }
        return dict_observation


def twin_evaluations(model1, model2, env):
    """
    deterministic evaluation of 2 models when having the same observation as input.
    the action predicted by model1 is then used to interact with the environment.

    return a dictionary with the action for each model
    """
    done = False
    obs = env.reset()
    actions1, actions2 = [], []
    while not done:
        action1, _ = model1.predict(obs, deterministic=True)
        action2, _ = model2.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action1)
        actions1.append(action1)
        actions2.append(action2)
    return {'model1': np.array(actions1), 'model2': np.array(actions2)}


env = gym.make("CarRacing-v0")
env = VelocityWrapper(env)
env = FrameSkip(env, frame_skip=8)
check_env(env)
print("[info] env ok")

model = PPO("MultiInputPolicy", env, verbose=1)

model.learn(1000)
model.save("saved_model")

model2 = PPO("MultiInputPolicy", env, verbose=1, tensorboard_log=None)
model2.load("saved_model", print_system_info=True)

actions = twin_evaluations(model, model2, env)

import matplotlib.pyplot as plt

for i, name in enumerate(["steer", "gas", "brake"]):
    plt.subplot(3, 1, i + 1)
    plt.title(name)
    plt.plot(actions["model1"][:, i], label="after training")
    plt.plot(actions["model2"][:, i], label="after load")
    plt.legend()
plt.tight_layout()
plt.show()

The output:

/home/luigi/Development/racing-rl/venv/lib/python3.8/site-packages/stable_baselines3/common/env_checker.py:272: UserWarning: We recommend you to use a symmetric and normalized Box action space (range=[-1, 1]) cf https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html
  warnings.warn(
Track generation: 1140..1429 -> 289-tiles track
[info] env ok
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
Track generation: 1191..1493 -> 302-tiles track
Track generation: 1217..1525 -> 308-tiles track
Track generation: 1199..1503 -> 304-tiles track
Track generation: 926..1166 -> 240-tiles track
Track generation: 941..1186 -> 245-tiles track
Track generation: 1310..1641 -> 331-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1324..1659 -> 335-tiles track
Track generation: 1320..1654 -> 334-tiles track
Track generation: 1107..1388 -> 281-tiles track
Track generation: 1062..1331 -> 269-tiles track
Track generation: 1033..1298 -> 265-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1219..1533 -> 314-tiles track
Track generation: 1187..1487 -> 300-tiles track
Track generation: 1177..1475 -> 298-tiles track
Track generation: 1407..1763 -> 356-tiles track
Track generation: 1159..1453 -> 294-tiles track
Track generation: 1112..1398 -> 286-tiles track
retry to generate track (normal if there are not manyinstances of this message)
Track generation: 1296..1624 -> 328-tiles track
Track generation: 1168..1464 -> 296-tiles track
Track generation: 1180..1479 -> 299-tiles track
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 125      |
|    ep_rew_mean     | -58.2    |
| time/              |          |
|    fps             | 57       |
|    iterations      | 1        |
|    time_elapsed    | 35       |
|    total_timesteps | 2048     |
---------------------------------
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
== CURRENT SYSTEM INFO ==
OS: Linux-5.11.0-38-generic-x86_64-with-glibc2.29 #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021
Python: 3.8.10
Stable-Baselines3: 1.3.0
PyTorch: 1.10.0+cu102
GPU Enabled: True
Numpy: 1.20.0
Gym: 0.19.0

== SAVED MODEL SYSTEM INFO ==
OS: Linux-5.11.0-38-generic-x86_64-with-glibc2.29 #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021
Python: 3.8.10
Stable-Baselines3: 1.3.0
PyTorch: 1.10.0+cu102
GPU Enabled: True
Numpy: 1.20.0
Gym: 0.19.0

Track generation: 1184..1484 -> 300-tiles track

Process finished with exit code 0

Plot of the actions: Figure_1

Expected behavior

Since the two models receive the same input and deterministic=True, I would expect the resulting actions to be the same.

System Info

I am using a virtual environment under Ubuntu 20.04 with the following info:

OS: Linux-5.11.0-38-generic-x86_64-with-glibc2.29 #42~20.04.1-Ubuntu SMP Tue Sep 28 20:41:07 UTC 2021
Stable-Baselines3: 1.3.0
Python: 3.8.10
PyTorch: 1.10.0+cu102
Gym: 0.19.0
GPU: NVIDIA GeForce RTX 2070
GPU Enabled: True
Numpy: 1.20.0

Additional context

None.

Checklist

I have checked that there is no similar issue in the repo (required)
I have read the documentation (required)
I have provided a minimal working example to reproduce the bug (required)

Top GitHub Comments

araffin commented, Dec 16, 2021

duplicate of https://github.com/DLR-RM/stable-baselines3/issues/533

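EDIT: for later users, as shown in the doc and in the issues, the load method does not operate in place: it returns a new model rather than modifying the instance it is called on. You need to assign the result, i.e. model = SAC.load() (or model = PPO.load()).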
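
The pitfall can be reproduced without Stable-Baselines3 at all. The sketch below uses a hypothetical ToyModel class (a stand-in, not the library's implementation) whose load is a class method that builds and returns a new object. Calling it on an existing instance and discarding the result leaves that instance unchanged, which is what happens with model2.load("saved_model", ...) in the report above.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for an RL algorithm class; not SB3 code."""

    def __init__(self, weights=None):
        # A freshly constructed model starts with "untrained" weights.
        self.weights = list(weights) if weights is not None else [0.0, 0.0]

    def save(self):
        # Return a snapshot instead of writing a file, to stay self-contained.
        return {"weights": copy.deepcopy(self.weights)}

    @classmethod
    def load(cls, snapshot):
        # Builds and returns a NEW object. If this is called through an
        # instance, that instance is not modified.
        return cls(weights=snapshot["weights"])


trained = ToyModel(weights=[1.5, -0.3])
snapshot = trained.save()

fresh = ToyModel()
fresh.load(snapshot)                # result discarded: `fresh` is unchanged
reloaded = ToyModel.load(snapshot)  # result kept: `reloaded` has the saved weights

print(fresh.weights)     # [0.0, 0.0]
print(reloaded.weights)  # [1.5, -0.3]
```

In the original script, the equivalent change would be to replace the two lines that create and load model2 with a single assignment such as model2 = PPO.load("saved_model", env=env).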

Demetrio92 commented, Dec 17, 2021

@araffin I submitted a PR.

I took some time to look through the implementation to understand why load does not, or cannot, load in place. It actually could, and I found that this is already implemented as .set_parameters.

I also thought about a warning. We cannot check whether the result of the call is assigned anywhere, but we could check whether cls is actually an instance, i.e. whether load was called on an already instantiated model, and then warn that this will re-create the model from scratch and that the user should probably call .set_parameters instead.

E.g. something like

import inspect, warnings
...
if not inspect.isclass(cls):
  warnings.warn('You are using `.load` on an already instantiated model. This will re-create the model from scratch using loaded parameters. For loading parameters in-place consider using `.set_parameters`')

Let me know if you think this is appropriate.
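
One caveat with the check above, worth verifying before relying on it: inside a standard Python classmethod, cls is bound to the class even when the method is reached through an instance, so inspect.isclass(cls) would always be true. A possible workaround is a small custom descriptor that sees the instance at attribute lookup time. The sketch below is only an illustration under that assumption; warn_on_instance_classmethod and ToyModel are hypothetical names, not part of Stable-Baselines3.

```python
import types
import warnings


class warn_on_instance_classmethod:
    """Hypothetical descriptor: acts like classmethod, but warns when the
    method is looked up on an instance instead of on the class."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        if instance is not None:
            # Fires on attribute access through an instance, e.g. model.load
            warnings.warn(
                "`load` was called on an already instantiated model; it "
                "returns a new model and does not modify this one. Assign "
                "the result, or use `set_parameters` to load in place.",
                UserWarning,
                stacklevel=2,
            )
        # Bind the function to the class, as classmethod would.
        return types.MethodType(self.func, owner)


class ToyModel:
    """Hypothetical stand-in for an algorithm class."""

    def __init__(self, weights=None):
        self.weights = list(weights) if weights is not None else []

    @warn_on_instance_classmethod
    def load(cls, weights):
        return cls(weights)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from_class = ToyModel.load([1.0])   # no warning
    n_after_class_call = len(caught)
    ToyModel().load([2.0])              # warns: called on an instance
    n_after_instance_call = len(caught)

print(n_after_class_call, n_after_instance_call)  # 0 1
```

The warning is emitted when the attribute is looked up, slightly before the call itself, which is usually close enough for a user-facing hint.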
