[question] Loading the PPO model after training does not seem to load the policy
See original GitHub issue
I’ve read similar questions asked here about loading a model after training (e.g. #30), but I still cannot figure out what the problem with my model is. It does not seem to be using the trained policy/value networks when I run the code below. I am not sure what is wrong with my setup, and this does not look like a bug, so I was wondering if anyone can tell me what I am doing incorrectly. check_env(env) gives no warnings and the custom environment runs fine, but the agent just makes random decisions at evaluation time, even though the in-progress results suggest it learned the task during training.
log_dir = "./PPO/"
os.makedirs(log_dir, exist_ok=True)
env=Monitor(CustomEnv(8090), log_dir)
# I do use VecNormalize for training, and training is done on a vectorized environment,
# but evaluation is done on a single env
# env = VecNormalize(env, norm_obs=True, norm_reward=True,
#                    clip_obs=1.)
check_env(env)
model = PPO.load(log_dir + "/rl_model_8080_12000000_steps", env=env, verbose=True, tensorboard_log=log_dir)
model.set_env(env)  # redundant: PPO.load(..., env=env) above already sets the env
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")
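Since the comments above say training used VecNormalize but the evaluation script never reloads its statistics, the policy receives raw observations it was never trained on, which is a common reason a trained agent looks random. A library-free toy sketch of that failure mode (the RunningNormalizer class and the threshold "policy" are invented for illustration; real VecNormalize statistics are likewise pickled to disk and must be reloaded at evaluation time):

```python
import os
import pickle
import tempfile

class RunningNormalizer:
    """Minimal stand-in for the running observation statistics VecNormalize keeps."""
    def __init__(self):
        self.mean = 0.0
        self.m2 = 0.0    # sum of squared deviations (Welford's online algorithm)
        self.count = 0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += (x - self.mean) * delta

    def normalize(self, x):
        std = (self.m2 / self.count) ** 0.5 if self.count else 1.0
        return (x - self.mean) / (std if std > 0 else 1.0)

# "Training": the agent only ever saw observations clustered around 100.
norm = RunningNormalizer()
for obs in [98.0, 100.0, 102.0, 99.0, 101.0]:
    norm.update(obs)

# A decision boundary learned on *normalized* inputs.
policy = lambda x: "left" if x < 0 else "right"

# Persist the statistics next to the model, as VecNormalize's save() would.
path = os.path.join(tempfile.mkdtemp(), "stats.pkl")
with open(path, "wb") as f:
    pickle.dump(norm, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

raw_obs = 98.5
with_stats = policy(restored.normalize(raw_obs))  # "left": 98.5 is below the running mean
without_stats = policy(raw_obs)                   # "right": raw values here are always positive
print(with_stats, without_stats)                  # left right
```

The same input produces opposite actions depending on whether the saved statistics are applied, which is exactly what an un-reloaded VecNormalize wrapper does to a trained network.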
System Info
Describe the characteristic of your environment:
- conda virtual env
- I have an RTX 2080 but I don’t think it is being used
- Python 3.6.13, stable-baselines 1.1.0, TensorFlow 1.14.0, PyTorch 1.4.0
Issue Analytics
- Created 2 years ago
- Comments: 5
Top Results From Across the Web

Training the same model after loading · Issue #30 - GitHub
I was looking into continuing training after loading a model. Simply using model.load("path-to-model") model.learn(total_timesteps=500000) ...

stable-baselines3 PPO model loaded but not working
I create the PPO model and make it learn for a couple thousand timesteps. Now when I evaluate the policy, the car renders...

Reinforcement Learning in Python with Stable Baselines 3
Welcome to part 2 of the reinforcement learning with Stable Baselines 3 tutorials. We left off with training a few models in the...

tensorforce/community - Gitter
I save and load the model by saved-model, but I faced another problem: when loading the saved-model format, I can’t load the saved model.

Proximal Policy Optimization — Spinning Up documentation
PPO is an on-policy algorithm. PPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hello, the answer is in the doc, and I highly recommend you to use SB3: https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#pybullet-normalizing-input-features
Yes, thanks for helping.