[question] Loading the PPO model after training does not seem to load the policy
See original GitHub issue
I’ve read similar questions asked here about loading a model after training (e.g. #30), but I still cannot figure out what the problem with my model is. It does not seem to be using the trained policy/value networks when I run the code below. I am not sure what is wrong with my setup, and this does not look like a bug, so I was wondering if anyone can tell me what I am doing incorrectly. check_env(env) gives no warnings and the custom environment runs fine, but the agent just makes random decisions at evaluation time, even though the in-progress results suggest it learned the task during training.
log_dir = "./PPO/"
os.makedirs(log_dir, exist_ok=True)
env=Monitor(CustomEnv(8090), log_dir)
# I do use VecNormalize for training, and training is done on a vectorized environment,
# but evaluation is done on a single env
# env = VecNormalize(env, norm_obs=True, norm_reward=True,
#                    clip_obs=1.)
check_env(env)
model = PPO.load(log_dir + "/rl_model_8080_12000000_steps", env=env, verbose=True, tensorboard_log=log_dir)
model.set_env(env)  # redundant: PPO.load(..., env=env) above already sets the env
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")
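Since the comments above say training used VecNormalize but the evaluation script never reloads its statistics, the policy receives raw observations it was never trained on, which is a common reason a trained agent looks random. A library-free toy sketch of that failure mode (the RunningNormalizer class and the threshold "policy" are invented for illustration; real VecNormalize statistics are likewise pickled to disk and must be reloaded at evaluation time):

```python
import os
import pickle
import tempfile

class RunningNormalizer:
    """Minimal stand-in for the running observation statistics VecNormalize keeps."""
    def __init__(self):
        self.mean = 0.0
        self.m2 = 0.0    # sum of squared deviations (Welford's online algorithm)
        self.count = 0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += (x - self.mean) * delta

    def normalize(self, x):
        std = (self.m2 / self.count) ** 0.5 if self.count else 1.0
        return (x - self.mean) / (std if std > 0 else 1.0)

# "Training": the agent only ever saw observations clustered around 100.
norm = RunningNormalizer()
for obs in [98.0, 100.0, 102.0, 99.0, 101.0]:
    norm.update(obs)

# A decision boundary learned on *normalized* inputs.
policy = lambda x: "left" if x < 0 else "right"

# Persist the statistics next to the model, as VecNormalize's save() would.
path = os.path.join(tempfile.mkdtemp(), "stats.pkl")
with open(path, "wb") as f:
    pickle.dump(norm, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

raw_obs = 98.5
with_stats = policy(restored.normalize(raw_obs))  # "left": 98.5 is below the running mean
without_stats = policy(raw_obs)                   # "right": raw values here are always positive
print(with_stats, without_stats)                  # left right
```

The same input produces opposite actions depending on whether the saved statistics are applied, which is exactly what an un-reloaded VecNormalize wrapper does to a trained network.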
System Info
Describe the characteristic of your environment:
- conda virtual env
- I have an RTX 2080 but I don’t think it is being used
- Python 3.6.13, stable-baselines 1.1.0, TensorFlow 1.14.0, PyTorch 1.4.0
Issue Analytics
- Created 2 years ago
- Comments: 5
Top Results From Across the Web

Training the same model after loading · Issue #30 - GitHub
I was looking into continuing training after loading a model. Simply using model.load("path-to-model") model.learn(total_timesteps=500000) ...

stable-baselines3 PPO model loaded but not working
I create the PPO model and make it learn for a couple thousand timesteps. Now when I evaluate the policy, the car renders...

Reinforcement Learning in Python with Stable Baselines 3
Welcome to part 2 of the reinforcement learning with Stable Baselines 3 tutorials. We left off with training a few models in the...

tensorforce/community - Gitter
I save and load the model by saved-model, but I faced another problem: when loading the saved-model format, I can’t load the saved model.

Proximal Policy Optimization — Spinning Up documentation
PPO is an on-policy algorithm. PPO can be used for environments with either discrete or continuous action spaces. The Spinning Up implementation of...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hello, the answer is in the doc, and I highly recommend you to use SB3: https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#pybullet-normalizing-input-features
Yes, thanks for helping.