
Reset PPO `policy.log_std` when loading previously saved model


When performing curriculum learning, it would be nice to be able to reset the PPO `policy.log_std` between training cycles. The following code will produce an error:

from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy

# Define RL model: randomized init (env1 is the first-stage environment)
policy_kwargs = dict(log_std_init=0)
model = PPO(MlpPolicy, env=env1, policy_kwargs=policy_kwargs)

# Learn and save
model.learn(total_timesteps=500000, tb_log_name='ppo')
model.save("ppo_model")

# Define RL model: preload network parameters (env2 is the next-stage environment)
policy_kwargs = dict(log_std_init=-0.5)
model = PPO.load(load_path=r"log_dir\ppo_1\ppo_model", env=env2, policy_kwargs=policy_kwargs)

# Learn and save again
model.learn(total_timesteps=500000, tb_log_name='ppo')
model.save("ppo_model")

Describe the bug

ValueError: The specified policy kwargs do not equal the stored policy kwargs. Stored kwargs: …

This error is thrown because `log_std_init` differs between the two training cycles.
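
For context, loading without re-specifying `policy_kwargs` sidesteps the check entirely, since the stored kwargs are restored automatically. A minimal sketch in the style of the snippet above (whether that fits the curriculum use case is what the discussion below is about):

model = PPO.load(load_path=r"log_dir\ppo_1\ppo_model", env=env2)  # no policy_kwargs: stored kwargs are reused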

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

2 reactions
araffin commented, Sep 10, 2020

Hello,

> Probably just me being dumdum again, but why exactly does the code enforce that the provided policy_kwargs should match the saved ones?

We do that because you really need to know what is happening when you change those arguments between saving and loading; the check prevents most users from running into unexpected behavior. Users who want to change them anyway can do so after loading, as @kyrwilliams mentioned, but it requires a good understanding of each RL algorithm. You can find an example with SAC (when using gSDE) here: https://github.com/DLR-RM/rl-baselines3-zoo/blob/e12a3019b57e11c876b6f875c5ff8c79a168c187/train.py#L569

Also, changing log_std in the policy kwargs won't work, as the value will be overwritten when loading the saved state dict.

> (1) I attempted the PyTorch save/load methods, manually resetting the log_std values with:

You may need to register that parameter too, and also check whether it is present in the optimizer (which I assume is not the case, given the result).
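
A minimal sketch of that idea, assuming a loaded SB3 PPO `model` (the -0.5 reset value mirrors the snippets below; the attribute and optimizer layout are assumptions about SB3's internals at the time):

import torch as th

# Build a fresh log_std; assigning an nn.Parameter to a module attribute
# re-registers it under the same name.
old_log_std = model.policy.log_std
new_log_std = th.nn.Parameter(th.full_like(old_log_std, -0.5))
model.policy.log_std = new_log_std

# The optimizer still references the old tensor, so swap it out in every
# param group; otherwise the new value is never updated during training.
for group in model.policy.optimizer.param_groups:
    group["params"] = [new_log_std if p is old_log_std else p
                       for p in group["params"]]

This loses any optimizer state attached to the old parameter, which is usually acceptable (arguably desirable) for a deliberate reset.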

2 reactions
kyrwilliams commented, Sep 8, 2020

Thanks @Miffyli! So, a couple of things:

(1) I attempted the PyTorch save/load methods, manually resetting the log_std values with:

import torch as th  # assumed import

model.policy.log_std = th.nn.Parameter(th.tensor([-0.5, -0.5, -0.5], device='cuda:0', requires_grad=True))
model.learn(total_timesteps=500000, tb_log_name='ppo')

But unfortunately this just locked `model.policy.log_std` at -0.5 throughout the entire training, presumably because the freshly created parameter was never registered with the optimizer.

(2) I found that the following crude method DID work, since it uses the PPO class's `.load` method, which apparently updates the model in a specific way:

import torch as th  # assumed import

model = PPO.load(load_path=r"log_dir\ppo_1\ppo_model", env=env2)  # load saved model
model.policy.log_std = th.nn.Parameter(th.tensor([-0.5, -0.5, -0.5], device='cuda:0', requires_grad=True))  # reset log_std
model.save("ppo_model_temp")  # save this adjusted model
model = PPO.load(load_path="ppo_model_temp", env=env2)  # load the adjusted model
model.learn(total_timesteps=500000, tb_log_name='ppo')  # learn

This approach successfully reset the log_std to -0.5 and allowed the optimizer to adjust it during training, presumably because `.load` rebuilds the policy and its optimizer from the saved file, so the new `log_std` ends up properly registered.
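
For anyone reproducing this, a quick way to confirm the reset took effect is to inspect `model.policy.log_std` before and after a short training run (a hypothetical check; the timestep count is arbitrary):

print(model.policy.log_std.detach().cpu())  # expect tensor([-0.5000, -0.5000, -0.5000])
model.learn(total_timesteps=10000, tb_log_name='ppo')
print(model.policy.log_std.detach().cpu())  # values should have drifted away from -0.5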
