Reset PPO `policy.log_std` when loading previously saved model
When performing curriculum learning, it would be nice to be able to reset the PPO `policy.log_std` between training cycles. The following code produces an error:
```python
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy

# env1 and env2 are the user's pre-existing training environments (defined elsewhere)

# Define RL model: randomized init
policy_kwargs = dict(log_std_init=0)
model = PPO(MlpPolicy, env=env1, policy_kwargs=policy_kwargs)

# Learn and save
model.learn(total_timesteps=500000, tb_log_name='ppo')
model.save("ppo_model")

# Define RL model: preload network parameters, but with a different log_std_init
policy_kwargs = dict(log_std_init=-0.5)
model = PPO.load(r"log_dir\ppo_1\ppo_model", env=env2, policy_kwargs=policy_kwargs)

# Learn and save again
model.learn(total_timesteps=500000, tb_log_name='ppo')
model.save("ppo_model")
```
Describe the bug
The second `PPO.load` call raises:
ValueError: The specified policy kwargs do not equal the stored policy kwargs. Stored kwargs: …
This error is thrown because `log_std_init` differs between the two training cycles.
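For comparison, loading with policy kwargs that match the stored ones (or omitting `policy_kwargs` entirely) does not raise. A minimal sketch, reusing the path and `env2` from the repro above:

```python
# Same log_std_init as when the model was first created, so the stored and
# specified policy kwargs compare equal and no ValueError is raised.
policy_kwargs = dict(log_std_init=0)
model = PPO.load(r"log_dir\ppo_1\ppo_model", env=env2, policy_kwargs=policy_kwargs)
```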
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hello,
We do that because you really need to know what is happening when you change those arguments between saving and loading; this prevents most users from running into unexpected behavior. Users who want to change them anyway can do so after loading, as @kyrwilliams mentioned, but it requires a good understanding of each RL algorithm. You can find an example with SAC (when using gSDE) here: https://github.com/DLR-RM/rl-baselines3-zoo/blob/e12a3019b57e11c876b6f875c5ff8c79a168c187/train.py#L569
Also, changing `log_std` in the policy kwargs won't work, as the value will be overwritten when loading the saved state dict. You may need to register that parameter too, and also check whether it is present in the optimizer (which I assume is not the case, given the result).
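As a rough illustration of the "change it after loading" route for PPO (a sketch only, assuming the `env2` and save path from the repro above; the in-place `fill_` keeps the same `Parameter` object, so the optimizer built for the loaded policy still tracks it):

```python
import torch as th
from stable_baselines3 import PPO

# Load without conflicting policy_kwargs, then overwrite log_std in place.
model = PPO.load(r"log_dir\ppo_1\ppo_model", env=env2)
with th.no_grad():
    model.policy.log_std.fill_(-0.5)  # same Parameter object, new value

# Sanity check suggested in the comment above: is log_std among the
# parameters the optimizer actually updates?
tracked = any(
    p is model.policy.log_std
    for group in model.policy.optimizer.param_groups
    for p in group["params"]
)
print("log_std tracked by optimizer:", tracked)
```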
Thanks @Miffyli! So, a couple of things:
(1) I attempted the PyTorch save/load methods, manually resetting the `log_std` values (roughly along the lines of the sketch below), but unfortunately this just locked `model.policy.log_std` at -0.5 throughout the entire training.
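A hypothetical reconstruction of a manual reset that behaves this way (assuming `model` is the loaded PPO instance from above and that the parameter object gets replaced rather than modified in place):

```python
import torch as th

# Hypothetical sketch: assigning a brand-new Parameter re-registers log_std on
# the policy, but the optimizer still holds a reference to the old tensor, so
# gradient steps never reach the new one and the value stays frozen at -0.5.
model.policy.log_std = th.nn.Parameter(th.ones_like(model.policy.log_std) * -0.5)
```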
(2) I found that the following kind of crude method DID work, since it uses the PPO class's `.load` method, which apparently updates the model in a specific way (see the sketch after this paragraph). This approach successfully reset the `log_std` to -0.5 and allowed the optimizer to adjust it during training.
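One sequence built around `PPO.load` that ends in the described state (a sketch under the assumption that copying everything except `log_std` into a freshly initialised policy is acceptable; not necessarily the author's exact code):

```python
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy

# Reload the trained weights, then build a fresh PPO whose log_std starts at -0.5.
old_model = PPO.load(r"log_dir\ppo_1\ppo_model", env=env2)
new_model = PPO(MlpPolicy, env=env2, policy_kwargs=dict(log_std_init=-0.5))

# Copy every trained parameter except log_std, keeping the new -0.5 value.
state = old_model.policy.state_dict()
state.pop("log_std")
new_model.policy.load_state_dict(state, strict=False)

# log_std belongs to new_model's freshly built optimizer, so it can still be
# adjusted during the next call to new_model.learn(...).
```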