"lr_schedule" option ignored using torch framework and PPO algorithm
See original GitHub issue
Ray version and other system information (Python version, TensorFlow version, OS):
- Ray: 0.9.0.dev0 (2c599dbf05e41e338920ee2fbe692658bcbec4dd)
- CUDA: 10.1
- PyTorch: 1.4.0 with GPU support
- Ubuntu 18.04
- Python 3.6
What is the problem?
Setting the hyperparameter "lr_schedule" has no effect when using PyTorch as the backend framework with the PPO algorithm.
Reproduction (REQUIRED)
import ray
from ray.rllib.agents.ppo import PPOTrainer, DEFAULT_CONFIG

config = DEFAULT_CONFIG.copy()
for key, val in {
    "env": "CartPole-v0",
    "num_workers": 0,
    "use_pytorch": False,
    "lr": 1.0e-5,
    "lr_schedule": [
        [0, 1.0e-6],
        [1, 1.0e-7],
    ]
}.items():
    config[key] = val

ray.init()
for use_pytorch in [False, True]:
    config["use_pytorch"] = use_pytorch
    agent = PPOTrainer(config, "CartPole-v0")
    for _ in range(2):
        result = agent.train()
    print(f"use_pytorch: {use_pytorch} - Current learning rate: "
          f"{result['info']['learner']['default_policy']['cur_lr']}")
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
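For context, here is a minimal sketch (my own illustration, not RLlib's implementation) of what one would expect "lr_schedule" to translate into on the torch side: interpolate the learning rate over timesteps and write it into the optimizer's param groups after each update. With the repro above, the reported cur_lr should move from 1.0e-6 towards 1.0e-7 rather than stay at the fixed "lr" value.

import torch

# Sketch only: piecewise-linear interpolation of the learning rate over
# timesteps, pushed into the torch optimizer's param groups.
def interpolated_lr(schedule, t):
    # schedule: [[timestep, lr], ...], sorted by timestep
    for (t0, lr0), (t1, lr1) in zip(schedule[:-1], schedule[1:]):
        if t0 <= t < t1:
            frac = (t - t0) / (t1 - t0)
            return lr0 + frac * (lr1 - lr0)
    return schedule[-1][1]  # past the last point: hold the final value

optimizer = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=1.0e-5)
for timestep in (0, 1, 10):
    new_lr = interpolated_lr([[0, 1.0e-6], [1, 1.0e-7]], timestep)
    for group in optimizer.param_groups:
        group["lr"] = new_lr
    print(timestep, optimizer.param_groups[0]["lr"])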
Issue Analytics
- State:
- Created 3 years ago
- Comments: 6 (6 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Nice, thank you! The learning rate seen with TensorFlow comes from a float32-to-float64 conversion that must happen somewhere. If you want to check:
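(For illustration only, not the snippet the commenter had in mind: one quick way to see this rounding is to pass the scheduled value through NumPy's float32 and compare.)

import numpy as np

# 1.0e-7 is not exactly representable in float32; converting back to a
# Python float (float64) exposes the rounding, roughly 1.0000000116e-07.
print(float(np.float32(1.0e-7)))  # the schedule value after a float32 round-trip
print(1.0e-7)                     # the value as written in the config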
Hmmm, I made some more experiments and I am not convinced that the lr is actually properly updated… Is it possible that the learning rate in cur_lr is different from the actual learning rate?
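One way to probe this (a sketch under assumptions: agent.get_policy() is the public Trainer API, but _optimizers and cur_lr are internal attributes of the torch policy around this Ray version and may be named differently, or absent, in other releases):

policy = agent.get_policy()

# What the results report (if the policy exposes it at all)
print("reported cur_lr:", getattr(policy, "cur_lr", None))

# What the torch optimizer actually uses for its updates
for opt in policy._optimizers:
    for group in opt.param_groups:
        print("optimizer lr:", group["lr"])

If the two values diverge, cur_lr is only being recomputed for reporting and is never written back into the optimizer's param groups.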