PPO2 - network diverges, outputs become NaNs
Installed via pip, running Python 3.6 and TensorFlow 1.9.
The environment is HalfCheetah-v2, but the same behavior also appears in other environments.
env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)
model = PPO2(MlpPolicy, env, gamma=0.99, n_steps=2048, ent_coef=0.0, learning_rate=3e-4, max_grad_norm=0.5, lam=0.95, nminibatches=32, noptepochs=10, cliprange=0.2, vf_coef=1.0, verbose=2)
model.learn(total_timesteps=int(1e6))
I also tried the default parameters, with the same result.
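One way to narrow down where the NaNs first appear is to validate every action and observation as it crosses the environment boundary, so that training stops at the first bad value instead of propagating it. The following is a minimal sketch, not part of the original report: `check_finite` and `FiniteCheckEnv` are hypothetical names, and it assumes a Gym-style environment whose `reset()` and `step()` return flat sequences of floats. Later Stable Baselines releases provide a `VecCheckNan` wrapper in `stable_baselines.common.vec_env` that serves a similar purpose for vectorized environments.

```python
import math


def check_finite(name, values):
    """Raise ValueError if any entry in `values` is NaN or infinite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            raise ValueError(f"{name}[{i}] is not finite: {v!r}")
    return values


class FiniteCheckEnv:
    """Hypothetical wrapper (illustration only): checks the values
    going into and coming out of a Gym-style environment."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        # Validate the initial observation before the agent sees it.
        return check_finite("observation", self.env.reset())

    def step(self, action):
        # A non-finite action means the policy network has already diverged.
        check_finite("action", action)
        obs, reward, done, info = self.env.step(action)
        # A non-finite observation or reward points at the environment instead.
        check_finite("observation", obs)
        check_finite("reward", [reward])
        return obs, reward, done, info
```

If the error names an action, the policy network produced the bad value. If it names an observation or reward, the environment or the normalization step is the more likely source.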
Issue Analytics
- State:
- Created 5 years ago
- Comments: 5
Top Results From Across the Web
Common causes of nans during training of neural networks
Reason: you have an input with nan in it! What you should expect: once the learning process "hits" this faulty input - output...

PPO — Stable Baselines3 1.7.0a8 documentation
The Proximal Policy Optimization algorithm combines ideas from A2C (having multiple workers) and TRPO (it uses a trust region to improve the...

An introduction to Policy Gradient methods - YouTube
In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning. After a general overview, I dive into Proximal Policy ...

PPO: policy loss becomes nan [closed]
I'm implementing PPO for a very specific problem, and it seems to be working ... Function 'MulBackward0' returned nan values in its 0th...

Stable Baselines Documentation - Read the Docs
The issue arises when NaNs or infs do not crash, but simply get propagated through the training, until all the floating...
Top GitHub Comments
This seems to fix the issue. Thanks
Hey, sorry for the delayed response. I’ll be back in the office tomorrow and will test the fix. Thanks!