PPO2 - network diverges, outputs become NaNs

Installed via pip, running Python 3.6 and TensorFlow 1.9.

The environment is HalfCheetah-v2 (the same behavior also occurs in other environments).

env = VecNormalize(env, norm_obs=True, norm_reward=False, clip_obs=10.)
model = PPO2(MlpPolicy, env, gamma=0.99, n_steps=2048, ent_coef=0.0, learning_rate=3e-4, max_grad_norm=0.5, lam=0.95, nminibatches=32, noptepochs=10, cliprange=0.2, vf_coef=1.0, verbose=2)
model.learn(total_timesteps=int(1e6))

I also tried the default parameters, with the same result.
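
When outputs turn into NaNs, it can help to find the first step at which a logged value stops being finite. The following is a minimal sketch in plain Python; the helper `first_non_finite` and the sample values are hypothetical and are not part of stable-baselines or of the original report.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or +/-inf in `values`, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


# Hypothetical network outputs logged over successive training steps.
healthy = [0.12, -0.53, 0.98]
diverged = [0.12, float("inf"), float("nan")]

healthy_index = first_non_finite(healthy)
diverged_index = first_non_finite(diverged)
```

Running a check like this on observations, rewards, and actions separately may indicate whether the bad value enters through the environment or is produced by the network.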

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
tesslerc commented, Oct 2, 2018

This seems to fix the issue. Thanks

0 reactions
tesslerc commented, Oct 1, 2018

Hey, sorry for the delayed response. I’ll be back in the office tomorrow and will test the fix. Thanks!

Top Results From Across the Web

Common causes of nans during training of neural networks
Reason: you have an input with nan in it! What you should expect: once the learning process "hits" this faulty input - output...

PPO — Stable Baselines3 1.7.0a8 documentation
The Proximal Policy Optimization algorithm combines ideas from A2C (having multiple workers) and TRPO (it uses a trust region to improve the...

An introduction to Policy Gradient methods - YouTube
In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning. After a general overview, I dive into Proximal Policy ...

PPO: policy loss becomes nan [closed]
I'm implementing PPO for a very specific problem, and it seems to be working ... Function 'MulBackward0' returned nan values in its 0th...

Stable Baselines Documentation - Read the Docs
The issue arises when NaNs or infs do not crash, but simply get propagated through the training, until all the floating...
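
The snippets above describe NaNs that do not crash but silently propagate. A minimal sketch of that behavior in plain Python follows, using a running mean as a stand-in for any statistic updated during training. The function names and sample values are hypothetical, and this is not how any particular library implements its statistics.

```python
import math


def update_running_mean(mean, count, x):
    """Incrementally update a running mean with one new sample."""
    count += 1
    return mean + (x - mean) / count, count


def checked_update(mean, count, x):
    """Same update, but fail loudly on a non-finite sample."""
    if not math.isfinite(x):
        raise ValueError(f"non-finite sample: {x!r}")
    return update_running_mean(mean, count, x)


# Hypothetical stream of observations with one bad sample in the middle.
samples = [1.0, 2.0, float("nan"), 3.0]

mean, count = 0.0, 0
for x in samples:
    mean, count = update_running_mean(mean, count, x)
# No exception was raised, yet `mean` is now NaN and stays NaN
# no matter how many valid samples follow.
```

Swapping in `checked_update` raises a `ValueError` at the first bad sample, which points at where the problem starts rather than where it is eventually noticed.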
