NaN values in ACKTR
Hi everyone, I’m trying to use continuous-action ACKTR to learn to reach a target in a MuJoCo simulation of the Jaco arm. I use exactly the same hyperparameters as for the Reacher env, and ACKTR definitely learns something meaningful: the reward goes up, and I can also see the improvement when I render the frames.
The problem is that after roughly 2000-3000 iterations, the algorithm starts producing NaN values.
Here is the log from the point where it starts:
Iteration 3025
kl just right!
| EVAfter | 0.984 |
| EVBefore | 0.976 |
| EpLenMean | 200 |
| EpRewMean | -8.5 |
| EpRewSEM | 0.82 |
| KL | 0.00148061 |
Iteration 3026
kl too low
| EVAfter | 0.984 |
| EVBefore | 0.98 |
| EpLenMean | 200 |
| EpRewMean | -7.31 |
| EpRewSEM | 0.613 |
| KL | 0.000913428 |
Iteration 3027
kl just right!
| EVAfter | 0.98 |
| EVBefore | 0.976 |
| EpLenMean | 200 |
| EpRewMean | -8.92 |
| EpRewSEM | 0.937 |
| KL | nan |
From then on, of course, the NaNs propagate and every value becomes NaN. Does anyone have an idea what could cause this behaviour and how to prevent it?
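The `kl just right!` / `kl too low` lines in the log appear to come from ACKTR's KL-based step-size adaptation. Below is a minimal Python sketch of that kind of rule, not the baselines code: the helper name `adapt_stepsize`, the target `desired_kl=0.002`, the 2x / 0.5x thresholds, the factor 1.5, the step-size bounds and the starting step size are all assumptions. With those assumed thresholds, the logged values behave as shown: 0.00148 lies between 0.001 and 0.004 ("just right") and 0.00091 is below 0.001 ("too low"). The sketch also illustrates one detail worth checking: every comparison with NaN is False, so an `if kl > high / elif kl < low / else` chain sends a NaN KL into the "just right" branch, which would be consistent with iteration 3027, where `kl just right!` is printed next to `KL | nan`. Here a non-finite KL raises instead, so a run stops at the first bad update rather than continuing silently.

```python
import math


def adapt_stepsize(stepsize, kl, desired_kl=0.002, factor=1.5,
                   min_stepsize=1e-8, max_stepsize=1.0):
    """One KL-based step-size update (hypothetical helper, assumed constants).

    Returns (new_stepsize, message). Raises FloatingPointError if kl is NaN or inf.
    """
    # Check finiteness first: NaN compares False against everything, so it
    # would otherwise fall through to the "just right" branch unnoticed.
    if not math.isfinite(kl):
        raise FloatingPointError(
            f"non-finite KL ({kl}); stop or restore the last good parameters")
    if kl > 2 * desired_kl:
        # The update moved the policy too far: shrink the step, down to a floor.
        return max(min_stepsize, stepsize / factor), "kl too high"
    if kl < desired_kl / 2:
        # The update was too timid: grow the step, up to a ceiling.
        return min(max_stepsize, stepsize * factor), "kl too low"
    return stepsize, "kl just right!"


# Usage with the two finite KL values from the log above.
stepsize = 0.03  # assumed starting value, for illustration only
for kl in (0.00148061, 0.000913428):
    stepsize, msg = adapt_stepsize(stepsize, kl)
    print(f"KL={kl:.6g} -> {msg}, stepsize={stepsize:.4g}")
```

Failing loudly is a design choice: once the KL is NaN, the policy parameters that produced it are usually already NaN, so shrinking the step size alone cannot recover the run.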
Issue created 6 years ago; 12 comments (3 by maintainers).
Top GitHub Comments
OK, I found a small detail in the step-size adjustment that wasn’t in the baselines code; it fixes the NaN issue in @lukashermann’s Jaco environment and in the Roboschool humanoid (@Breakend).
Change lines 121-129 in https://github.com/openai/baselines/blob/master/baselines/acktr/acktr_cont.py to:
I will create a pull request with this fix and some other small tweaks soon. Thanks for your patience!
I was still getting NaNs with a custom environment. Scaling the reward down, as suggested, fixed it.
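Scaling the reward down, as in the last comment, can be done without touching the algorithm by wrapping the environment. The sketch below assumes the classic Gym-style interface in which `step(action)` returns `(obs, reward, done, info)`; the class names `ScaledRewardEnv` and `ToyEnv` and the scale values are hypothetical. In Gym itself, the same idea is usually written as a `gym.RewardWrapper` subclass that overrides `reward()`.

```python
class ScaledRewardEnv:
    """Hypothetical wrapper that multiplies every reward by a positive constant.

    Assumes a classic Gym-style env: step(action) -> (obs, reward, done, info).
    """

    def __init__(self, env, scale=0.1):
        if scale <= 0:
            # A non-positive scale would flip or erase the reward signal.
            raise ValueError("scale must be positive")
        self.env = env
        self.scale = scale

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Only the reward changes; observations, done flags and info pass through.
        return obs, reward * self.scale, done, info


class ToyEnv:
    """Stand-in environment with a large constant penalty, for illustration only."""

    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, -100.0, False, {}


env = ScaledRewardEnv(ToyEnv(), scale=0.01)
env.reset()
obs, reward, done, info = env.step(0)
print(reward)
```

Multiplying every reward by a positive constant leaves the ranking of returns, and therefore the optimal policy, unchanged. It only shrinks the magnitude of the value targets and gradients, which is presumably why it helps with NaNs here. A suitable scale depends on the environment's reward range.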