NaN values in acktr

Hi everyone, I'm trying to use continuous ACKTR to learn to reach a target in a MuJoCo simulation of the Jaco arm. I use exactly the same hyperparameters as for the Reacher env, and ACKTR definitely learns something meaningful: the reward goes up, and I can also see it when I render the frames.

The problem is that after some 2000-3000 iterations, the algorithm starts to produce NaN values.

The log at the time when it starts to happen looks as follows:


Iteration 3025
kl just right!

| EVAfter   | 0.984      |
| EVBefore  | 0.976      |
| EpLenMean | 200        |
| EpRewMean | -8.5       |
| EpRewSEM  | 0.82       |
| KL        | 0.00148061 |

Iteration 3026 
kl too low

| EVAfter   | 0.984       |
| EVBefore  | 0.98        |
| EpLenMean | 200         |
| EpRewMean | -7.31       |
| EpRewSEM  | 0.613       |
| KL        | 0.000913428 |

Iteration 3027
kl just right!

| EVAfter   | 0.98     |
| EVBefore  | 0.976    |
| EpLenMean | 200      |
| EpRewMean | -8.92    |
| EpRewSEM  | 0.937    |
| KL        | nan      |

Then, of course, the NaNs start to spread and everything becomes NaN. Does anyone have an idea what could cause this behaviour and what to do about it?
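One way to catch the first bad value before it spreads is to check the logged statistics after every iteration. The following is only a minimal sketch: `first_non_finite` is a hypothetical helper, not part of baselines, and the dictionary simply copies the values from the log at iteration 3027.

```python
import math


def first_non_finite(stats):
    """Return the name of the first entry that is NaN or +/-inf, else None.

    Hypothetical helper for illustration; not part of baselines.
    """
    for name, value in stats.items():
        if not math.isfinite(value):
            return name
    return None


# Values copied from the log at iteration 3027.
stats_3027 = {
    "EVAfter": 0.98,
    "EVBefore": 0.976,
    "EpLenMean": 200,
    "EpRewMean": -8.92,
    "EpRewSEM": 0.937,
    "KL": float("nan"),
}

bad = first_non_finite(stats_3027)
if bad is not None:
    print(f"non-finite value in {bad!r}; stop and inspect before the next update")
```

Stopping at the first non-finite statistic keeps the last finite parameters and inputs around for inspection, instead of letting every later iteration overwrite them with NaN.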

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments:12 (3 by maintainers)

Top GitHub Comments

mansimov commented, Sep 27, 2017 (1 reaction)

OK, I found a small detail in adjusting the stepsize that wasn't in the baselines code. It fixes the NaN issue in @lukashermann's Jaco environment and in the Roboschool humanoid, @Breakend.

Change lines 121-129 in https://github.com/openai/baselines/blob/master/baselines/acktr/acktr_cont.py to

        min_stepsize = np.float32(1e-8)
        max_stepsize = np.float32(1e0)
        # Adjust stepsize
        kl = policy.compute_kl(ob_no, oldac_dist)
        if kl > desired_kl * 2:
            logger.log("kl too high")
            U.eval(tf.assign(stepsize, tf.maximum(min_stepsize, stepsize / 1.5)))
        elif kl < desired_kl / 2:
            logger.log("kl too low")
            U.eval(tf.assign(stepsize, tf.minimum(max_stepsize, stepsize * 1.5)))
        else:
            logger.log("kl just right!")

I will create a pull request with this fix and other small miscellaneous tweaks soon. Thanks for your patience!
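To see what the bounds in the snippet above do, here is the same rule as a framework-free sketch. It assumes the clamping to `[min_stepsize, max_stepsize]` is the detail being referred to; `adjust_stepsize`, the starting stepsize, and the `desired_kl` value are hypothetical and chosen only for illustration.

```python
def adjust_stepsize(stepsize, kl, desired_kl, min_stepsize=1e-8, max_stepsize=1.0):
    """Plain-Python version of the clamped stepsize rule (illustrative only)."""
    if kl > desired_kl * 2:
        # KL too high: shrink the step, but never below min_stepsize.
        return max(min_stepsize, stepsize / 1.5)
    if kl < desired_kl / 2:
        # KL too low: grow the step, but never above max_stepsize.
        return min(max_stepsize, stepsize * 1.5)
    # KL within the target band: leave the stepsize unchanged.
    return stepsize


desired_kl = 0.002  # hypothetical target KL
stepsize = 0.03     # hypothetical initial stepsize

# Simulate 50 consecutive "kl too low" iterations.
for _ in range(50):
    stepsize = adjust_stepsize(stepsize, kl=0.0005, desired_kl=desired_kl)

print(stepsize)  # stays at max_stepsize instead of growing by 1.5x each time
```

Note that any comparison with NaN is false, so a NaN KL falls through to the last branch. That is consistent with the log above, where iteration 3027 prints "kl just right!" next to `KL | nan`.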

jirenu commented, Dec 11, 2017 (0 reactions)

I'm still getting NaNs for a custom environment. Scaling the reward down, as suggested, remedied this.
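Reward scaling can be done without touching the algorithm by wrapping the environment. This is a rough sketch, assuming an environment whose `step` returns `(observation, reward, done, info)`; `ScaledRewardEnv`, the toy environment, and the factor `0.1` are all hypothetical, and a suitable factor depends on the reward range of the actual task.

```python
class ScaledRewardEnv:
    """Wrap an environment and multiply every reward by a constant factor.

    Hypothetical wrapper for illustration; not part of baselines.
    """

    def __init__(self, env, scale):
        self.env = env
        self.scale = scale

    def reset(self):
        return self.env.reset()

    def step(self, action):
        ob, rew, done, info = self.env.step(action)
        # Only the reward changes; observation, done and info pass through.
        return ob, rew * self.scale, done, info


class ToyEnv:
    """Stand-in environment with a large constant reward, for the demo only."""

    def reset(self):
        return 0.0

    def step(self, action):
        return 0.0, -100.0, False, {}


env = ScaledRewardEnv(ToyEnv(), scale=0.1)
ob, rew, done, info = env.step(None)
print(rew)  # the raw reward of -100.0 is scaled down by the factor 0.1
```

Scaling by a positive constant does not change which policy is best, but it shrinks the returns the value function has to fit and the advantages that enter the policy update.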
