NaN values in acktr

Hi everyone, I’m trying to use continuous ACKTR to learn to reach a target in a MuJoCo simulation of the Jaco arm. I use exactly the same hyperparameters as for the reacher env, and ACKTR definitely learns something meaningful: the reward goes up, and I can also see the improvement when I render the frames.

The problem is that after some 2000-3000 iterations, the algorithm starts to produce NaN values.

The log at the time when it starts to happen looks as follows:


Iteration 3025
kl just right!

| EVAfter   | 0.984      |
| EVBefore  | 0.976      |
| EpLenMean | 200        |
| EpRewMean | -8.5       |
| EpRewSEM  | 0.82       |
| KL        | 0.00148061 |

Iteration 3026 
kl too low

| EVAfter   | 0.984       |
| EVBefore  | 0.98        |
| EpLenMean | 200         |
| EpRewMean | -7.31       |
| EpRewSEM  | 0.613       |
| KL        | 0.000913428 |

Iteration 3027
kl just right!

| EVAfter   | 0.98     |
| EVBefore  | 0.976    |
| EpLenMean | 200      |
| EpRewMean | -8.92    |
| EpRewSEM  | 0.937    |
| KL        | nan      |

After that, the NaNs spread and everything becomes NaN. Does anyone have an idea what could cause this behaviour and how to prevent it?
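One way to stop at the first bad value, rather than after the NaNs have spread, is to check the KL for finiteness before comparing it against any threshold. A NaN compares false against every bound, which would be consistent with the log above printing "kl just right!" on the iteration where KL is nan. The following is only a sketch: `classify_kl` is a hypothetical helper, and the `desired_kl` value of 0.002 is an assumption, not something stated in this thread.

```python
import math


def classify_kl(kl, desired_kl):
    """Hypothetical helper: label a KL value, flagging non-finite values first."""
    # NaN and inf must be caught explicitly, because NaN fails both
    # comparisons below and would otherwise land in the final branch.
    if not math.isfinite(kl):
        return "kl is not finite"
    if kl > desired_kl * 2:
        return "kl too high"
    if kl < desired_kl / 2:
        return "kl too low"
    return "kl just right!"


if __name__ == "__main__":
    desired_kl = 0.002  # assumed value, for illustration only
    for kl in (0.00148061, 0.000913428, float("nan")):
        print(kl, "->", classify_kl(kl, desired_kl))
```

In a training loop, the "not finite" case could raise an error or skip the update, so that the last good parameters are still available for inspection.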

Issue Analytics

State:
Created 6 years ago
Comments: 12 (3 by maintainers)

Top GitHub Comments

mansimov commented, Sep 27, 2017

OK, I found a small detail in the stepsize adjustment that wasn’t in the baselines code. It fixes the NaN issue in @lukashermann’s Jaco environment and in roboschool humanoid (@Breakend).

Change lines 121-129 in https://github.com/openai/baselines/blob/master/baselines/acktr/acktr_cont.py to

        min_stepsize = np.float32(1e-8)
        max_stepsize = np.float32(1e0)
        # Adjust stepsize
        kl = policy.compute_kl(ob_no, oldac_dist)
        if kl > desired_kl * 2:
            logger.log("kl too high")
            U.eval(tf.assign(stepsize, tf.maximum(min_stepsize, stepsize / 1.5)))
        elif kl < desired_kl / 2:
            logger.log("kl too low")
            U.eval(tf.assign(stepsize, tf.minimum(max_stepsize, stepsize * 1.5)))
        else:
            logger.log("kl just right!")

I will create a pull request with this fix and other miscellaneous small tweaks soon. Thanks for your patience!
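To see what the clamp changes, the same update rule can be written as a small pure-Python function, without TensorFlow. This is an illustrative sketch only: `adjust_stepsize` is a hypothetical name, and the starting stepsize of 0.03 and `desired_kl` of 0.002 are assumed values. The bounds 1e-8 and 1.0 and the factor 1.5 are taken from the snippet above.

```python
def adjust_stepsize(stepsize, kl, desired_kl, min_stepsize=1e-8, max_stepsize=1.0):
    """Return (new_stepsize, message) using the clamped rule from the snippet above."""
    if kl > desired_kl * 2:
        # Shrink the step, but never below min_stepsize.
        return max(min_stepsize, stepsize / 1.5), "kl too high"
    if kl < desired_kl / 2:
        # Grow the step, but never above max_stepsize.
        return min(max_stepsize, stepsize * 1.5), "kl too low"
    return stepsize, "kl just right!"


if __name__ == "__main__":
    clamped = unclamped = 0.03  # assumed starting stepsize
    for _ in range(100):
        # Simulate a long run of "kl too low" iterations.
        clamped, _msg = adjust_stepsize(clamped, kl=1e-4, desired_kl=0.002)
        unclamped *= 1.5  # the same growth without an upper bound
    print("clamped:", clamped)
    print("unclamped:", unclamped)
```

Without the upper bound, a long run of "kl too low" iterations multiplies the stepsize by 1.5 every time, so it grows geometrically. With the bound, it stops at `max_stepsize`.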

jirenu commented, Dec 11, 2017

I’m still getting NaNs for a custom environment. This was remedied by scaling the reward down, as suggested.
