
[Bug] Silent NaNs in PPO Loss Calculation if n_steps=1 and n_envs=1

See original GitHub issue

šŸ› Bug

This is somewhere between a bug and a request for more informative errors:

When n_steps and n_envs are both set to 1, the batch returned by the rollout buffer will have length 1. This makes the advantages calculation return NaN values, since the normalization step divides by the standard deviation of the advantages, which is undefined for a single element.
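A minimal sketch of the failure mode (illustrative, not SB3's actual code; NumPy's ddof=1 is used here to mirror PyTorch's unbiased .std(), and the 1e-8 epsilon matches the usual normalization pattern):

```python
import numpy as np

advantages = np.array([0.42])  # a rollout of exactly one sample

# Sample standard deviation of a single element: 0 / (n - 1) = 0 / 0
std = advantages.std(ddof=1)
print(std)  # nan

# The normalization then poisons every advantage, and hence the PPO loss
normalized = (advantages - advantages.mean()) / (std + 1e-8)
print(normalized)  # [nan]
```

With two or more samples the same code returns finite values, which is why the problem only appears at this degenerate batch size.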

I recognize that a setting this small is definitely an edge case (I ran into it during testing, when we were setting all values quite low for speed reasons), so I'm not sure it makes sense to add logic for this case. At minimum, though, I think it would be beneficial to have an explicit warning that checks whether actions or advantages have a single element, so that there's a clear indication of the source of the issue, rather than having to follow a breadcrumb trail of NaNs from some higher abstraction level of the code.
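One way such a check could look (a hypothetical sketch; the function name and message are assumptions, not SB3's API):

```python
def check_rollout_size(n_steps: int, n_envs: int) -> None:
    """Hypothetical guard: fail fast when the rollout buffer would
    contain a single sample, which makes advantage normalization NaN."""
    buffer_size = n_steps * n_envs
    if buffer_size <= 1:
        raise ValueError(
            f"n_steps * n_envs = {buffer_size} yields a single-sample "
            "rollout; the standard deviation used to normalize advantages "
            "is undefined for one element. Increase n_steps or n_envs."
        )
```

Raising at construction time would surface the problem immediately instead of after the first gradient step.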

To Reproduce

import gym
from stable_baselines3 import PPO

env = gym.make('CartPole-v1')
model = PPO('MlpPolicy', env, verbose=1, n_steps=1)
model.learn(total_timesteps=10)

This will fail with an unclear error:

RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

If you attach a debugger or insert logging statements at ppo.py:170, you'll see that (1) len(advantages) = 1 and consequently (2) advantages.std() = nan. This first surfaces as a visible bug when you try to collect an on-policy rollout after the first training step, since the NaN values in the loss propagate into NaN parameter values.
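A toy illustration of that propagation (not SB3 code): a single NaN loss gradient poisons the whole parameter vector, so the next rollout's action probabilities are NaN and multinomial sampling fails with the error above.

```python
import numpy as np

params = np.array([0.1, -0.2, 0.3])        # policy parameters
grad = np.array([np.nan, np.nan, np.nan])  # gradient of a NaN loss
params = params - 0.01 * grad              # one SGD step: all params -> NaN

logits = params                            # next forward pass
probs = np.exp(logits) / np.exp(logits).sum()
print(np.isnan(probs).all())               # True: invalid multinomial distribution
```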

Expected behavior

Either (1) explicit support for training on effective batches of size 1, or (2) a clearer and earlier error when you attempt to construct an algorithm object with n_steps=1 and n_envs=1, informing the user that the case isn't supported.

System Info

Describe the characteristic of your environment:

  • Describe how the library was installed (pip, docker, source, …): Cloned from fork of current master, installed via pip
  • GPU models and configuration: N/A
  • Python version: 3.7.0
  • PyTorch version: 1.7.1
  • Gym version: 0.17.3
  • Versions of any other relevant libraries: N/A


Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have read the documentation (required)
  • I have provided a minimal working example to reproduce the bug (required)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

1 reaction
decodyng commented, Dec 21, 2020

Ah, reading over your comment again, I now think we're saying the same thing here, except you're framing it as the last minibatch getting truncated. In the situation I'm describing, you can't pull even a single minibatch from the amount of data present in n_steps * n_envs, so all batches are truncated.

0 reactions
hughperkins commented, Aug 25, 2022

The issue with n_env * n_step == 1 should now be fixed by https://github.com/DLR-RM/stable-baselines3/pull/1028
