
reward during the training process remains .nan in all iterations

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Ray installed from (source or binary): pip version
  • Ray version: 0.5.3
  • Python version: Python 3.6.6

Describe the problem

I am trying to train a very basic DQN agent using the Python API code from the documentation, as follows:

import gym
import ray
import ray.rllib.agents.dqn as dqn
from ray.tune.logger import pretty_print

ray.init()
config = dqn.DEFAULT_CONFIG.copy()
env = gym.make("myenv-v02")  # not used below; RLlib builds its own copies from the registered name
agent = dqn.DQNAgent(config=config, env="myenv-v02")

for i in range(1000):
    # One training iteration; result holds the collected episode stats.
    result = agent.train()
    print(pretty_print(result))

    # Save a checkpoint every 100 iterations.
    if i % 100 == 0:
        checkpoint = agent.save()
        print("checkpoint saved at", checkpoint)

The problem is that in every iteration I get:

episode_len_mean: .nan
episode_reward_max: .nan
episode_reward_mean: .nan
episode_reward_min: .nan
episodes: 0

The reward does not change; it remains NaN from beginning to end.

I tested my environment with the following script:

import gym

if __name__ == "__main__":
    env = gym.make('myenv-v02')
    env.reset()
    for i_episode in range(2):
        for t in range(2):
            # Take a random action and inspect the resulting transition.
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            print(observation)
            print(reward)

And I get properly changing values for observation (state) and rewards.
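
A further check worth running here (a small sketch, not part of the original test; it only uses the standard gym step return, and the 10000-step cap is an arbitrary safety limit): printing the done flag shows whether the environment ever signals the end of an episode.

import gym

env = gym.make('myenv-v02')
env.reset()
done = False
t = 0
while not done and t < 10000:
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    t += 1
# If this prints done: False, no episode completed within 10000 steps.
print("steps:", t, "done:", done)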

What am I doing wrong in the Ray code? Why is my reward NaN?

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 2
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

3 reactions
ericl commented, Oct 6, 2018

Is it possible your environment hasn’t terminated yet? Rewards are only reported once an episode completes.
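
This matches how RLlib computes the stats: episode_reward_mean and friends are only updated from completed episodes, so an environment whose step() never returns done=True reports NaN forever. Below is a minimal sketch of an environment that is guaranteed to terminate (the MyEnv name and the 100-step cutoff are illustrative, not from the issue):

import gym
import numpy as np
from gym import spaces

class MyEnv(gym.Env):
    """Toy environment that always terminates after max_steps."""

    def __init__(self, max_steps=100):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.observation_space.sample()

    def step(self, action):
        self.steps += 1
        obs = self.observation_space.sample()
        reward = 1.0
        # The crucial part: without done=True at some point, RLlib never
        # sees a finished episode and keeps reporting NaN rewards.
        done = self.steps >= self.max_steps
        return obs, reward, done, {}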

1 reaction
billyjason commented, Mar 6, 2021

Setting the horizon config option does the job; you don't need to change your environment if its episodes have no terminal state.
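
A sketch of that fix (assuming a Ray version where horizon is still a supported common config key; the value 200 is an arbitrary choice): RLlib force-terminates each episode after horizon steps, so episode stats get reported even if the environment itself never returns done=True.

import ray
import ray.rllib.agents.dqn as dqn

ray.init()
config = dqn.DEFAULT_CONFIG.copy()
# Cut every episode off after 200 steps so episode stats are reported
# even if the environment never sets done=True on its own.
config["horizon"] = 200
agent = dqn.DQNAgent(config=config, env="myenv-v02")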


