Reward remains .nan in all iterations during training
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- Ray installed from (source or binary): pip version
- Ray version: 0.5.3
- Python version: Python 3.6.6
Describe the problem
I am trying to train a very basic DQN agent using the Python API code from the documentation:
import gym, os, ray
import ray.rllib.agents.dqn as dqn
from ray.tune.logger import pretty_print

ray.init()
config = dqn.DEFAULT_CONFIG.copy()
env = gym.make("myenv-v02")
agent = dqn.DQNAgent(config=config, env="myenv-v02")

for i in range(1000):
    result = agent.train()
    print(pretty_print(result))
    if i % 100 == 0:
        checkpoint = agent.save()
        print("checkpoint saved at", checkpoint)
The problem is that in every iteration I get:
episode_len_mean: .nan
episode_reward_max: .nan
episode_reward_mean: .nan
episode_reward_min: .nan
episodes: 0
The reward never changes; it stays .nan from the first iteration to the last.
I tested my environment with the following script:
import gym

if __name__ == "__main__":
    env = gym.make('myenv-v02')
    env.reset()
    for i_episode in range(2):
        for t in range(2):
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            print(observation)
            print(reward)
The observations (states) and rewards change as expected.
What am I doing wrong in the Ray code? Why is my reward .nan?
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Is it possible your environment hasn’t terminated yet? Rewards are only reported once an episode completes.
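For illustration, here is a minimal sketch of an environment whose episodes do terminate; the observation/action spaces, the reward, and the 200-step cap are hypothetical placeholders, not taken from myenv-v02:

import gym
import numpy as np
from gym import spaces

class TerminatingEnv(gym.Env):
    # Hypothetical example env; spaces and step limit are placeholders.
    def __init__(self):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.max_steps = 200
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.observation_space.sample()

    def step(self, action):
        self.steps += 1
        obs = self.observation_space.sample()
        reward = 1.0  # placeholder reward
        # Returning done=True is what lets RLlib close the episode and
        # report episode_reward_mean; if done is never True, the episode
        # stats stay .nan and the episodes counter stays 0.
        done = self.steps >= self.max_steps
        return obs, reward, done, {}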
Setting the horizon config option does the job; you don't need to change your environment if the episode has no terminal state.
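A sketch of what that looks like with the config from the question; the horizon value of 200 is an arbitrary example, not a recommendation:

import ray
import ray.rllib.agents.dqn as dqn

ray.init()
config = dqn.DEFAULT_CONFIG.copy()
# Cap episodes at a fixed number of steps so RLlib can finish them and
# report rewards even if the env never returns done=True.
config["horizon"] = 200  # example value; tune to your task
agent = dqn.DQNAgent(config=config, env="myenv-v02")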