Pretraining gives NaN losses [bug]
Describe the bug
I tried to pretrain my DDPG model on my custom env. I have an agent that generates expert trajectories for pretraining, and those look fine upon inspection. However, the losses reported by the pretrain function are NaN.
Code example
from stable_baselines import DDPG
from stable_baselines.gail import ExpertDataset, generate_expert_traj

# agent.act, env, param_noise and action_noise are defined earlier (not shown)
generate_expert_traj(agent.act, 'expert_trace', env, n_timesteps=int(1e5), n_episodes=10)
model = DDPG('MlpPolicy', env, verbose=1, param_noise=param_noise, action_noise=action_noise,
             tensorboard_log="data/summaries/")
dataset = ExpertDataset(expert_path='expert_trace.npz', traj_limitation=1, batch_size=128)
model.pretrain(dataset, n_epochs=1000)
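For reference, param_noise and action_noise are constructed earlier and are not shown in the report; a typical construction, loosely following the stable-baselines DDPG example, might look like the sketch below (the values are placeholders, not the ones actually used):

import numpy as np
from stable_baselines.ddpg.noise import AdaptiveParamNoiseSpec, OrnsteinUhlenbeckActionNoise

# env is the custom environment from the report
n_actions = env.action_space.shape[-1]
param_noise = AdaptiveParamNoiseSpec(initial_stddev=0.1, desired_action_stddev=0.1)
action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions),
                                            sigma=0.5 * np.ones(n_actions))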
I was able to track it down to this line. https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/base_class.py#L351
actions (3000, 3)
obs (3000, 12)
rewards (3000,)
episode_returns (10,)
episode_starts (3000,)
Total trajectories: 1
Total transitions: 598
Average returns: -2104.573738742611
Std for returns: 6.906356413688311
Pretraining with Behavior Cloning...
==== Training progress 10.00% ====
Epoch 100
Training loss: nan, Validation loss: nan
==== Training progress 20.00% ====
Epoch 200
Training loss: nan, Validation loss: nan
System Info
Describe the characteristics of your environment:
- stable-baselines 2.10.0 (via pip)
- tensorflow 1.15.3
- python 3.7
- all cpu
Additional context
I validated that it works properly with “MountainCarContinuous”, and I also validated my custom environment using the provided checker. Further, I verified in the debugger that the arrays returned by generate_expert_traj are all finite:
all([np.all(np.isfinite(a)) for a in [observations, actions, rewards]])
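A slightly more thorough version of that check can be run directly on the saved .npz file, also printing min/max values, which would have surfaced the huge action range discussed in the comments below (a sketch; the key names follow the dataset summary printed above):

import numpy as np

data = np.load('expert_trace.npz', allow_pickle=True)
for key in ['obs', 'actions', 'rewards', 'episode_returns', 'episode_starts']:
    arr = np.asarray(data[key], dtype=np.float64)
    print(key, arr.shape,
          'finite:', bool(np.all(np.isfinite(arr))),
          'min/max:', arr.min(), arr.max())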
Top GitHub Comments
I solved it by bounding the action space of my environment much more tightly.
Previously my action space was virtually unbounded (on the order of 10^24). From a theoretical perspective it makes no sense to bound the action space in that way; however, such large control values are also highly unlikely.
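Concretely, the change amounts to giving the Box action space finite limits in the custom env; a minimal sketch (the class name and the ±1 limits are made up, the shapes match the dataset summary above):

import numpy as np
import gym
from gym import spaces

class MyCustomEnv(gym.Env):  # hypothetical name, the real env is not shown in the issue
    def __init__(self):
        super().__init__()
        # Before: virtually unbounded limits around +/-1e24, which led to the NaN losses
        # After: finite limits that still cover all realistic control values
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(12,), dtype=np.float32)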
I guess the difference between the sampled actions and the expert actions was too high?
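That guess is consistent with plain float32 arithmetic: behavior cloning for continuous actions is essentially a mean squared error between policy and expert actions, and a difference anywhere near the 10^24 range overflows float32 when squared, after which NaNs follow. A minimal numeric illustration (not the actual stable-baselines loss code):

import numpy as np

# A difference on the order of the (virtually unbounded) action range...
diff = np.float32(1e24)

# ...overflows float32 when squared, as in an MSE-style loss
squared = diff ** 2
print(squared)            # inf: 1e48 exceeds the float32 maximum (~3.4e38)

# inf turns into NaN as soon as it is combined with another inf
print(squared - squared)  # nan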
I think your environment checker should include some validation of that kind for continuous action spaces. I am quite sure it will not work if the action space is unbounded (-inf, inf).
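Until such a check exists, a manual guard next to the provided checker would catch this; a sketch, assuming stable-baselines 2.10 (check_env is the checker mentioned above, the finite-bounds guard is not part of the library):

import numpy as np
from stable_baselines.common.env_checker import check_env

check_env(env)  # per the report above, this passed even with the unbounded action space

# Extra guard (not part of stable-baselines): require finite Box bounds before pretraining
low, high = env.action_space.low, env.action_space.high
if not (np.all(np.isfinite(low)) and np.all(np.isfinite(high))):
    raise ValueError("Unbounded Box action space: behavior cloning is likely to give NaN losses")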
Feel free to close the issue; however, I think it's worth considering these findings for the documentation or the env checker.
You are right, but it is really easy to overlook. Thanks for the help!