Pretraining gives NaN losses [bug]
Describe the bug
I tried to pretrain my DDPG model on my custom env. I have an agent that generates expert trajectories for pretraining, and those look fine upon inspection. However, the losses reported by the pretrain function are NaN.
Code example
from stable_baselines import DDPG
from stable_baselines.gail import ExpertDataset, generate_expert_traj

# agent.act, env, param_noise and action_noise are defined earlier (not shown)
generate_expert_traj(agent.act, 'expert_trace', env, n_timesteps=int(1e5), n_episodes=10)
model = DDPG('MlpPolicy', env, verbose=1, param_noise=param_noise, action_noise=action_noise,
             tensorboard_log="data/summaries/")
dataset = ExpertDataset(expert_path='expert_trace.npz', traj_limitation=1, batch_size=128)
model.pretrain(dataset, n_epochs=1000)
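For reference, param_noise and action_noise are constructed earlier and are not shown in the report; a typical construction, loosely following the stable-baselines DDPG example, might look like the sketch below (the values are placeholders, not the ones actually used):

import numpy as np
from stable_baselines.ddpg.noise import AdaptiveParamNoiseSpec, OrnsteinUhlenbeckActionNoise

# env is the custom environment from the report
n_actions = env.action_space.shape[-1]
param_noise = AdaptiveParamNoiseSpec(initial_stddev=0.1, desired_action_stddev=0.1)
action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions),
                                            sigma=0.5 * np.ones(n_actions))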
I was able to track it down to this line. https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/base_class.py#L351
actions (3000, 3)
obs (3000, 12)
rewards (3000,)
episode_returns (10,)
episode_starts (3000,)
Total trajectories: 1
Total transitions: 598
Average returns: -2104.573738742611
Std for returns: 6.906356413688311
Pretraining with Behavior Cloning...
==== Training progress 10.00% ====
Epoch 100
Training loss: nan, Validation loss: nan
==== Training progress 20.00% ====
Epoch 200
Training loss: nan, Validation loss: nan
System Info
Describe the characteristics of your environment:
- stable-baselines 2.10.0 (via pip)
- tensorflow 1.15.3
- python 3.7
- all cpu
Additional context
I validated that it works properly with “MountainCarContinuous”, and I also validated my custom environment using the provided checker. Further, I verified in the debugger that the arrays returned by generate_expert_traj are all finite:
all([np.all(np.isfinite(a)) for a in [observations, actions, rewards]])
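A slightly more thorough version of that check can be run directly on the saved .npz file, also printing min/max values, which would have surfaced the huge action range discussed in the comments below (a sketch; the key names follow the dataset summary printed above):

import numpy as np

data = np.load('expert_trace.npz', allow_pickle=True)
for key in ['obs', 'actions', 'rewards', 'episode_returns', 'episode_starts']:
    arr = np.asarray(data[key], dtype=np.float64)
    print(key, arr.shape,
          'finite:', bool(np.all(np.isfinite(arr))),
          'min/max:', arr.min(), arr.max())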
Top GitHub Comments
I solved it by bounding the action space of my environment much more tightly.
Previously my action space was virtually unbounded (on the order of 10^24). From a theoretical perspective it makes no sense to bound the action space in that way; however, such large control values are also highly unlikely.
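Concretely, the change amounts to giving the Box action space finite limits in the custom env; a minimal sketch (the class name and the ±1 limits are made up, the shapes match the dataset summary above):

import numpy as np
import gym
from gym import spaces

class MyCustomEnv(gym.Env):  # hypothetical name, the real env is not shown in the issue
    def __init__(self):
        super().__init__()
        # Before: virtually unbounded limits around +/-1e24, which led to the NaN losses
        # After: finite limits that still cover all realistic control values
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(12,), dtype=np.float32)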
I guess the difference between the sampled actions and the expert actions was too high?
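That guess is consistent with plain float32 arithmetic: behavior cloning for continuous actions is essentially a mean squared error between policy and expert actions, and a difference anywhere near the 10^24 range overflows float32 when squared, after which NaNs follow. A minimal numeric illustration (not the actual stable-baselines loss code):

import numpy as np

# A difference on the order of the (virtually unbounded) action range...
diff = np.float32(1e24)

# ...overflows float32 when squared, as in an MSE-style loss
squared = diff ** 2
print(squared)            # inf: 1e48 exceeds the float32 maximum (~3.4e38)

# inf turns into NaN as soon as it is combined with another inf
print(squared - squared)  # nan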
I think your environment checker should include some validation of that kind for continuous action spaces. I am quite sure it will not work if the action space is unbounded (-inf, inf).
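Until such a check exists, a manual guard next to the provided checker would catch this; a sketch, assuming stable-baselines 2.10 (check_env is the checker mentioned above, the finite-bounds guard is not part of the library):

import numpy as np
from stable_baselines.common.env_checker import check_env

check_env(env)  # per the report above, this passed even with the unbounded action space

# Extra guard (not part of stable-baselines): require finite Box bounds before pretraining
low, high = env.action_space.low, env.action_space.high
if not (np.all(np.isfinite(low)) and np.all(np.isfinite(high))):
    raise ValueError("Unbounded Box action space: behavior cloning is likely to give NaN losses")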
Feel free to close the issue; however, I think it's worth considering these findings for the documentation or the env checker.
You are right, but it is really easy to overlook. Thanks for the help!