compatibility issue (training error/accuracy) with tensorflow 1.5
See original GitHub issue
Everything works fine with TensorFlow 1.4. With the newly released TensorFlow 1.5, the training loss becomes NaN after 80 iterations:
iter: 80 / 84000, total loss: nan
>>> rpn_loss_cls: 0.684417
>>> rpn_loss_box: 0.019884
>>> loss_cls: 0.101328
>>> loss_box: 0.000000
>>> lr: 0.001000
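When a loss goes NaN this early, it helps to abort the run immediately rather than let it burn through the remaining iterations. A minimal, framework-agnostic sketch of guarding a training loop (the `toy_step` function and its divergence point are made up to mimic the log above, not taken from the issue's code):

```python
import math

def run_training(step_fn, max_iters):
    """Run step_fn for max_iters steps, aborting on the first non-finite loss."""
    for it in range(max_iters):
        loss = step_fn(it)
        if math.isnan(loss) or math.isinf(loss):
            raise RuntimeError("loss became non-finite at iter %d: %r" % (it, loss))
    return max_iters

# Toy step function that diverges at iteration 80, mimicking the log above.
def toy_step(it):
    return float("nan") if it >= 80 else 0.7 / (it + 1)

err_msg = ""
try:
    run_training(toy_step, 84000)
except RuntimeError as exc:
    err_msg = str(exc)
print(err_msg)  # reports the failure at iter 80 instead of running to 84000
```

In a real TF 1.x session the same check can be applied to the fetched loss value after each `sess.run` call.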
Issue Analytics
- State:
- Created: 6 years ago
- Comments: 5
Top Results From Across the Web

TensorFlow version compatibility
This document is for users who need backwards compatibility across different versions of TensorFlow (either for code or data), ...

TensorFlow: Dramatic loss of accuracy after freezing graph?
Potential source of problem: I first trained my model using TF 0.12, but I believe it is compatible with Tf 1.01, the version...

tensorflow-metal | Apple Developer Forums
But still facing the GPU problem when training a 3D Unet. Here's part of my code and hoping to receive some suggestion to...

Train With Mixed Precision - NVIDIA Documentation Center
The network accuracy was achieved from training in FP32. ... list of supported drivers, see the CUDA Application Compatibility topic.

How to correctly install Keras and Tensorflow - ActiveState
Click to install Keras and Tensorflow together using pip. Understand how to use these Python libraries for machine learning use cases.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@jwnsu I’m using AdamOptimizer. The NaN problem goes away when I use the CPU version of tensorflow 1.5, which makes me think it may be a bug in the new CUDA 9 / cuDNN 7 stack.
It seems tensorflow 1.5 affects training in a subtle way. Regarding the error, it goes away after a minor adjustment (e.g. lowering the learning rate).
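Besides lowering the learning rate, a common fix for diverging losses is clipping gradients by their global norm before the optimizer applies them. A hedged sketch of the idea in plain NumPy (the helper name and toy gradient values are illustrative; TF 1.x users would reach for `tf.clip_by_global_norm` instead):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # Scale all gradients jointly so their combined L2 norm never
    # exceeds max_norm; gradients already within the bound are untouched.
    global_norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads]

# Toy example: one huge gradient that would otherwise blow up an update.
grads = [np.array([3000.0, -4000.0])]
clipped = clip_by_global_norm(grads, max_norm=5.0)
print(np.linalg.norm(clipped[0]))  # clipped down to the 5.0 bound
```

Clipping does not explain the underlying CUDA 9 / cuDNN 7 suspicion above, but it often keeps training stable while the root cause is investigated.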