NaN during training
See original GitHub issue

Hi,
I fiddled around with the JAX code a bit and noticed that for small systems, where either spin has only one electron, the network starts producing NaN after some time. For example:
ferminet --config ferminet/configs/atom.py --config.system.atom H --config.batch_size 4096 --config.pretrain.iterations 0
I0215 05:54:52.148167 139716596184896 train.py:461] Step 00538: -0.4999 E_h, pmove=0.97
I0215 05:54:52.173480 139716596184896 train.py:461] Step 00539: -0.4999 E_h, pmove=0.97
I0215 05:54:52.199377 139716596184896 train.py:461] Step 00540: nan E_h, pmove=0.97
I0215 05:54:52.224862 139716596184896 train.py:461] Step 00541: nan E_h, pmove=0.00
I0215 05:54:52.250287 139716596184896 train.py:461] Step 00542: nan E_h, pmove=0.00
I traced the issue down and found that it happens at the log-abs-determinant of the Slater determinant (in this case a 1x1 matrix). There is a small probability that a sample is chosen such that the (1x1) matrix is exactly 0. After that, the code just produces NaN.
full_det=True does not mean that spin is ignored, and it does not make things fully antisymmetric with respect to permutations of electrons of different spin. It just means that instead of there being N_alpha non-zero orbitals for the alpha electrons and N_beta non-zero orbitals for the beta electrons, there are now N = N_alpha + N_beta non-zero orbitals for both the alpha and the beta electrons (but the orbitals can still be different!). This generalizes the full_det=False case. It seems like it helps on some systems, though the difference is not enormous.
On Fri, Mar 19, 2021 at 10:33 AM Nicholas Gao wrote:
> Fixed in #23.