Matrix not invertible and -INF in NCC
Hi there,
I’m experimenting with training the network. I tried the NCC loss function with the default hyperparameters (window size 9, eps of 1e-5).
At random points during training, the loss goes to -inf and execution stops with an InvalidInputArgument error saying that the matrix is not invertible.
Because it happens randomly, it is difficult for me to reproduce. I tried a couple of things, like checking the input data, running train_on_batch on just the batch where it fails, inspecting the outputs after the error was thrown, and so on.
The problem is that I cannot figure out where in the code an inf is generated.
cc = cross * cross / (I_var * J_var + self.eps)
There is an eps in the denominator of the loss function, so it cannot be a plain division by zero. The inf must therefore come from the numerator, i.e. cross itself has to be inf. Given that cross is computed via
cross = IJ_sum - u_J*I_sum - u_I*J_sum + u_I*u_J*win_size
one of the terms has to be inf. That is where I got stuck, because these terms are either conv activation maps or hyperparameters, and there is no obvious way for them to generate an inf.
Has anyone hit the same problem? Any clue why?
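(For what it’s worth, the expanded one-pass sums in that expression are exactly where float32 can break down: when a window is nearly flat but has a large mean, I2_sum and u_I*u_I*win_size are huge and nearly equal, so their difference loses essentially all precision. The computed variance can even come out negative, which makes I_var * J_var negative and lets the denominator pass through zero despite the eps. A minimal NumPy sketch of the cancellation, not the voxelmorph code itself:

```python
import numpy as np

# A nearly flat window with a large mean, in float32 (TensorFlow's default dtype).
# True sum of squared deviations: 2*(4 + 2.25 + 1 + 0.25) = 15.0
win = (100_000.0 + 0.5 * np.arange(9)).astype(np.float32)
n = np.float32(win.size)

I_sum = win.sum()
I2_sum = (win * win).sum()
u_I = I_sum / n

# One-pass form, as in the loss: expand the square, then combine the big sums.
# I2_sum and u_I*u_I*n are both ~9e10, while their true difference is ~15,
# so nearly every significant bit cancels.
one_pass = I2_sum - 2 * u_I * I_sum + u_I * u_I * n

# Two-pass form: subtract the mean first, then square. Exact for this input.
two_pass = ((win - u_I) ** 2).sum()

print(one_pass, two_pass)  # two_pass is exactly 15.0; one_pass is not
```

The same cancellation hits cross, I_var, and J_var in the windowed loss, since they are all built from these expanded sums.)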
Issue Analytics
- Created: 3 years ago
- Comments: 16
Top GitHub Comments
I had the same issue when trying to reimplement VoxelMorph. I think the reason for the overflowing NCC is a lack of variation in some windows. Normalized cross-correlation essentially normalizes the data by dividing by its variance. When two windows with very low variation are correlated, both the denominator and the numerator approach zero, which causes the numerical-stability issue. This is more likely with artificial images, since they contain much less noise. You can calculate the NCC of two identical images with a flat region to check whether the algorithm is numerically stable.
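That flat-region test takes only a few lines. Here is a sketch of a single-window version of the loss formula (my own minimal reimplementation, not the repo’s code), showing the 0/0 that appears on an exactly flat window and how the eps turns it into a harmless 0:

```python
import numpy as np

def window_ncc(I, J, eps=0.0):
    # Squared NCC of a single window, mirroring cc = cross^2 / (I_var * J_var + eps)
    u_I, u_J = I.mean(), J.mean()
    cross = ((I - u_I) * (J - u_J)).sum()
    I_var = ((I - u_I) ** 2).sum()
    J_var = ((J - u_J) ** 2).sum()
    return cross * cross / (I_var * J_var + eps)

flat = np.full((9, 9), 0.5)  # a perfectly flat window

with np.errstate(invalid="ignore"):
    print(window_ncc(flat, flat))       # 0/0 -> nan without eps
print(window_ncc(flat, flat, eps=1e-5)) # 0.0: eps rescues the exactly flat case
```

The genuinely dangerous case is the *nearly* flat window: there the numerator and denominator are both dominated by rounding error, so the ratio is essentially noise.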
My solution is
Here are a few changes to the original algorithm:
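(The comment’s actual code changes are not reproduced here. As one sketch of the kind of stabilization that helps, with the function name and structure my own rather than the commenter’s patch: subtract the window means before forming the sums, accumulate in float64, and clamp the variances at zero:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stable_local_ncc(I, J, win=9, eps=1e-5):
    # Local squared NCC over all win x win windows of two 2-D images.
    # Stabilizations: mean-subtract before summing (avoids the cancellation
    # in the expanded one-pass sums), accumulate in float64, clamp variances.
    Iw = sliding_window_view(I, (win, win)).astype(np.float64)
    Jw = sliding_window_view(J, (win, win)).astype(np.float64)
    dI = Iw - Iw.mean(axis=(-2, -1), keepdims=True)
    dJ = Jw - Jw.mean(axis=(-2, -1), keepdims=True)
    cross = (dI * dJ).sum(axis=(-2, -1))
    I_var = np.maximum((dI * dI).sum(axis=(-2, -1)), 0.0)  # clamp tiny negatives
    J_var = np.maximum((dJ * dJ).sum(axis=(-2, -1)), 0.0)
    return cross * cross / (I_var * J_var + eps)

rng = np.random.default_rng(0)
img = rng.random((32, 32)).astype(np.float32)
cc = stable_local_ncc(img, img)  # identical images: cc close to 1, never inf
loss = -cc.mean()                # the training loss is the negative mean cc
```

Note the repo computes the local sums with convolutions for speed; the sliding-window form above is just the easiest way to show the numerics.)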
@HansLeonardVanBrueggemann thanks for following through!
However, that is the modified implementation, right? The existing implementation of the loss at https://github.com/voxelmorph/voxelmorph/blob/master/src/losses.py does not have the square root, so any idea why it might still be failing? Of course, we wouldn’t want negatives in there anyway. I believe I vaguely remember reading about this numerical instability in TensorFlow before.