Calculation of scale term
Hi guys,
In Table 1 of the paper, you specify that the NN learns the log of the scale, and thus the scale is calculated as `s = exp(log s)`. However, in your code the scale is calculated by `scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.)`. Would it be possible to elaborate on why this calculation was used instead? I'm assuming it's for reasons of numerical stability?
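For concreteness, here is a minimal sketch (not the repository's actual code) of the two parameterizations being compared, assuming `h` is the raw NN output with shift and scale channels interleaved as in the quoted line:

```python
import tensorflow as tf

def scale_and_shift(h, use_sigmoid=True):
    """Turn the raw NN output h into (scale, shift) for an affine coupling layer."""
    shift = h[:, :, :, ::2]
    if use_sigmoid:
        # As in the quoted code: scale is squashed into (0, 1), and the +2.
        # biases it towards sigmoid(2) ~= 0.88 when h is near zero at init.
        scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.)
    else:
        # As described in Table 1 of the paper: the NN predicts log(s),
        # so scale = exp(log s) is unbounded above.
        scale = tf.exp(h[:, :, :, 1::2])
    return scale, shift
```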
Top GitHub Comments
Note that `y = scale * x + shift` with `scale = tf.nn.relu(h[:, :, :, 1::2])` and `shift = h[:, :, :, ::2]`. I agree that `dy/dh` is bounded here due to the `relu`, but `dy/dx = scale`, which corresponds to the entries of the Jacobian matrix, is still unbounded. If that is the case, the determinant of the Jacobian can be huge, suggesting a dramatic volume growth from x to y. This made training unstable in my experiments. The idea of using `sigmoid` is to bound `dy/dx` from above - in fact, it only allows the volume to shrink. I think it sacrifices capacity for stability. However, I don't have any further intuition beyond these observations; I guess it is something to overcome.
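To make the bounding argument above concrete, here is a small numerical sketch (illustration only, not from the repository): with `scale = sigmoid(h + 2.)` each entry's contribution `log(scale)` to the log-determinant of the Jacobian is always negative, so the volume can only shrink, whereas with the `exp` parameterization it equals `h` and is unbounded above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw NN outputs for the scale channel, including a large one.
h = np.array([-3.0, 0.0, 3.0, 10.0])

# log|det J| of the coupling layer is the sum of log(scale) over dimensions.
print(np.log(sigmoid(h + 2.0)))  # every entry < 0: the volume can only shrink
print(h)                         # log(exp(h)) = h: unbounded above, can blow up
```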
So I asked the authors at NeurIPS last year - using `sigmoid` here is to bound the gradients of the affine coupling layer. In the previous Real-NVP work a `tanh` is used for the same reason. I've tried training without this kind of bounding and it didn't converge.
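For comparison, a `tanh`-based bounding in the spirit of the Real-NVP remark above could look like the sketch below; the exact form and the `log_scale_factor` name are assumptions for illustration, not the Real-NVP implementation:

```python
import tensorflow as tf

def bounded_scale(raw, log_scale_factor=2.0):
    # log(scale) is squashed into (-log_scale_factor, +log_scale_factor),
    # so the scale itself stays in (exp(-log_scale_factor), exp(+log_scale_factor))
    # and the Jacobian entries are bounded in both directions.
    return tf.exp(log_scale_factor * tf.tanh(raw))
```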