Calculation of scale term
Hi guys,
In Table 1 of the paper, you specify that the NN learns the log of the scale, and thus the scale is calculated as `s = exp(log s)`. However, in your code the scale is calculated by `scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.)`. Would it be possible to elaborate on why this calculation was used instead? I'm assuming it's for reasons of numerical stability?
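For concreteness, here is a minimal sketch (not the repository's actual code) of the two parameterizations being compared, assuming `h` is the raw NN output with shift and scale channels interleaved as in the quoted line:

```python
import tensorflow as tf

def scale_and_shift(h, use_sigmoid=True):
    """Turn the raw NN output h into (scale, shift) for an affine coupling layer."""
    shift = h[:, :, :, ::2]
    if use_sigmoid:
        # As in the quoted code: scale is squashed into (0, 1), and the +2.
        # biases it towards sigmoid(2) ~= 0.88 when h is near zero at init.
        scale = tf.nn.sigmoid(h[:, :, :, 1::2] + 2.)
    else:
        # As described in Table 1 of the paper: the NN predicts log(s),
        # so scale = exp(log s) is unbounded above.
        scale = tf.exp(h[:, :, :, 1::2])
    return scale, shift
```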
Top GitHub Comments
Note that `y = scale * x + shift` with `scale = tf.nn.relu(h[:, :, :, 1::2])` and `shift = h[:, :, :, ::2]`. I agree that `dy/dh` is bounded here due to the `relu`, but `dy/dx = scale`, which corresponds to the entries of the Jacobian matrix, is still unbounded. If that is the case, the determinant of the Jacobian can be huge, suggesting a dramatic volume growth from x to y. This made training unstable in my experiments. The idea of using `sigmoid` is to bound `dy/dx` from above - in fact, it only allows the volume to shrink. I think it sacrifices capacity for stability. However, I don't have any further intuition beyond these observations; I guess it is something to overcome.
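To make the bounding argument above concrete, here is a small numerical sketch (illustration only, not from the repository): with `scale = sigmoid(h + 2.)` each entry's contribution `log(scale)` to the log-determinant of the Jacobian is always negative, so the volume can only shrink, whereas with the `exp` parameterization it equals `h` and is unbounded above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw NN outputs for the scale channel, including a large one.
h = np.array([-3.0, 0.0, 3.0, 10.0])

# log|det J| of the coupling layer is the sum of log(scale) over dimensions.
print(np.log(sigmoid(h + 2.0)))  # every entry < 0: the volume can only shrink
print(h)                         # log(exp(h)) = h: unbounded above, can blow up
```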
So I asked the authors at NeurIPS last year - using `sigmoid` here is to bound the gradients of the affine coupling layer. In the previous Real-NVP work a `tanh` is used for the same reason. I've tried training without this kind of bounding and it didn't converge.
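For comparison, a `tanh`-based bounding in the spirit of the Real-NVP remark above could look like the sketch below; the exact form and the `log_scale_factor` name are assumptions for illustration, not the Real-NVP implementation:

```python
import tensorflow as tf

def bounded_scale(raw, log_scale_factor=2.0):
    # log(scale) is squashed into (-log_scale_factor, +log_scale_factor),
    # so the scale itself stays in (exp(-log_scale_factor), exp(+log_scale_factor))
    # and the Jacobian entries are bounded in both directions.
    return tf.exp(log_scale_factor * tf.tanh(raw))
```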