Bug in bilinear sampler of transformer
Hi, I tried the spatial transformer with a simple toy image and the identity transform to verify that it works correctly, but I think it has a bug, unless I'm doing something wrong.
This is the input image I used:
When I transform it with theta=[[1,0,0,0,1,0]], I get this as an output, which is clearly not identical to the input:
I fixed it by narrowing the grid range in the bilinear sampler by 1 (rescaling to [0, W-1] / [0, H-1] instead of [0, W] / [0, H]) and moving the calculation of the deltas before the clipping operation. Here is the fixed code:
```python
import tensorflow as tf


def bilinear_sampler(img, x, y):
    """
    Performs bilinear sampling of the input images according to the
    normalized coordinates provided by the sampling grid. Note that
    the sampling is done identically for each channel of the input.

    To test if the function works properly, the output image should be
    identical to the input image when theta is initialized to the
    identity transform.

    Input
    -----
    - img: batch of images in (B, H, W, C) layout.
    - x, y: the output of affine_grid_generator.

    Returns
    -------
    - interpolated images according to grids. Same size as grid.
    """
    # prepare useful params
    B = tf.shape(img)[0]
    H = tf.shape(img)[1]
    W = tf.shape(img)[2]
    C = tf.shape(img)[3]

    max_y = tf.cast(H - 1, 'int32')
    max_x = tf.cast(W - 1, 'int32')
    zero = tf.zeros([], dtype='int32')

    # cast indices as float32 (for rescaling)
    x = tf.cast(x, 'float32')
    y = tf.cast(y, 'float32')

    # rescale x and y from [-1, 1] to [0, W-1] / [0, H-1]
    x = 0.5 * ((x + 1.0) * tf.cast(max_x, 'float32'))
    y = 0.5 * ((y + 1.0) * tf.cast(max_y, 'float32'))

    # grab 4 nearest corner points for each (x_i, y_i),
    # i.e. we need a rectangle around the point of interest
    x0 = tf.floor(x)
    x1 = x0 + 1
    y0 = tf.floor(y)
    y1 = y0 + 1

    # calculate deltas (interpolation weights) before clipping
    wa = (x1 - x) * (y1 - y)
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)

    # recast as int for index calculation
    x0 = tf.cast(x0, 'int32')
    x1 = tf.cast(x1, 'int32')
    y0 = tf.cast(y0, 'int32')
    y1 = tf.cast(y1, 'int32')

    # clip to range [0, W-1] / [0, H-1] to not violate img boundaries
    x0 = tf.clip_by_value(x0, zero, max_x)
    x1 = tf.clip_by_value(x1, zero, max_x)
    y0 = tf.clip_by_value(y0, zero, max_y)
    y1 = tf.clip_by_value(y1, zero, max_y)

    # get pixel value at corner coords
    Ia = get_pixel_value(img, x0, y0)
    Ib = get_pixel_value(img, x0, y1)
    Ic = get_pixel_value(img, x1, y0)
    Id = get_pixel_value(img, x1, y1)

    # add channel dimension for broadcasting
    wa = tf.expand_dims(wa, axis=3)
    wb = tf.expand_dims(wb, axis=3)
    wc = tf.expand_dims(wc, axis=3)
    wd = tf.expand_dims(wd, axis=3)

    # compute output
    out = tf.add_n([wa * Ia, wb * Ib, wc * Ic, wd * Id])

    return out
```
With this code I recover the original image when transforming with the identity matrix.
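The identity check above can be reproduced without TensorFlow. The following is a minimal NumPy sketch (not part of the issue) that mirrors the fixed sampler's logic for a single-channel image: rescale to [0, W-1] / [0, H-1], compute the deltas before clipping, then verify that an identity sampling grid returns the input unchanged.

```python
import numpy as np

def bilinear_sample_np(img, x, y):
    """NumPy mirror of the fixed sampler for a single H x W image,
    used to check the identity-transform property."""
    H, W = img.shape
    # rescale normalized coords from [-1, 1] to [0, W-1] / [0, H-1]
    x = 0.5 * (x + 1.0) * (W - 1)
    y = 0.5 * (y + 1.0) * (H - 1)
    x0 = np.floor(x).astype(int)
    x1 = x0 + 1
    y0 = np.floor(y).astype(int)
    y1 = y0 + 1
    # deltas are computed BEFORE clipping, as in the fix
    wa = (x1 - x) * (y1 - y)
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)
    # clip indices to valid range only after the weights are known
    x0, x1 = np.clip(x0, 0, W - 1), np.clip(x1, 0, W - 1)
    y0, y1 = np.clip(y0, 0, H - 1), np.clip(y1, 0, H - 1)
    return (wa * img[y0, x0] + wb * img[y1, x0]
            + wc * img[y0, x1] + wd * img[y1, x1])

# identity sampling grid over a 4x4 test image
H, W = 4, 4
img = np.arange(H * W, dtype=float).reshape(H, W)
ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                     indexing='ij')
out = bilinear_sample_np(img, xs, ys)
print(np.allclose(out, img))  # True: identity grid recovers the input
```

With the [0, W-1] rescaling, the identity grid lands exactly on pixel centers, so each output pixel gets weight 1 on a single corner and the input is reproduced exactly.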
Issue Analytics
- State:
- Created 5 years ago
- Comments:13 (7 by maintainers)
Top GitHub Comments
@marissaweis my bad, just revert `max_h - 1` to `max_h` and `max_w - 1` to `max_w`.

@zachluo It really doesn't matter to be honest. At the level of the network, it will just end up acting as a regularizer.
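To see the off-by-one concretely, here is a small illustration (my own, not from the issue thread) comparing the original rescaling against the fixed one for a width-4 image:

```python
W = 4
xs = [-1.0, 0.0, 1.0]  # left edge, center, right edge of the normalized grid
old = [0.5 * (x + 1.0) * W for x in xs]        # original: maps onto [0, W]
new = [0.5 * (x + 1.0) * (W - 1) for x in xs]  # fixed:    maps onto [0, W-1]
print(old)  # [0.0, 2.0, 4.0] -> x = 1 lands at column 4, past the last index 3
print(new)  # [0.0, 1.5, 3.0] -> x = 1 lands exactly on the last column
```

Under the original rescaling, the right edge of the grid falls outside the image and only clipping pulls it back, which shifts the sampled content; the fixed rescaling keeps every grid point inside [0, W-1].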