
Bug in bilinear sampler of transformer

See original GitHub issue

Hi, I tried the spatial transformer with a simple toy image and an identity transform to verify that it works correctly, but I think it has a bug, unless I’m doing something wrong.

This is the input image I used: original_toy_example

When I transform it with theta = [[1, 0, 0, 0, 1, 0]], I get this output, which is clearly not identical to the input: identity_transform
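
For context, that flat theta is just the 2x3 identity affine matrix in row-major order, so the sampling grid it produces should map every normalized target coordinate onto itself; any deviation in the output therefore has to come from the sampler. A quick sanity check of that claim in plain NumPy (the variable names here are mine, not the repo’s):

import numpy as np

theta = np.array([[1, 0, 0, 0, 1, 0]], dtype=np.float32).reshape(1, 2, 3)
# theta[0] == [[1, 0, 0],
#              [0, 1, 0]]   -> identity affine transform

# a few normalized target coordinates (x_t, y_t) in homogeneous form
targets = np.array([[-1.0, -1.0, 1.0],
                    [ 0.0,  0.5, 1.0],
                    [ 1.0,  1.0, 1.0]], dtype=np.float32)

source = targets @ theta[0].T   # source coords (x_s, y_s) for each target point
print(source)                   # identical to the (x_t, y_t) pairs above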

I fixed it by reducing the rescaling range in the bilinear sampler by 1 (using W - 1 and H - 1) and by moving the calculation of the interpolation weights (the deltas) to before the clipping operation. Here is the fixed code:

import tensorflow as tf

# get_pixel_value is a helper from the repository that gathers img[b, y, x]
# for every batch element; it is assumed to be in scope here.
def bilinear_sampler(img, x, y):
        """
        Performs bilinear sampling of the input images according to the
        normalized coordinates provided by the sampling grid. Note that
        the sampling is done identically for each channel of the input.
        To test that the function works properly, the output image should
        be identical to the input image when theta is initialized to the
        identity transform.
        Input
        -----
        - img: batch of images in (B, H, W, C) layout.
        - x, y: output of affine_grid_generator; normalized sampling
          coordinates in [-1, 1], each of shape (B, H, W).
        Returns
        -------
        - out: interpolated images sampled at the grid, in the same
          (B, H, W, C) layout as img.
        """
        # prepare useful params
        B = tf.shape(img)[0]
        H = tf.shape(img)[1]
        W = tf.shape(img)[2]
        C = tf.shape(img)[3]

        max_y = tf.cast(H - 1, 'int32')
        max_x = tf.cast(W - 1, 'int32')
        zero = tf.zeros([], dtype='int32')

        # cast indices as float32 (for rescaling)
        x = tf.cast(x, 'float32')
        y = tf.cast(y, 'float32')

        # rescale x to [0, W-1] and y to [0, H-1]
        x = 0.5 * ((x + 1.0) * tf.cast(max_x, 'float32'))
        y = 0.5 * ((y + 1.0) * tf.cast(max_y, 'float32'))
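        # with max_x = W - 1 and max_y = H - 1, the normalized endpoints map
        # exactly onto the image corners: x = -1 -> 0 and x = +1 -> W - 1
        # (this is the off-by-one fix described above)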

        # grab 4 nearest corner points for each (x_i, y_i)
        # i.e. we need a rectangle around the point of interest
        x0 = tf.floor(x)
        x1 = x0 + 1
        y0 = tf.floor(y)
        y1 = y0 + 1

        # calculate deltas
        wa = (x1-x) * (y1-y)
        wb = (x1-x) * (y-y0)
        wc = (x-x0) * (y1-y)
        wd = (x-x0) * (y-y0)
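        # note: the weights are computed from the un-clipped corner positions;
        # the clipping below only changes which pixels are gathered, so border
        # samples are effectively edge-padded instead of getting distorted weights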

        # recast as int for index calculation
        x0 = tf.cast(x0, 'int32')
        x1 = tf.cast(x1, 'int32')
        y0 = tf.cast(y0, 'int32')
        y1 = tf.cast(y1, 'int32')

        # clip to the valid range [0, H-1] / [0, W-1] so we do not read outside the image
        x0 = tf.clip_by_value(x0, zero, max_x)
        x1 = tf.clip_by_value(x1, zero, max_x)
        y0 = tf.clip_by_value(y0, zero, max_y)
        y1 = tf.clip_by_value(y1, zero, max_y)

        # get pixel value at corner coords
        Ia = get_pixel_value(img, x0, y0)
        Ib = get_pixel_value(img, x0, y1)
        Ic = get_pixel_value(img, x1, y0)
        Id = get_pixel_value(img, x1, y1)

        # add dimension for addition
        wa = tf.expand_dims(wa, axis=3)
        wb = tf.expand_dims(wb, axis=3)
        wc = tf.expand_dims(wc, axis=3)
        wd = tf.expand_dims(wd, axis=3)

        # compute output
        out = tf.add_n([wa*Ia, wb*Ib, wc*Ic, wd*Id])

        return out

With this code I recover the original image when transforming with the identity matrix.
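
For completeness, here is a small self-contained check of that claim. Note that get_pixel_value and the identity grid below are my own minimal stand-ins (written for TF 2.x eager execution), not the repo’s exact code; the printed maximum difference should be ~0 up to float rounding:

import tensorflow as tf

def get_pixel_value(img, x, y):
    """Minimal stand-in for the repo's helper: gathers img[b, y, x] per batch."""
    B = tf.shape(x)[0]
    H = tf.shape(x)[1]
    W = tf.shape(x)[2]
    b = tf.tile(tf.reshape(tf.range(B), (B, 1, 1)), (1, H, W))
    indices = tf.stack([b, y, x], axis=3)        # (B, H, W, 3)
    return tf.gather_nd(img, indices)            # (B, H, W, C)

# toy image batch
B, H, W, C = 1, 8, 8, 1
img = tf.random.uniform((B, H, W, C))

# identity sampling grid: normalized coordinates in [-1, 1]
xs, ys = tf.meshgrid(tf.linspace(-1.0, 1.0, W), tf.linspace(-1.0, 1.0, H))
xs = tf.tile(xs[None, ...], (B, 1, 1))           # (B, H, W)
ys = tf.tile(ys[None, ...], (B, 1, 1))

out = bilinear_sampler(img, xs, ys)
print(tf.reduce_max(tf.abs(out - img)).numpy())  # ~0: the input is recovered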

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

1 reaction
kevinzakka commented, Jun 3, 2018

@marissaweis my bad, just revert max_h - 1 to max_h and max_w - 1 to max_w.

0 reactions
kevinzakka commented, Jul 9, 2018

@zachluo It really doesn’t matter to be honest. At the level of the network, it will just end up acting as a regularizer.
