Bug in bilinear sampler of transformer
Hi, I tried the spatial transformer with a simple toy image and the identity transform to verify that it works correctly, but I think it has a bug, unless I'm doing something wrong.
This is the input image I used:
When I transform it with theta=[[1,0,0,0,1,0]], I get this as an output, which is clearly not identical to the input:
I fixed it by narrowing the grid range in the bilinear sampler by 1 (rescaling to [0, W-1] / [0, H-1] instead of [0, W] / [0, H]) and moving the calculation of the deltas before the clipping operation. Here is the fixed code:
```python
import tensorflow as tf


def bilinear_sampler(img, x, y):
    """
    Performs bilinear sampling of the input images according to the
    normalized coordinates provided by the sampling grid. Note that
    the sampling is done identically for each channel of the input.

    To test if the function works properly, the output image should be
    identical to the input image when theta is initialized to the
    identity transform.

    Input
    -----
    - img: batch of images in (B, H, W, C) layout.
    - x, y: the output of affine_grid_generator.

    Returns
    -------
    - interpolated images according to grids. Same size as grid.
    """
    # prepare useful params
    B = tf.shape(img)[0]
    H = tf.shape(img)[1]
    W = tf.shape(img)[2]
    C = tf.shape(img)[3]

    max_y = tf.cast(H - 1, 'int32')
    max_x = tf.cast(W - 1, 'int32')
    zero = tf.zeros([], dtype='int32')

    # cast indices as float32 (for rescaling)
    x = tf.cast(x, 'float32')
    y = tf.cast(y, 'float32')

    # rescale x and y from [-1, 1] to [0, W-1] / [0, H-1]
    x = 0.5 * ((x + 1.0) * tf.cast(max_x, 'float32'))
    y = 0.5 * ((y + 1.0) * tf.cast(max_y, 'float32'))

    # grab 4 nearest corner points for each (x_i, y_i),
    # i.e. we need a rectangle around the point of interest
    x0 = tf.floor(x)
    x1 = x0 + 1
    y0 = tf.floor(y)
    y1 = y0 + 1

    # calculate deltas (interpolation weights) before clipping
    wa = (x1 - x) * (y1 - y)
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)

    # recast as int for index calculation
    x0 = tf.cast(x0, 'int32')
    x1 = tf.cast(x1, 'int32')
    y0 = tf.cast(y0, 'int32')
    y1 = tf.cast(y1, 'int32')

    # clip to range [0, W-1] / [0, H-1] to not violate img boundaries
    x0 = tf.clip_by_value(x0, zero, max_x)
    x1 = tf.clip_by_value(x1, zero, max_x)
    y0 = tf.clip_by_value(y0, zero, max_y)
    y1 = tf.clip_by_value(y1, zero, max_y)

    # get pixel value at corner coords
    Ia = get_pixel_value(img, x0, y0)
    Ib = get_pixel_value(img, x0, y1)
    Ic = get_pixel_value(img, x1, y0)
    Id = get_pixel_value(img, x1, y1)

    # add channel dimension for broadcasting
    wa = tf.expand_dims(wa, axis=3)
    wb = tf.expand_dims(wb, axis=3)
    wc = tf.expand_dims(wc, axis=3)
    wd = tf.expand_dims(wd, axis=3)

    # compute output
    out = tf.add_n([wa * Ia, wb * Ib, wc * Ic, wd * Id])

    return out
```
With this code I recover the original image when transforming with the identity matrix.
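The identity check above can be reproduced without TensorFlow. The following is a minimal NumPy sketch (not part of the issue) that mirrors the fixed sampler's logic for a single-channel image: rescale to [0, W-1] / [0, H-1], compute the deltas before clipping, then verify that an identity sampling grid returns the input unchanged.

```python
import numpy as np

def bilinear_sample_np(img, x, y):
    """NumPy mirror of the fixed sampler for a single H x W image,
    used to check the identity-transform property."""
    H, W = img.shape
    # rescale normalized coords from [-1, 1] to [0, W-1] / [0, H-1]
    x = 0.5 * (x + 1.0) * (W - 1)
    y = 0.5 * (y + 1.0) * (H - 1)
    x0 = np.floor(x).astype(int)
    x1 = x0 + 1
    y0 = np.floor(y).astype(int)
    y1 = y0 + 1
    # deltas are computed BEFORE clipping, as in the fix
    wa = (x1 - x) * (y1 - y)
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)
    # clip indices to valid range only after the weights are known
    x0, x1 = np.clip(x0, 0, W - 1), np.clip(x1, 0, W - 1)
    y0, y1 = np.clip(y0, 0, H - 1), np.clip(y1, 0, H - 1)
    return (wa * img[y0, x0] + wb * img[y1, x0]
            + wc * img[y0, x1] + wd * img[y1, x1])

# identity sampling grid over a 4x4 test image
H, W = 4, 4
img = np.arange(H * W, dtype=float).reshape(H, W)
ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                     indexing='ij')
out = bilinear_sample_np(img, xs, ys)
print(np.allclose(out, img))  # True: identity grid recovers the input
```

With the [0, W-1] rescaling, the identity grid lands exactly on pixel centers, so each output pixel gets weight 1 on a single corner and the input is reproduced exactly.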
Issue Analytics
- State:
- Created 5 years ago
- Comments:13 (7 by maintainers)
Top GitHub Comments
@marissaweis my bad, just revert `max_h - 1` to `max_h` and `max_w - 1` to `max_w`.

@zachluo It really doesn't matter to be honest. At the level of the network, it will just end up acting as a regularizer.
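To see the off-by-one concretely, here is a small illustration (my own, not from the issue thread) comparing the original rescaling against the fixed one for a width-4 image:

```python
W = 4
xs = [-1.0, 0.0, 1.0]  # left edge, center, right edge of the normalized grid
old = [0.5 * (x + 1.0) * W for x in xs]        # original: maps onto [0, W]
new = [0.5 * (x + 1.0) * (W - 1) for x in xs]  # fixed:    maps onto [0, W-1]
print(old)  # [0.0, 2.0, 4.0] -> x = 1 lands at column 4, past the last index 3
print(new)  # [0.0, 1.5, 3.0] -> x = 1 lands exactly on the last column
```

Under the original rescaling, the right edge of the grid falls outside the image and only clipping pulls it back, which shifts the sampled content; the fixed rescaling keeps every grid point inside [0, W-1].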