Double backward always returns NaN when dtype is float16 and cuDNN is enabled.
When a pair of F.reshape and F.batch_normalization is used under the condition that dtype is float16 and use_cudnn='always', double backward of the pair becomes so unstable that it returns NaN with high probability.
One use case of the pair is F.group_normalization:
https://github.com/chainer/chainer/blob/afe903389d822583a5355e9d46e6766d048ebeb5/chainer/functions/normalization/group_normalization.py#L61-L72
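
For context, the reshape trick folds the batch and group axes into the channel axis of a pseudo batch of size 1, so that batch normalization computes statistics per (sample, group) pair. Below is a minimal shape-only sketch of that folding (plain NumPy; the sizes and names are illustrative, not from the issue):

```python
import numpy as np

N, C, H, W = 5, 6, 4, 4  # illustrative sizes; C must be divisible by groups
groups = 3

x = np.random.uniform(-1, 1, (N, C, H, W)).astype(np.float16)

# Fold (batch, groups) into the channel axis of a fake batch of size 1.
# Batch normalization over this tensor then normalizes each of the
# N * groups pseudo-channels independently, i.e. per-group statistics.
x_ = x.reshape(1, N * groups, -1, 1)
print(x_.shape)  # (1, 15, 32, 1)
```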
- Conditions
```
Platform: Linux-4.4.0-98-generic-x86_64-with-debian-stretch-sid
Chainer: 6.0.0b2
NumPy: 1.15.4
CuPy:
  CuPy Version         : 6.0.0b2
  CUDA Root            : /usr/local/cuda
  CUDA Build Version   : 9020
  CUDA Driver Version  : 9020
  CUDA Runtime Version : 9020
  cuDNN Build Version  : 7201
  cuDNN Version        : 7201
  NCCL Build Version   : None
iDeep: 2.0.0.post3
```
- Code to reproduce
```python
import cupy as cp
from chainer import gradient_check
import chainer.functions as F
import numpy


def reshape_and_bn(x, gamma, beta):
    x_shape = x.shape
    expander = [None, Ellipsis, None, None]
    x_ = F.reshape(x, (1, x_shape[0] * x_shape[1], -1, 1))
    dummy_g = cp.ones(x_.shape[1], dtype=x_.dtype)
    dummy_b = cp.zeros(x_.shape[1], dtype=x_.dtype)
    x_normalized = F.batch_normalization(x_, dummy_g, dummy_b)
    x_normalized = F.reshape(x_normalized, x_shape)
    gamma = gamma[expander]
    beta = beta[expander]
    return x_normalized * gamma + beta


def run():
    x = cp.random.uniform(-1, 1, (5, 3, 4, 4)).astype(cp.float16)
    gy = cp.random.uniform(-1, 1, (5, 3, 4, 4)).astype(cp.float16)
    ggx = cp.random.uniform(-1, 1, (5, 3, 4, 4)).astype(cp.float16)
    gamma = cp.random.uniform(-1, 1, 3).astype(cp.float16)
    beta = cp.random.uniform(-1, 1, 3).astype(cp.float16)
    ggamma = cp.random.uniform(-1, 1, 3).astype(cp.float16)
    gbeta = cp.random.uniform(-1, 1, 3).astype(cp.float16)

    print('Backward')
    gradient_check.check_backward(
        reshape_and_bn, (x, gamma, beta), (gy,), dtype=numpy.float64,
        atol=1e-2, rtol=1e-3)

    print('Double Backward')
    gradient_check.check_double_backward(
        reshape_and_bn, (x, gamma, beta), (gy,),
        (ggx, ggamma, gbeta), dtype=numpy.float64,
        atol=1e-2, rtol=1e-3)


if __name__ == '__main__':
    run()
```
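
Note that the snippet above leaves use_cudnn at its default ('auto'); to pin the cuDNN path named in the title, the call can be wrapped in Chainer's standard configuration context (a sketch, not part of the original report):

```python
import chainer

if __name__ == '__main__':
    # Force cuDNN kernels for F.batch_normalization; the default
    # 'auto' may also select cuDNN, but 'always' matches the
    # condition described in the title.
    with chainer.using_config('use_cudnn', 'always'):
        run()
```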
- Error messages, stack traces, or logs
For backward:

```
gradients (numeric):  0.6409951020032167
gradients (backward): -0.5768083848859537

Not equal to tolerance rtol=0.001, atol=0.01
Mismatch: 100%
Max absolute difference: 1.21780349
Max relative difference: 2.1112791
 x: array(0.640995)
 y: array(-0.576808)

assert_allclose failed:
  shape: () ()
  dtype: float64 float64
  i: (0,)
  x[i]: 0.6409951020032167
  y[i]: -0.5768083848859537
  relative error[i]: 2.1112790985691965
  absolute error[i]: 1.2178034868891703
x: 0.6409951
y: -0.57680838
```
For double backward:

```
gradients (numeric):  1.773324329406023
gradients (backward): nan

Not equal to tolerance rtol=0.001, atol=0.01
x and y nan location mismatch:
 x: array(1.773324)
 y: array(nan)

assert_allclose failed:
  shape: () ()
  dtype: float64 float64
  i: (0,)
  x[i]: 1.773324329406023
  y[i]: nan
  relative error[i]: nan
  absolute error[i]: nan
x: 1.77332433
y: nan
```
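
To attribute the NaN to a specific function node rather than to the final gradient check, one option (not from the issue) is Chainer's debug mode, which validates values during backprop and raises as soon as a NaN shows up:

```python
import chainer

# Debug mode makes Chainer check gradients for NaN at each function
# node during the backward pass and raise with that node's traceback,
# at the cost of slower execution (sketch; wraps the run() above).
with chainer.using_config('debug', True):
    run()
```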
Top GitHub Comments
@grafi-tt I tried the snippet with current master, and it worked.
I will check if this issue is related to #6323.