
[bug report] NStepLSTM causes error only when n_layers>1, device>0, dropout!=0, and config.train==True

See original GitHub issue

Description

NStepLSTM raises cudaErrorIllegalAddress when n_layers>1 and device>0 in a multi-GPU environment.

The error does not occur when n_layers=1 or device=0.
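
For reference, here is a stripped-down sketch of what seems to be the smallest trigger (my assumption, not verified in this exact form; it assumes a second GPU and a cuDNN-enabled CuPy build). It simply runs the link's forward pass on device 1 with n_layers > 1, nonzero dropout, and config.train left at its default of True:

import numpy as np
import chainer
from chainer import links as L

# Assumes at least two GPUs; device 1 is any non-default device.
rnn = L.NStepLSTM(n_layers=2, in_size=1, out_size=1, dropout=0.8)
rnn.to_gpu(1)

# A small batch of variable-length float32 sequences on device 1.
xs = [chainer.cuda.to_gpu(np.random.randn(5, 1).astype(np.float32), device=1)
      for _ in range(4)]

# config.train defaults to True, so cuDNN's training-mode dropout is used.
with chainer.cuda.get_device_from_id(1):
    hy, cy, ys = rnn(None, None, xs)  # expected to hit the illegal memory access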

Environment

>>> chainer.print_runtime_info()
Platform: Linux-4.4.0-135-generic-x86_64-with-debian-stretch-sid
Chainer: 5.2.0
NumPy: 1.16.0
CuPy:
  CuPy Version          : 5.2.0
  CUDA Root             : /home/hashimoto/.local/cuda/cuda-9.2
  CUDA Build Version    : 9020
  CUDA Driver Version   : 10000
  CUDA Runtime Version  : 9020
  cuDNN Build Version   : 7301
  cuDNN Version         : 7301
  NCCL Build Version    : 2307
iDeep: Not Available
$ nvidia-smi
Sun Feb 17 16:40:55 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:02:00.0  On |                  N/A |
| 29%   33C    P8    15W / 250W |    218MiB / 11178MiB |     21%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  On   | 00000000:81:00.0 Off |                  N/A |
| 29%   36C    P8     8W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1929      G   /usr/lib/xorg/Xorg                           130MiB |
|    0     20566      G   compiz                                        86MiB |
+-----------------------------------------------------------------------------+

Code to reproduce

import numpy as np

import chainer
from chainer import functions as F, links as L
from chainer.iterators import SerialIterator
from chainer.optimizers import Adam
from chainer.training.updaters import StandardUpdater
from chainer.training import Trainer

N_LAYERS = 2  # the error requires n_layers > 1
DEVICE = 1    # and a non-default device

IN_N_UNITS = 1
OUT_N_UNITS = 1


class MyModel(chainer.Chain):
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.n_step_lstm = L.NStepLSTM(N_LAYERS, IN_N_UNITS, OUT_N_UNITS, 0.8)  # nonzero dropout

    def __call__(self, xs, ys):
        _, _, ys_predicted = self.n_step_lstm(None, None, xs)
        loss = 0
        for i in range(len(ys)):
            loss += F.mean_squared_error(ys_predicted[i], ys[i])
        return loss


class MyDataset(chainer.dataset.DatasetMixin):
    N_SAMPLES = 100
    NOISE_LEVEL = 0.1
    DATA_LEN_MAX = 10
    DATA_LEN_MIN = 5

    def __init__(self):
        self.data = np.sin(np.arange(self.N_SAMPLES, dtype=np.float32) * 0.1) \
                    + self.NOISE_LEVEL * np.random.randn(100).astype(np.float32)

    def __len__(self):
        return self.N_SAMPLES - (self.DATA_LEN_MAX + 1)

    def get_example(self, i):
        data_len = i % (self.DATA_LEN_MAX - self.DATA_LEN_MIN) + self.DATA_LEN_MIN
        return np.expand_dims(self.data[i:i + data_len], 1), np.expand_dims(self.data[i + 1:i + data_len + 1], 1)


def convert(batch, device):
    def to_device_batch(batch):
        if device is None:
            return batch
        elif device < 0:
            return [chainer.dataset.to_device(device, x) for x in batch]
        else:
            xp = chainer.cuda.cupy.get_array_module(*batch)
            concat = xp.concatenate(batch, axis=0)
            sections = np.cumsum([len(x)
                                  for x in batch[:-1]], dtype=np.int32)
            concat_dev = chainer.dataset.to_device(device, concat)
            batch_dev = chainer.cuda.cupy.split(concat_dev, sections)
            return batch_dev

    return {'xs': to_device_batch([x for x, _ in batch]),
            'ys': to_device_batch([y for _, y in batch])}


dataset = MyDataset()
iterator = SerialIterator(dataset, batch_size=16)
model = MyModel()
optimizer = Adam()
optimizer.setup(model)

updater = StandardUpdater(iterator, optimizer, converter=convert, device=DEVICE)
trainer = Trainer(updater, (10, "iteration"), "result")

trainer.run()
Error output

Exception in main training loop: cudaErrorIllegalAddress: an illegal memory access was encountered
Traceback (most recent call last):
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/trainer.py", line 315, in run
    update()
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/updaters/standard_updater.py", line 179, in update_core
    optimizer.update(loss_func, **in_arrays)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/optimizer.py", line 680, in update
    loss = lossfun(*args, **kwds)
  File "/tmp/pycharm_project_979/nsteplstm_test.py", line 24, in __call__
    _, _, ys_predicted = self.n_step_lstm(None, None, xs)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/link.py", line 242, in __call__
    out = forward(*args, **kwargs)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_lstm.py", line 70, in forward
    (hy, cy), ys = self._call([hx, cx], xs, **kwargs)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_rnn.py", line 207, in _call
    for h in result[:-1]]
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_rnn.py", line 207, in <listcomp>
    for h in result[:-1]]
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 133, in permutate
    y, = Permutate(indices, axis, inv).apply((x,))
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/function_node.py", line 263, in apply
    outputs = self.forward(in_data)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 70, in forward
    return self._permutate(x, inds, self.inv),
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 61, in _permutate
    return x[((slice(None),) * self.axis) + (indices,)]
  File "cupy/core/core.pyx", line 1625, in cupy.core.core.ndarray.__getitem__
  File "cupy/core/core.pyx", line 3134, in cupy.core.core._prepare_slice_list
  File "cupy/core/core.pyx", line 2397, in cupy.core.core.array
  File "cupy/core/core.pyx", line 2394, in cupy.core.core.array
  File "cupy/cuda/pinned_memory.pyx", line 212, in cupy.cuda.pinned_memory.alloc_pinned_memory
  File "cupy/cuda/pinned_memory.pyx", line 286, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy/cuda/pinned_memory.pyx", line 306, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy/cuda/pinned_memory.pyx", line 303, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy/cuda/pinned_memory.pyx", line 177, in cupy.cuda.pinned_memory._malloc
  File "cupy/cuda/pinned_memory.pyx", line 178, in cupy.cuda.pinned_memory._malloc
  File "cupy/cuda/pinned_memory.pyx", line 29, in cupy.cuda.pinned_memory.PinnedMemory.__init__
  File "cupy/cuda/runtime.pyx", line 231, in cupy.cuda.runtime.hostAlloc
  File "cupy/cuda/runtime.pyx", line 137, in cupy.cuda.runtime.check_status
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "/tmp/pycharm_project_979/nsteplstm_test.py", line 77, in <module>
    trainer.run()
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/trainer.py", line 329, in run
    six.reraise(*sys.exc_info())
  File "/home/hashimoto/python372/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/trainer.py", line 315, in run
    update()
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/updaters/standard_updater.py", line 179, in update_core
    optimizer.update(loss_func, **in_arrays)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/optimizer.py", line 680, in update
    loss = lossfun(*args, **kwds)
  File "/tmp/pycharm_project_979/nsteplstm_test.py", line 24, in __call__
    _, _, ys_predicted = self.n_step_lstm(None, None, xs)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/link.py", line 242, in __call__
    out = forward(*args, **kwargs)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_lstm.py", line 70, in forward
    (hy, cy), ys = self._call([hx, cx], xs, **kwargs)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_rnn.py", line 207, in _call
    for h in result[:-1]]
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_rnn.py", line 207, in <listcomp>
    for h in result[:-1]]
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 133, in permutate
    y, = Permutate(indices, axis, inv).apply((x,))
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/function_node.py", line 263, in apply
    outputs = self.forward(in_data)
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 70, in forward
    return self._permutate(x, inds, self.inv),
  File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 61, in _permutate
    return x[((slice(None),) * self.axis) + (indices,)]
  File "cupy/core/core.pyx", line 1625, in cupy.core.core.ndarray.__getitem__
  File "cupy/core/core.pyx", line 3134, in cupy.core.core._prepare_slice_list
  File "cupy/core/core.pyx", line 2397, in cupy.core.core.array
  File "cupy/core/core.pyx", line 2394, in cupy.core.core.array
  File "cupy/cuda/pinned_memory.pyx", line 212, in cupy.cuda.pinned_memory.alloc_pinned_memory
  File "cupy/cuda/pinned_memory.pyx", line 286, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy/cuda/pinned_memory.pyx", line 306, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy/cuda/pinned_memory.pyx", line 303, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
  File "cupy/cuda/pinned_memory.pyx", line 177, in cupy.cuda.pinned_memory._malloc
  File "cupy/cuda/pinned_memory.pyx", line 178, in cupy.cuda.pinned_memory._malloc
  File "cupy/cuda/pinned_memory.pyx", line 29, in cupy.cuda.pinned_memory.PinnedMemory.__init__
  File "cupy/cuda/runtime.pyx", line 231, in cupy.cuda.runtime.hostAlloc
  File "cupy/cuda/runtime.pyx", line 137, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

Process finished with exit code 1

Note

This issue was originally reported in the Chainer Slack (JP). The original report's error message was cupy.cuda.cudnn.CuDNNError: CUDNN_STATUS_INTERNAL_ERROR, but in my environment it was CUDA_ERROR_ILLEGAL_ADDRESS.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
crcrpar commented, Feb 21, 2019

I appreciate your clarification.

It seems there is a need to test dropout-enabled n_step_(rnn|lstm|gru) functions and links 🤔

0 reactions
fiarabbit commented, Feb 21, 2019

@crcrpar More specifically, the error in this issue is caused by cuDNN's training-mode dropout (which differs from its inference-mode dropout), but all test cases use dropout = 0.0 and config.train = False in https://github.com/chainer/chainer/blob/f73b1bfcbeafe8c3a157d800440d7fe54e942598/tests/chainer_tests/links_tests/connection_tests/test_n_step_lstm.py#L128-L138

I confirmed that the multi-GPU test fails if and only if dropout != 0 and chainer.using_config('train', True) are both in effect.
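
Something like the following sketch illustrates the split I mean (my own untested snippet, not the actual Chainer test; it assumes a second GPU): the identical forward pass runs with train=False, where dropout is effectively a no-op, and fails with train=True on a non-default device.

import numpy as np
import chainer
from chainer import links as L

rnn = L.NStepLSTM(2, 3, 3, 0.5)  # n_layers > 1, nonzero dropout
rnn.to_gpu(1)                    # non-default device
xs = [chainer.cuda.to_gpu(np.random.randn(4, 3).astype(np.float32), device=1)
      for _ in range(2)]

with chainer.using_config('train', False):
    rnn(None, None, xs)  # inference-mode dropout: runs fine

with chainer.using_config('train', True):
    rnn(None, None, xs)  # training-mode dropout: illegal memory access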
