[bug report] NStepLSTM causes error only when n_layers>1, device>0, dropout!=0, and config.train==True
Description
NStepLSTM raises cudaErrorIllegalAddress when n_layers>1 and device>0 in a multi-GPU environment. The error is not observed when n_layers=1 or device=0.
Environment
>>> chainer.print_runtime_info()
Platform: Linux-4.4.0-135-generic-x86_64-with-debian-stretch-sid
Chainer: 5.2.0
NumPy: 1.16.0
CuPy:
CuPy Version : 5.2.0
CUDA Root : /home/hashimoto/.local/cuda/cuda-9.2
CUDA Build Version : 9020
CUDA Driver Version : 10000
CUDA Runtime Version : 9020
cuDNN Build Version : 7301
cuDNN Version : 7301
NCCL Build Version : 2307
iDeep: Not Available
$ nvidia-smi
Sun Feb 17 16:40:55 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... On | 00000000:02:00.0 On | N/A |
| 29% 33C P8 15W / 250W | 218MiB / 11178MiB | 21% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... On | 00000000:81:00.0 Off | N/A |
| 29% 36C P8 8W / 250W | 2MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1929 G /usr/lib/xorg/Xorg 130MiB |
| 0 20566 G compiz 86MiB |
+-----------------------------------------------------------------------------+
Code to reproduce
import numpy as np
import chainer
from chainer import functions as F, links as L
from chainer.iterators import SerialIterator
from chainer.optimizers import Adam
from chainer.training.updaters import StandardUpdater
from chainer.training import Trainer

N_LAYERS = 2
DEVICE = 1
IN_N_UNITS = 1
OUT_N_UNITS = 1


class MyModel(chainer.Chain):
    def __init__(self):
        super().__init__()
        with self.init_scope():
            # The last argument is the dropout ratio (non-zero here).
            self.n_step_lstm = L.NStepLSTM(N_LAYERS, IN_N_UNITS, OUT_N_UNITS, 0.8)

    def __call__(self, xs, ys):
        _, _, ys_predicted = self.n_step_lstm(None, None, xs)
        loss = 0
        for i in range(len(ys)):
            loss += F.mean_squared_error(ys_predicted[i], ys[i])
        return loss


class MyDataset(chainer.dataset.DatasetMixin):
    N_SAMPLES = 100
    NOISE_LEVEL = 0.1
    DATA_LEN_MAX = 10
    DATA_LEN_MIN = 5

    def __init__(self):
        # Noisy sine wave
        self.data = np.sin(np.arange(self.N_SAMPLES, dtype=np.float32) * 0.1) \
            + self.NOISE_LEVEL * np.random.randn(100).astype(np.float32)

    def __len__(self):
        return self.N_SAMPLES - (self.DATA_LEN_MAX + 1)

    def get_example(self, i):
        # Variable-length input and the same window shifted by one step as target
        data_len = i % (self.DATA_LEN_MAX - self.DATA_LEN_MIN) + self.DATA_LEN_MIN
        return np.expand_dims(self.data[i:i + data_len], 1), np.expand_dims(self.data[i + 1:i + data_len + 1], 1)


def convert(batch, device):
    def to_device_batch(batch):
        if device is None:
            return batch
        elif device < 0:
            return [chainer.dataset.to_device(device, x) for x in batch]
        else:
            # Concatenate, transfer once, then split back into sequences
            xp = chainer.cuda.cupy.get_array_module(*batch)
            concat = xp.concatenate(batch, axis=0)
            sections = np.cumsum([len(x)
                                  for x in batch[:-1]], dtype=np.int32)
            concat_dev = chainer.dataset.to_device(device, concat)
            batch_dev = chainer.cuda.cupy.split(concat_dev, sections)
            return batch_dev

    return {'xs': to_device_batch([x for x, _ in batch]),
            'ys': to_device_batch([y for _, y in batch])}


dataset = MyDataset()
iterator = SerialIterator(dataset, batch_size=16)
model = MyModel()
optimizer = Adam()
optimizer.setup(model)
updater = StandardUpdater(iterator, optimizer, converter=convert, device=DEVICE)
trainer = Trainer(updater, (10, "iteration"), "result")
trainer.run()
Exception in main training loop: cudaErrorIllegalAddress: an illegal memory access was encountered
Traceback (most recent call last):
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/trainer.py", line 315, in run
update()
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
self.update_core()
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/updaters/standard_updater.py", line 179, in update_core
optimizer.update(loss_func, **in_arrays)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/optimizer.py", line 680, in update
loss = lossfun(*args, **kwds)
File "/tmp/pycharm_project_979/nsteplstm_test.py", line 24, in __call__
_, _, ys_predicted = self.n_step_lstm(None, None, xs)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/link.py", line 242, in __call__
out = forward(*args, **kwargs)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_lstm.py", line 70, in forward
(hy, cy), ys = self._call([hx, cx], xs, **kwargs)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_rnn.py", line 207, in _call
for h in result[:-1]]
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_rnn.py", line 207, in <listcomp>
for h in result[:-1]]
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 133, in permutate
y, = Permutate(indices, axis, inv).apply((x,))
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/function_node.py", line 263, in apply
outputs = self.forward(in_data)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 70, in forward
return self._permutate(x, inds, self.inv),
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 61, in _permutate
return x[((slice(None),) * self.axis) + (indices,)]
File "cupy/core/core.pyx", line 1625, in cupy.core.core.ndarray.__getitem__
File "cupy/core/core.pyx", line 3134, in cupy.core.core._prepare_slice_list
File "cupy/core/core.pyx", line 2397, in cupy.core.core.array
File "cupy/core/core.pyx", line 2394, in cupy.core.core.array
File "cupy/cuda/pinned_memory.pyx", line 212, in cupy.cuda.pinned_memory.alloc_pinned_memory
File "cupy/cuda/pinned_memory.pyx", line 286, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
File "cupy/cuda/pinned_memory.pyx", line 306, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
File "cupy/cuda/pinned_memory.pyx", line 303, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
File "cupy/cuda/pinned_memory.pyx", line 177, in cupy.cuda.pinned_memory._malloc
File "cupy/cuda/pinned_memory.pyx", line 178, in cupy.cuda.pinned_memory._malloc
File "cupy/cuda/pinned_memory.pyx", line 29, in cupy.cuda.pinned_memory.PinnedMemory.__init__
File "cupy/cuda/runtime.pyx", line 231, in cupy.cuda.runtime.hostAlloc
File "cupy/cuda/runtime.pyx", line 137, in cupy.cuda.runtime.check_status
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "/tmp/pycharm_project_979/nsteplstm_test.py", line 77, in <module>
trainer.run()
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/trainer.py", line 329, in run
six.reraise(*sys.exc_info())
File "/home/hashimoto/python372/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/trainer.py", line 315, in run
update()
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
self.update_core()
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/training/updaters/standard_updater.py", line 179, in update_core
optimizer.update(loss_func, **in_arrays)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/optimizer.py", line 680, in update
loss = lossfun(*args, **kwds)
File "/tmp/pycharm_project_979/nsteplstm_test.py", line 24, in __call__
_, _, ys_predicted = self.n_step_lstm(None, None, xs)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/link.py", line 242, in __call__
out = forward(*args, **kwargs)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_lstm.py", line 70, in forward
(hy, cy), ys = self._call([hx, cx], xs, **kwargs)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_rnn.py", line 207, in _call
for h in result[:-1]]
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/links/connection/n_step_rnn.py", line 207, in <listcomp>
for h in result[:-1]]
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 133, in permutate
y, = Permutate(indices, axis, inv).apply((x,))
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/function_node.py", line 263, in apply
outputs = self.forward(in_data)
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 70, in forward
return self._permutate(x, inds, self.inv),
File "/home/hashimoto/python372/lib/python3.7/site-packages/chainer/functions/array/permutate.py", line 61, in _permutate
return x[((slice(None),) * self.axis) + (indices,)]
File "cupy/core/core.pyx", line 1625, in cupy.core.core.ndarray.__getitem__
File "cupy/core/core.pyx", line 3134, in cupy.core.core._prepare_slice_list
File "cupy/core/core.pyx", line 2397, in cupy.core.core.array
File "cupy/core/core.pyx", line 2394, in cupy.core.core.array
File "cupy/cuda/pinned_memory.pyx", line 212, in cupy.cuda.pinned_memory.alloc_pinned_memory
File "cupy/cuda/pinned_memory.pyx", line 286, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
File "cupy/cuda/pinned_memory.pyx", line 306, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
File "cupy/cuda/pinned_memory.pyx", line 303, in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc
File "cupy/cuda/pinned_memory.pyx", line 177, in cupy.cuda.pinned_memory._malloc
File "cupy/cuda/pinned_memory.pyx", line 178, in cupy.cuda.pinned_memory._malloc
File "cupy/cuda/pinned_memory.pyx", line 29, in cupy.cuda.pinned_memory.PinnedMemory.__init__
File "cupy/cuda/runtime.pyx", line 231, in cupy.cuda.runtime.hostAlloc
File "cupy/cuda/runtime.pyx", line 137, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Process finished with exit code 1
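For readers unfamiliar with the converter in the reproduction script: it concatenates the variable-length sequences, transfers them in one copy, and splits them back at the cumulative lengths. The following is a minimal NumPy-only sketch of that concatenate/split step. The array values are made up for illustration, and no GPU transfer is performed here.

```python
import numpy as np

# Three variable-length sequences of shape (length, 1); values are illustrative.
batch = [
    np.zeros((5, 1), dtype=np.float32),
    np.ones((7, 1), dtype=np.float32),
    np.full((6, 1), 2, dtype=np.float32),
]

# Concatenate along the time axis so that one transfer would be enough.
concat = np.concatenate(batch, axis=0)

# Split points are the cumulative lengths of all but the last sequence.
sections = np.cumsum([len(x) for x in batch[:-1]], dtype=np.int32)

# Splitting at those offsets recovers the original sequences.
restored = np.split(concat, sections)
```

In the script itself, the transfer to the GPU happens between the concatenation and the split.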
Note
This issue was originally reported in the Chainer Slack (JP). The original error message was cupy.cuda.cudnn.CuDNNError: CUDNN_STATUS_INTERNAL_ERROR, but in my environment it was CUDA_ERROR_ILLEGAL_ADDRESS.
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
I appreciate your clarification.
It seems that there’s a need to test dropout-enabled n_step_(rnn | lstm | gru) functions and links 🤔

@crcrpar More specifically, the error in this issue is caused by cuDNN’s training-time dropout (test-time dropout behaves differently), but all test cases use dropout = 0.0 and run with config.train = False in https://github.com/chainer/chainer/blob/f73b1bfcbeafe8c3a157d800440d7fe54e942598/tests/chainer_tests/links_tests/connection_tests/test_n_step_lstm.py#L128-L138

I confirmed that the multi-GPU test fails if and only if dropout != 0 and chainer.using_config('train', True).
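A rough sketch of a test grid that would cover the gap described above. The names expected_to_fail, GRID and run_forward are hypothetical and not part of Chainer's test suite. The failure condition simply restates what this issue reports. run_forward assumes Chainer, CuPy and at least two GPUs are available, and is only defined here, not called.

```python
import itertools


def expected_to_fail(n_layers, device, dropout, train):
    """Failure condition as reported in this issue (not a general rule)."""
    return n_layers > 1 and device > 0 and dropout != 0 and train


# Hypothetical grid: every combination of the four factors named in the title.
GRID = list(itertools.product(
    (1, 2),         # n_layers
    (0, 1),         # device
    (0.0, 0.8),     # dropout
    (False, True),  # train
))


def run_forward(n_layers, device, dropout, train):
    """Run one NStepLSTM forward pass; requires Chainer, CuPy and the GPU."""
    import chainer
    import chainer.links as L
    import cupy

    link = L.NStepLSTM(n_layers, 1, 1, dropout)
    link.to_gpu(device)
    # Allocate the inputs on the target device without keeping it current,
    # which mirrors the reproduction script above.
    with cupy.cuda.Device(device):
        xs = [cupy.random.rand(length, 1).astype(cupy.float32)
              for length in (7, 5, 6)]
    with chainer.using_config('train', train):
        _, _, ys = link(None, None, xs)
    return [y.shape for y in ys]
```

On an affected installation, calling run_forward(*case) for each case in GRID would be expected to raise only for the single combination that expected_to_fail flags.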