RuntimeError: cuda runtime error (59) : device-side assert triggered
Hi, I need some help! I ran into the following problem:
Loading base network...
Initializing weights...
Loading the dataset...
Training RefineDet on: VOC0712
Using the specified args:
Namespace(basenet='./weights/vgg16_reducedfc.pth', batch_size=32, cuda=True, cuda_device=0, dataset='VOC', dataset_root='/home/dawn/data/VOCdevkit', gamma=0.1, input_size='320', lr=0.001, momentum=0.9, num_workers=8, resume=None, save_folder='UAV/checkpoints.try/', start_iter=0, visdom=False, weight_decay=0.0005)
tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=torch.uint8)
tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=torch.uint8)
timer: 11.4967 sec.
iter 0 || ARM_L Loss: 9.1703 ARM_C Loss: 8.0969 ODM_L Loss: 8.6313 ODM_C Loss: 8.5600 || tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=torch.uint8)
tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=torch.uint8)
tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=torch.uint8)
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [30,0,0], thread: [252,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorCopy.cpp line=70 error=59 : device-side assert triggered
Traceback (most recent call last):
File "train_refinedet.py", line 292, in <module>
train()
File "train_refinedet.py", line 205, in train
odm_loss_l, odm_loss_c = odm_criterion(out, targets)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/dawn/yangwz/RefineDet.Pytorch/layers/modules/refinedet_multibox_loss.py", line 125, in forward
print(pos)
File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 57, in __repr__
return torch._tensor_str._str(self)
File "/usr/local/lib/python3.5/dist-packages/torch/_tensor_str.py", line 256, in _str
formatter = _Formatter(get_summarized_data(self) if summarize else self)
File "/usr/local/lib/python3.5/dist-packages/torch/_tensor_str.py", line 77, in __init__
for value in copy.tolist():
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:70
Strangely, the same code runs fine on Windows 10 with Python 3.6 and PyTorch 0.4.1, but it fails on Ubuntu 16.04 with Python 3.5 and PyTorch 0.4.0.
The error is raised from line 125 of layers/modules/refinedet_multibox_loss.py, which is meant to "filter out pos boxes for now":
loss_c[pos.view(-1,1)] = 0
So I printed the "pos" tensor, and that is where the error appeared.
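The assertion in the log (`indexValue >= 0 && indexValue < src.sizes[dim]`) comes from the gather kernel, so an index tensor apparently holds a value outside the valid range. CUDA reports such errors asynchronously, which is likely why the traceback points at print(pos) rather than at the call that actually failed. Below is a minimal sketch of the same range check in plain Python. The helper `find_out_of_range` and the sample labels are hypothetical and not part of the RefineDet code.

```python
def find_out_of_range(indices, size):
    """Return (row, col, value) for every index that violates 0 <= value < size."""
    bad = []
    for r, row in enumerate(indices):
        for c, value in enumerate(row):
            if not 0 <= value < size:
                bad.append((r, c, value))
    return bad


# Hypothetical example: the score tensor has 20 columns, but one label is 20.
labels = [[0, 3, 19], [20, 1, 2]]
print(find_out_of_range(labels, size=20))  # prints [(1, 0, 20)]
```

Running a check like this on the labels before they reach the loss, or running the script with the environment variable CUDA_LAUNCH_BLOCKING=1, should make the failing line easier to locate.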
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (1 by maintainers)
Top GitHub Comments
@luuuyi Ah, I know where I went wrong. NUM_CLASSES is 21 for VOC because it includes background, while the 'VOC_CLASSES' list defined in data/voc0712.py only includes the target objects, without 'background'. I am training on my own dataset, so I should set num_classes to the number of categories + 1.
Thanks all the same.
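The fix described above can be sketched as follows. This is only an illustration under assumptions: the class list `CUSTOM_CLASSES` and the helper `label_for` are hypothetical, and the sketch assumes label 0 is reserved for background with foreground labels shifted up by one.

```python
# Hypothetical foreground classes for a custom dataset (no 'background' entry).
CUSTOM_CLASSES = ("car", "truck", "bus")

# Background takes index 0, so the model needs one more class than the list holds.
num_classes = len(CUSTOM_CLASSES) + 1


def label_for(name):
    """Map a class name to a training label, shifted by 1 to leave 0 for background."""
    return CUSTOM_CLASSES.index(name) + 1


# Every label must satisfy 0 <= label < num_classes, or gather will assert on the GPU.
assert all(0 <= label_for(name) < num_classes for name in CUSTOM_CLASSES)
```

If num_classes were set to len(CUSTOM_CLASSES) instead, the last class would map to a label equal to num_classes and trigger the same out-of-range assertion.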
Hello, have you solved the problem? Thank you!