RuntimeError: cuda runtime error (59) : device-side assert triggered

Hi, I need some help! I ran into the following problem:

Loading base network...
Initializing weights...
Loading the dataset...
Training RefineDet on: VOC0712
Using the specified args:
Namespace(basenet='./weights/vgg16_reducedfc.pth', batch_size=32, cuda=True, cuda_device=0, dataset='VOC', dataset_root='/home/dawn/data/VOCdevkit', gamma=0.1, input_size='320', lr=0.001, momentum=0.9, num_workers=8, resume=None, save_folder='UAV/checkpoints.try/', start_iter=0, visdom=False, weight_decay=0.0005)
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
timer: 11.4967 sec.
iter 0 || ARM_L Loss: 9.1703 ARM_C Loss: 8.0969 ODM_L Loss: 8.6313 ODM_C Loss: 8.5600 || tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [30,0,0], thread: [252,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorCopy.cpp line=70 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train_refinedet.py", line 292, in <module>
    train()
  File "train_refinedet.py", line 205, in train
    odm_loss_l, odm_loss_c = odm_criterion(out, targets)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dawn/yangwz/RefineDet.Pytorch/layers/modules/refinedet_multibox_loss.py", line 125, in forward
    print(pos)
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 57, in __repr__
    return torch._tensor_str._str(self)
  File "/usr/local/lib/python3.5/dist-packages/torch/_tensor_str.py", line 256, in _str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/usr/local/lib/python3.5/dist-packages/torch/_tensor_str.py", line 77, in __init__
    for value in copy.tolist():
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:70

Strangely, the same code runs fine on Windows 10 with Python 3.6 and PyTorch 0.4.1, but it fails on Ubuntu 16.04 with Python 3.5 and PyTorch 0.4.0.

The error is raised at layers/modules/refinedet_multibox_loss.py, line 125, in the part that is meant to “filter out pos boxes for now”, i.e. loss_c[pos.view(-1,1)] = 0. So I printed the “pos” tensor, and the error appeared.
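The assertion in the log, `indexValue >= 0 && indexValue < src.sizes[dim]`, is a bounds check inside a gather kernel. As a rough illustration only, and not the actual PyTorch implementation, the pure-Python sketch below mimics a gather along dimension 1 on nested lists and raises at the first out-of-range index. The function name `checked_gather_dim1` and the sample data are hypothetical.

```python
def checked_gather_dim1(src, index):
    """Mimic gather along dim 1 on nested lists: out[i][j] = src[i][index[i][j]].

    Hypothetical helper for illustration only; it applies the same bounds
    check as the assertion in the log, but reports the offending position.
    """
    out = []
    for i, (src_row, idx_row) in enumerate(zip(src, index)):
        out_row = []
        for j, idx in enumerate(idx_row):
            if not 0 <= idx < len(src_row):
                raise IndexError(
                    "index[%d][%d] = %d is outside [0, %d)" % (i, j, idx, len(src_row))
                )
            out_row.append(src_row[idx])
        out.append(out_row)
    return out


# Two rows of per-class scores for a model built with 3 classes (made-up data).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]

print(checked_gather_dim1(scores, [[1], [0]]))  # all labels within range

try:
    checked_gather_dim1(scores, [[1], [3]])  # label 3 would need a 4th class
except IndexError as err:
    print("bounds check failed:", err)
```

In the real run the check happens on the GPU, and CUDA kernel errors can be reported asynchronously at some later call, which would explain why the traceback points at print(pos) rather than at the indexing operation itself.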

Issue Analytics

State:
Created 5 years ago
Comments: 6 (1 by maintainers)

Top GitHub Comments

wnzhyee commented, Jan 4, 2019

@luuuyi Ah, I see where I went wrong. NUM_CLASSES is 21 for VOC because it includes the background class, while the 'VOC_CLASSES' list defined in data/voc0712.py contains only the target objects, without 'background'. I am training on my own dataset, so I should set num_classes to the number of categories + 1.

Thanks all the same.
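The fix described above can be sketched as a small check to run before training. This is a minimal sketch under stated assumptions: label 0 is taken to mean background and object labels to run from 1 upward, and the names `CUSTOM_CLASSES`, `num_classes_with_background`, and `find_bad_labels` are hypothetical, not part of the RefineDet.Pytorch code.

```python
# Hypothetical class list for a custom dataset; like VOC_CLASSES, it lists
# only the target objects and does not contain 'background'.
CUSTOM_CLASSES = ("aircraft", "ship", "vehicle")


def num_classes_with_background(class_names):
    """Number of classifier outputs: one per object class plus background."""
    return len(class_names) + 1


def find_bad_labels(labels, num_classes):
    """Return (position, label) pairs whose label is outside [0, num_classes)."""
    return [(i, lab) for i, lab in enumerate(labels) if not 0 <= lab < num_classes]


# Assumption: label 0 means background and object labels run from 1 to 3.
labels = [0, 1, 3, 2, 3]

wrong = len(CUSTOM_CLASSES)                          # forgets the background class
right = num_classes_with_background(CUSTOM_CLASSES)  # categories + 1

print("num_classes =", wrong, "->", find_bad_labels(labels, wrong))
print("num_classes =", right, "->", find_bad_labels(labels, right))
```

With tensors, the labels could be passed in as a plain list, for example via `.tolist()`. Any label reported here would trip the same kind of index bounds assertion on the GPU.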

zeroorhero commented, Feb 6, 2020

Sorry, I have the same problem. I want to train a single class, so I changed the VOC setting to 'num_classes': 2 in config.py, set VOC_CLASSES = [('aircraft')] in voc0712.py, and also changed num_classes in refinedet.py. Why does it still not work? Could you please help me? Thank you, @wnzhyee @luuuyi.

Hello, have you solved the problem? Thank you!

Top Results From Across the Web

CUDA runtime error (59) : device-side assert triggered

This is an error with your target labels: t >= 0 && t < n_classes . print your labels and make sure that...

CUDA Error: Device-Side Assert Triggered: Solved | Built In

The reason this happens is that even though you may have fixed the bug in your code, once the runtime error 59 is...

RuntimeError: CUDA error: device-side assert triggered

Hi, First thing is to try to run the code on CPU. CPU code has more checks so it will possibly return a...

cuda runtime error (59) : device-side assert triggered at ...

I'm running into the same cuda runtime error (59) when running code from PracticalPytorch Seq2Seq example... when the "MAX_LENGTH" variable is ...

[HELP] RuntimeError: CUDA error: device-side assert triggered

I get this error: RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the ......