RuntimeError: cuda runtime error (59) : device-side assert triggered

Hi, I need some help! I ran into the following problem:

Loading base network...
Initializing weights...
Loading the dataset...
Training RefineDet on: VOC0712
Using the specified args:
Namespace(basenet='./weights/vgg16_reducedfc.pth', batch_size=32, cuda=True, cuda_device=0, dataset='VOC', dataset_root='/home/dawn/data/VOCdevkit', gamma=0.1, input_size='320', lr=0.001, momentum=0.9, num_workers=8, resume=None, save_folder='UAV/checkpoints.try/', start_iter=0, visdom=False, weight_decay=0.0005)
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
timer: 11.4967 sec.
iter 0 || ARM_L Loss: 9.1703 ARM_C Loss: 8.0969 ODM_L Loss: 8.6313 ODM_C Loss: 8.5600 || tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], dtype=torch.uint8)
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [30,0,0], thread: [252,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorCopy.cpp line=70 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train_refinedet.py", line 292, in <module>
    train()
  File "train_refinedet.py", line 205, in train
    odm_loss_l, odm_loss_c = odm_criterion(out, targets)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dawn/yangwz/RefineDet.Pytorch/layers/modules/refinedet_multibox_loss.py", line 125, in forward
    print(pos)
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 57, in __repr__
    return torch._tensor_str._str(self)
  File "/usr/local/lib/python3.5/dist-packages/torch/_tensor_str.py", line 256, in _str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/usr/local/lib/python3.5/dist-packages/torch/_tensor_str.py", line 77, in __init__
    for value in copy.tolist():
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:70

Strangely, the same code runs fine on Windows 10 with Python 3.6 and PyTorch 0.4.1, but it fails on Ubuntu 16.04 with Python 3.5 and PyTorch 0.4.0.

The error is raised from layers/modules/refinedet_multibox_loss.py, line 125, which is meant to “filter out pos boxes for now”, i.e. loss_c[pos.view(-1,1)] = 0. I added a print of the “pos” tensor there, and the error appeared at that print.
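The assertion in the log names the gather kernel and the condition `indexValue >= 0 && indexValue < src.sizes[dim]`, so the traceback landing on print(pos) is probably a side effect of CUDA reporting kernel failures at a later call (running with the environment variable CUDA_LAUNCH_BLOCKING=1, or on CPU, usually makes the traceback point at the real call). The following is a minimal pure-Python sketch of that bounds check, not the PyTorch implementation; `gather_dim1` and the sample scores are hypothetical names and values chosen for illustration.

```python
def gather_dim1(src, index):
    """Pick src[i][index[i]] for every row i.

    Hypothetical stand-in for a gather along dim 1. It applies the same
    condition as the failed assertion: 0 <= index < number of columns.
    """
    picked = []
    for i, (row, idx) in enumerate(zip(src, index)):
        if not 0 <= idx < len(row):
            raise IndexError(
                "row %d: index %d is outside [0, %d)" % (i, idx, len(row))
            )
        picked.append(row[idx])
    return picked


# Two boxes scored over 2 classes (illustrative values only).
scores = [[0.9, 0.1], [0.3, 0.7]]

# Labels 0 and 1 both have a matching score column.
ok = gather_dim1(scores, [0, 1])

# Label 2 has no matching column, which is the situation the assert catches.
try:
    gather_dim1(scores, [0, 2])
    error = None
except IndexError as exc:
    error = str(exc)
```

On the GPU the same out-of-range label aborts the kernel instead of raising a readable Python exception, which is why the message is so much less specific there.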

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
wnzhyee commented, Jan 4, 2019

@luuuyi Ah, I see where I went wrong. NUM_CLASSES is 21 for VOC because it includes background, while the ‘VOC_CLASSES’ list defined in data/voc0712.py contains only the target objects, without ‘background’. I am training on my own dataset, so I should set num_classes to the number of categories + 1.

Thanks all the same.
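The fix described above can be sketched in plain Python. This is an illustration only: `CUSTOM_CLASSES` and the three helper functions are hypothetical and do not come from the RefineDet.Pytorch repository, and the sketch assumes the common SSD-style convention that label 0 is reserved for background.

```python
# Hypothetical class list for a custom dataset (background is NOT listed).
CUSTOM_CLASSES = ("aircraft", "ship", "vehicle")


def num_classes_with_background(class_names):
    """Number of object categories plus one slot for background."""
    return len(class_names) + 1


def to_training_label(class_name, class_names):
    """Map a class name to its label, assuming 0 is reserved for background."""
    return class_names.index(class_name) + 1


def out_of_range_labels(labels, num_classes):
    """Return the labels that fall outside [0, num_classes)."""
    return [t for t in labels if not 0 <= t < num_classes]


num_classes = num_classes_with_background(CUSTOM_CLASSES)
labels = [to_training_label(n, CUSTOM_CLASSES)
          for n in ("ship", "vehicle", "aircraft")]

# Forgetting background (num_classes == number of categories) leaves the
# highest label without a score column.
bad_without_background = out_of_range_labels(labels, len(CUSTOM_CLASSES))

# With categories + 1, every label is in range.
bad_with_background = out_of_range_labels(labels, num_classes)
```

Running a check like `out_of_range_labels` over the dataset's targets before training is a cheap way to catch this mismatch on the CPU, where the failure is easier to read than a device-side assert.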

0 reactions
zeroorhero commented, Feb 6, 2020

Sorry, I have run into the same problem. I want to train on a single class, so I changed the VOC setting to ‘num_classes’: 2 in config.py and set VOC_CLASSES = [(‘aircraft’)] in voc0712.py. I also changed num_classes in refinedet.py, but it still does not work. Could you please help? Thank you @wnzhyee @luuuyi

Hello, have you solved the problem? Thank you!
