RuntimeError: CUDA error: device-side assert triggered

I got this error when using simple_lm_finetuning.py to continue training a BERT model. Could anyone help? Thanks a lot. Here are the CUDA and Python traces. I have confirmed that my input max_length does not exceed max_position_embeddings.

/pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [329,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [329,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Loading Train Dataset input_lm.txt
Traceback (most recent call last):
  File "simple_lm_finetuning.py", line 646, in <module>
    main()
  File "simple_lm_finetuning.py", line 592, in main
    loss = model(input_ids, segment_ids, input_mask, lm_label_ids, is_next)
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianfeng.ps/bert-mrc/pytorch_pretrained_bert/modeling.py", line 783, in forward
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianfeng.ps/bert-mrc/pytorch_pretrained_bert/modeling.py", line 714, in forward
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianfeng.ps/bert-mrc/pytorch_pretrained_bert/modeling.py", line 261, in forward
    position_embeddings = self.position_embeddings(position_ids)
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 118, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1454, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered

Issue Analytics

Created 4 years ago
Reactions: 1
Comments: 9

Top GitHub Comments

2 reactions

stephenroller commented, Jun 5, 2019

Then it’s definitely that you’ve got a bad index into the positional embeddings.

1 reaction

stephenroller commented, Jun 4, 2019

Rerun with the environment variable CUDA_LAUNCH_BLOCKING=1 set and see which line it crashes on.
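For anyone who would rather set the variable from inside the script than on the command line, here is a minimal sketch. It assumes the variable has to be in place before CUDA is initialized, so it is set before torch is imported; the commented-out import only marks where the rest of the script would start.

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at
# the call that actually failed. Set this before CUDA is initialized; the
# safest place is the very top of the script, ahead of the torch import.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and the rest of the training code after this line
```

Prefixing the usual launch command with CUDA_LAUNCH_BLOCKING=1 in the shell should have the same effect. Training runs slower in this mode, so it is worth removing once the failing line is found.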

This is almost always an out-of-bounds index in some embedding lookup. Usually it is the positional embeddings, but it could be the word embeddings or the segment embeddings.
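A quick way to test that theory is to check the indices on the CPU before they reach the model. The sketch below is only an illustration: `find_out_of_range` and `check_bert_inputs` are hypothetical helpers, not part of pytorch_pretrained_bert, the inputs are plain nested lists (for tensors, something like `input_ids.tolist()` would produce them), and the sizes in the example call are made-up values that should be replaced with the ones from your model config. It also assumes position ids run from 0 to the sequence length minus 1.

```python
from typing import Dict, Iterable, List, Sequence


def find_out_of_range(indices: Iterable[int], num_embeddings: int) -> List[int]:
    """Return every index that a table with `num_embeddings` rows cannot serve."""
    return [i for i in indices if i < 0 or i >= num_embeddings]


def check_bert_inputs(
    input_ids: Sequence[Sequence[int]],
    segment_ids: Sequence[Sequence[int]],
    vocab_size: int,
    type_vocab_size: int,
    max_position_embeddings: int,
) -> Dict[str, List[int]]:
    """Check one batch against the three embedding tables (hypothetical helper)."""
    flat_ids = [i for row in input_ids for i in row]
    flat_segments = [s for row in segment_ids for s in row]
    # Assumption: position ids are 0..seq_len-1, so only the longest row matters.
    longest = max((len(row) for row in input_ids), default=0)
    return {
        "word": find_out_of_range(flat_ids, vocab_size),
        "segment": find_out_of_range(flat_segments, type_vocab_size),
        "position": find_out_of_range(range(longest), max_position_embeddings),
    }


# Illustrative batch: one token id and one segment id are deliberately too large.
report = check_bert_inputs(
    input_ids=[[101, 2023, 30522, 102]],
    segment_ids=[[0, 0, 1, 2]],
    vocab_size=30522,
    type_vocab_size=2,
    max_position_embeddings=512,
)
print(report)
```

Any non-empty list in the report names the embedding table that would trigger the assert, along with the offending values.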
