RuntimeError: CUDA error: device-side assert triggered
See original GitHub issueI got this error when using simple_lm_finetuning.py to continue to train a bert model. Could anyone can help? Thanks a lot. Here is the cuda and python trace. I confirm that my input max_length don’t over max_position_embeddings
/pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [329,0,0], thread: [94,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [329,0,0], thread: [95,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Loading Train Dataset input_lm.txt
Traceback (most recent call last):
File "simple_lm_finetuning.py", line 646, in <module>
main()
File "simple_lm_finetuning.py", line 592, in main
loss = model(input_ids, segment_ids, input_mask, lm_label_ids, is_next)
File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianfeng.ps/bert-mrc/pytorch_pretrained_bert/modeling.py", line 783, in forward
File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianfeng.ps/bert-mrc/pytorch_pretrained_bert/modeling.py", line 714, in forward
extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianfeng.ps/bert-mrc/pytorch_pretrained_bert/modeling.py", line 261, in forward
position_embeddings = self.position_embeddings(position_ids)
File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 118, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/home/jianfeng.ps/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1454, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:9
Top Results From Across the Web
CUDA Error: Device-Side Assert Triggered: Solved | Built In
A CUDA error: device-side assert triggered is an error that's often caused when you either have inconsistency between the number of labels and ......
Read more >CUDA runtime error (59) : device-side assert triggered
One way to raise the "CUDA error: device-side assert triggered" RuntimeError , is by indexing into a GPU torch.Tensor using a list having ......
Read more >RuntimeError: CUDA error: device-side assert triggered
Hi, First thing is to try to run the code on CPU. CPU code has more checks so it will possibly return a...
Read more >RuntimeError: CUDA error: device-side assert triggered · Issue ...
When I try running tutorial 2 on Colab I run into this error message: RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors...
Read more >[HELP] RuntimeError: CUDA error: device-side assert triggered
I get this error: RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the ......
Read more >
Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free
Top Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Then it’s definitely that you’ve got a bad index into the positional embeddings.
Rerun with environmental variable
CUDA_LAUNCH_BLOCKING=1
and see what line it crashed on.This is almost always an out-of-bounds error on some embeddings lookup. Usually positional embeddings, but it could be word embeddings or segment embeddings.