Error encountered while running GTAD training
Hi, I encountered the following error:
./data/thumos_annotations/saved.2048.train.nf256.sf5.num200.train.pkl Got saved data. Size of data: 1389
./data/thumos_annotations/saved.2048.validation.nf256.sf5.num213.train.pkl Got saved data. Size of data: 1626
[Epoch 000] Loss 6.13 = 4.85 + 1.28 (train)
This is followed by a long list of CUDA assertion errors of this form:
/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [820,0,0], thread: [120,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
...
/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [820,0,0], thread: [126,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [820,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "gtad_train.py", line 129, in <module>
    test(test_loader, model, epoch, mask)
  File "gtad_train.py", line 74, in test
    gt_iou_map = label_confidence.cuda() * bm_mask
RuntimeError: CUDA error: device-side assert triggered
Please let me know what you think could be the problem.
Thanks, Srikanth
Issue Analytics
- Created 3 years ago
- Comments: 7 (2 by maintainers)
Top GitHub Comments
Hi @srikanth-sfu! What batch size are you using? If the batch size is too small, a batch may contain no positive samples, which could cause a memory error like this one.
@srikanth-sfu Thanks for your feedback.
To reproduce the errors you mentioned, could you specify the environment? What are the GPU type and CUDA version?
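While the environment details are being collected, it may help to locate the offending index on the host side. The assertion in the log requires every index to satisfy -sizes[i] <= index < sizes[i]. The sketch below applies the same check to a plain list of indices. The helper name find_out_of_bounds and the sample values are hypothetical and are not taken from the G-TAD code; this is only one possible way to narrow the problem down.

```python
def find_out_of_bounds(indices, size):
    """Return (position, value) pairs that violate -size <= index < size.

    This mirrors the condition in the CUDA assertion from the log, but runs
    on the host so the offending values can be inspected directly.
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not (-size <= idx < size)
    ]


# Hypothetical example: a dimension of length 256 and a few index values.
bad = find_out_of_bounds([0, 255, 256, -256, -257], size=256)
print(bad)  # [(2, 256), (4, -257)]
```

With PyTorch tensors, the indices could be passed in as index_tensor.flatten().tolist() just before the indexing operation. Note also that CUDA errors are reported asynchronously, so the line shown in the traceback is not necessarily the one that triggered the assert. Running with the environment variable CUDA_LAUNCH_BLOCKING=1, or running the same batch on the CPU, usually gives a more precise location.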