Error encountered while running GTAD training

Hi, I encountered the following error:

./data/thumos_annotations/saved.2048.train.nf256.sf5.num200.train.pkl
Got saved data.
Size of data: 1389
./data/thumos_annotations/saved.2048.validation.nf256.sf5.num213.train.pkl
Got saved data.
Size of data: 1626
[Epoch 000] Loss 6.13 = 4.85 + 1.28 (train)

The error has a long list of CUDA assertion errors of this format:

/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [820,0,0], thread: [120,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
. . . . . . .
/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [820,0,0], thread: [126,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [820,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "gtad_train.py", line 129, in <module>
    test(test_loader, model, epoch, mask)
  File "gtad_train.py", line 74, in test
    gt_iou_map = label_confidence.cuda() * bm_mask
RuntimeError: CUDA error: device-side assert triggered
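The assertion in the log requires every index to satisfy -size <= index < size for the dimension being indexed. As a minimal sketch, the same condition can be checked in plain Python on index values copied back to the host before the failing operation. The function name and the sample values are hypothetical and are not taken from the G-TAD code.

```python
def find_out_of_bounds(indices, size):
    """Return (position, index) pairs that violate -size <= index < size.

    This mirrors the condition in the CUDA assertion message:
    index >= -sizes[i] && index < sizes[i]
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not (-size <= idx < size)
    ]


# Hypothetical index values for a dimension of length 7.
bad = find_out_of_bounds([0, 5, -3, 7, -8], size=7)
print(bad)  # positions and values of the offending indices
```

Because CUDA kernels are launched asynchronously, the Python line shown in the traceback is not necessarily the operation that triggered the assert. Rerunning with the environment variable CUDA_LAUNCH_BLOCKING=1, or on the CPU, usually points at the indexing operation that actually failed.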

Please let me know what you think could be the problem.

Thanks, Srikanth

Issue Analytics

Created 3 years ago
Comments: 7 (2 by maintainers)

Top GitHub Comments

coolbay commented, Jun 22, 2020

Hi, @srikanth-sfu! May I ask what batch size you use? If the batch size is too small, a batch may contain no positive samples, and a memory error like this one could occur as a consequence.
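One way to test this hypothesis is to count the positive samples in each batch before training. The sketch below is plain Python and makes several assumptions: each batch is represented as a flat list of label scores, a sample counts as positive when its score exceeds a threshold, and the threshold of 0.5 is illustrative. None of these names or values come from the G-TAD code.

```python
def count_positives(labels, threshold=0.5):
    """Count label scores strictly above the (assumed) positive threshold."""
    return sum(1 for value in labels if value > threshold)


def batches_without_positives(batches, threshold=0.5):
    """Return the positions of batches that contain no positive sample."""
    return [
        i
        for i, labels in enumerate(batches)
        if count_positives(labels, threshold) == 0
    ]


# Hypothetical label scores for three small batches.
batches = [
    [0.1, 0.9, 0.3],  # one positive
    [0.0, 0.2, 0.4],  # no positives
    [0.7, 0.8, 0.1],  # two positives
]
print(batches_without_positives(batches))
```

If the returned list is not empty, increasing the batch size or skipping those batches would be a reasonable first experiment.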

frostinassiky commented, Jun 23, 2020

@srikanth-sfu Thanks for your feedback.

To reproduce the errors you mentioned, could you specify the environment? What are the GPU type and CUDA version?
