
Error encountered while running GTAD training

See original GitHub issue

Hi, I encountered the following error:

    ./data/thumos_annotations/saved.2048.train.nf256.sf5.num200.train.pkl
    Got saved data. Size of data: 1389
    ./data/thumos_annotations/saved.2048.validation.nf256.sf5.num213.train.pkl
    Got saved data. Size of data: 1626
    [Epoch 000] Loss 6.13 = 4.85 + 1.28 (train)

Training then fails with a long list of CUDA assertion errors of this form, followed by a device-side assert:

    /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [820,0,0], thread: [120,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
    ...
    /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [820,0,0], thread: [126,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
    /opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [820,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

    Traceback (most recent call last):
      File "gtad_train.py", line 129, in <module>
        test(test_loader, model, epoch, mask)
      File "gtad_train.py", line 74, in test
        gt_iou_map = label_confidence.cuda() * bm_mask
    RuntimeError: CUDA error: device-side assert triggered

Please let me know what you think could be the problem.

Thanks, Srikanth
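
A device-side assert like the one above is raised asynchronously, so the Python traceback tends to point at a later statement (here the gt_iou_map multiplication) rather than the indexing op that actually went out of bounds. A common way to localize it is to force synchronous kernel launches and, if needed, replay the suspicious indexing on CPU, where PyTorch reports the exact offending index. The sketch below illustrates this; the check_index_bounds helper and its tensors are hypothetical stand-ins, not code from the GTAD repository.

    import os
    # Must be set before CUDA is initialized so kernel launches are synchronous
    # and the traceback points at the op that actually failed.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

    import torch

    def check_index_bounds(values: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        # Hypothetical stand-in for the indexing op inside the GTAD loss.
        # Checking the indices on CPU turns the opaque CUDA assert into a
        # readable error that names the out-of-range values.
        values_cpu, idx_cpu = values.cpu(), idx.cpu()
        bad = idx_cpu[(idx_cpu < -values_cpu.shape[0]) | (idx_cpu >= values_cpu.shape[0])]
        if bad.numel() > 0:
            raise IndexError(
                f"out-of-bounds indices {bad.tolist()} for dim of size {values_cpu.shape[0]}"
            )
        return values[idx]

Running with CUDA_LAUNCH_BLOCKING=1 alone is often enough to pin the assert to the exact line in gtad_train.py.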

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
coolbay commented, Jun 22, 2020

Hi, @srikanth-sfu! May I ask what batch size you are using? If the batch size is too small, it can happen that no positive samples exist in the batch, and, as a consequence, an error like this can occur.
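
The batch-size hypothesis above is easy to rule out with a quick sanity check before the loss is computed. The sketch below assumes a per-proposal confidence-label tensor and a 0.5 positive threshold; both names and the threshold are illustrative guesses rather than GTAD's actual variables.

    import torch

    def assert_batch_has_positives(label_confidence: torch.Tensor, pos_thresh: float = 0.5) -> None:
        # label_confidence: per-proposal IoU/confidence labels for one batch (hypothetical).
        # With a very small batch, every label may fall below the positive threshold,
        # which is worth ruling out before chasing the CUDA assert any further.
        num_pos = int((label_confidence > pos_thresh).sum().item())
        if num_pos == 0:
            raise ValueError(
                f"batch contains no positive samples (threshold={pos_thresh}); "
                "consider increasing the batch size"
            )

Dropping a check like this into the training/test loop in gtad_train.py just before the loss call would confirm or refute the explanation.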

1 reaction
frostinassiky commented, Jun 23, 2020

@srikanth-sfu Thanks for your feedback.

To reproduce the errors you mentioned, could you specify the environment? What are the GPU type and CUDA version?
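
The environment details being asked for can be collected with a few standard PyTorch calls; nothing below is specific to GTAD.

    import torch

    print("torch:", torch.__version__)           # PyTorch build
    print("cuda (build):", torch.version.cuda)   # CUDA version PyTorch was compiled against
    print("cuda available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("gpu:", torch.cuda.get_device_name(0))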


