
Race Condition -> RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103


🐛 Bug

When my maskrcnn’s parameters requires_grad is False, I get an error in the inference stage due to nms. This didn’t happen in last month’s version of this repo.

To Reproduce

I run the following code on a Cityscapes batch.

# Freeze all of the model's parameters.
for p in mrcnn.parameters():
    p.requires_grad = False

# Switch to evaluation mode and run a forward pass on the batch.
mrcnn.eval()
predictions = mrcnn(image_list)

The error that I get:

    result = self.forward(*input, **kwargs)
  File "/mnt/home/issam/Research_Ground/domain_adaptation/models/base_models/adv_mrcnn.py", line 242, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets)
  File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/home/issam/Research_Ground/domain_adaptation/models/maskrcnn/heads.py", line 26, in forward
    x, detections, loss_box = self.box(features, proposals, targets)
  File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 52, in forward
    result = self.post_processor((class_logits, box_regression), proposals)
  File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py", line 82, in forward
    boxlist = self.filter_results(boxlist, num_classes)
  File "/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py", line 126, in filter_results
    boxlist_for_class, self.nms
  File "/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 27, in boxlist_nms 
    keep = _box_nms(boxes, score, nms_thresh)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py(27)boxlist_nms()
     26     score = boxlist.get_field(score_field)
---> 27     keep = _box_nms(boxes, score, nms_thresh)
     28     if max_proposals > 0:
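
Because CUDA reports illegal memory accesses asynchronously, the Python frame shown in the traceback above is not necessarily where the bad access actually happens. A minimal debugging sketch (standard CUDA/PyTorch practice, not something from the original report) is to force synchronous kernel launches so the error surfaces at the offending call:

import os

# Force synchronous CUDA kernel launches so the illegal memory access is
# reported at the call that triggers it. This must happen before the first
# CUDA call, so set it at the very top of the script (or in the shell).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the flag is set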

Environment

  • NumPy: 1.15.4
  • CUDA: 9.0.176
  • PyTorch: 1.0.1.post2
  • cuDNN: 7402

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 14 (3 by maintainers)

Top GitHub Comments

3 reactions
ruiyuanlu commented, Apr 26, 2019

Hi, I hit the same issue. I found that nms.cu initializes its threads on GPU 0 by default, and raises this illegal memory access error when the input tensor boxes is on another GPU. Any idea how to fix this?
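
If that diagnosis is right, one possible workaround is to make the boxes' device the current CUDA device while the kernel runs. A minimal sketch, assuming the _box_nms import matches the one in maskrcnn_benchmark/structures/boxlist_ops.py (the wrapper name is mine, and this is not an official fix from the maintainers):

import torch
# Assumed to match the import used by boxlist_ops.py.
from maskrcnn_benchmark.layers import nms as _box_nms

def box_nms_on_boxes_device(boxes, scores, nms_thresh):
    # Launch the NMS kernel on the GPU that actually holds the boxes,
    # instead of whatever the current device happens to be (GPU 0 by default).
    with torch.cuda.device(boxes.device):
        return _box_nms(boxes, scores, nms_thresh)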

1 reaction
ruiyuanlu commented, Apr 26, 2019

> In my case, the issue was that I was using Python multithreading to run two threads, one for training and one for validation. I fixed this by using multiprocessing instead of multithreading.
>
> Are you running multiple threads?

No. Actually, I tried compiling only nms.cu, with no other code included, and still ran into this issue with a single thread and a single process.
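
For reference, the multiprocessing fix quoted above might look roughly like this (a minimal sketch with placeholder train_loop and validation_loop functions; it is not the issue author's actual code):

import torch.multiprocessing as mp

def train_loop():
    ...  # training code; each process gets its own CUDA context

def validation_loop():
    ...  # validation / inference code

if __name__ == "__main__":
    # The "spawn" start method is required when child processes use CUDA.
    mp.set_start_method("spawn", force=True)
    train_proc = mp.Process(target=train_loop)
    val_proc = mp.Process(target=validation_loop)
    train_proc.start()
    val_proc.start()
    train_proc.join()
    val_proc.join()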


Top Results From Across the Web

  • Cuda illegal memory access when running inference on ...
    After exporting a YoloV5 model to .engine I receive an error when trying to perform inference on it.
  • Code gives cuda runtime error (77) for second iteration
    I'm having the same problem. ..THCStorage.c line=32 error=77 : an illegal memory access was encountered ..luajit: cuda runtime error (77) : an illegal...
  • What is a Race Condition? - TechTarget
    In computer memory or storage, a race condition may occur if commands to read and write a large amount of data are received...
  • PyTorch RuntimeError: CUDA error: an illegal memory access ...
    I had the same issue before when my code tried to multiply tensors on different device. torch.mul tried to multiple tensors on CPU...
  • "an illegal memory access was encountered launching kernel ...
    Re: [AMBER] vlimit=10 compromise for Amber 20 error: "an illegal memory access was encountered launching kernel kClearForces"? This message : [ ...
