Race Condition -> RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103
🐛 Bug
When my Mask R-CNN model's parameters have requires_grad set to False, I get an error in the inference stage, inside nms. This did not happen in last month's version of this repo.
To Reproduce
I run the following code on a CityScapes batch:

# freeze all parameters and switch the model to inference mode
for p in mrcnn.parameters():
    p.requires_grad = False
mrcnn.eval()

# forward pass over the batch
predictions = mrcnn(image_list)
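For reference, the more common way to disable gradients for pure inference is a torch.no_grad() block; mrcnn and image_list are the reporter's own objects, so this is only a sketch of the same reproduction, which would presumably hit the same NMS failure:

import torch

mrcnn.eval()  # inference mode: fixes batch-norm statistics and disables dropout
with torch.no_grad():  # no autograd tracking, similar in effect to freezing all parameters
    predictions = mrcnn(image_list)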
The error that I get:
result = self.forward(*input, **kwargs)
File "/mnt/home/issam/Research_Ground/domain_adaptation/models/base_models/adv_mrcnn.py", line 242, in forward
x, result, detector_losses = self.roi_heads(features, proposals, targets)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/home/issam/Research_Ground/domain_adaptation/models/maskrcnn/heads.py", line 26, in forward
x, detections, loss_box = self.box(features, proposals, targets)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 52, in forward
result = self.post_processor((class_logits, box_regression), proposals)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py", line 82, in forward
boxlist = self.filter_results(boxlist, num_classes)
File "/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py", line 126, in filter_results
boxlist_for_class, self.nms
File "/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 27, in boxlist_nms
keep = _box_nms(boxes, score, nms_thresh)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py(27)boxlist_nms()
26 score = boxlist.get_field(score_field)
---> 27 keep = _box_nms(boxes, score, nms_thresh)
28 if max_proposals > 0:
Environment
- NumPy: 1.15.4
- CUDA: 9.0.176
- PyTorch: 1.0.1.post2
- cuDNN: 7402
Issue Analytics
- Created: 5 years ago
- Comments: 14 (3 by maintainers)
Top Results From Across the Web
- Cuda illegal memory access when running inference on ...
  After exporting a YoloV5 model to .engine, I receive an error when trying to perform inference on it.
- Code gives cuda runtime error (77) for second iteration
  I'm having the same problem. ..THCStorage.c line=32 error=77 : an illegal memory access was encountered ..luajit: cuda runtime error (77) : an illegal...
- What is a Race Condition? - TechTarget
  In computer memory or storage, a race condition may occur if commands to read and write a large amount of data are received...
- PyTorch RuntimeError: CUDA error: an illegal memory access ...
  I had the same issue before when my code tried to multiply tensors on different devices. torch.mul tried to multiply tensors on CPU... (see the device-check sketch after this list)
- "an illegal memory access was encountered launching kernel ...
  Re: [AMBER] vlimit=10 compromise for Amber 20 error: "an illegal memory access was encountered launching kernel kClearForces"? This message : [ ...
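The StackOverflow snippet above blames a similar error on tensors living on different devices. A generic pre-flight check along those lines (not part of the original issue; the model and input names are placeholders) could look like this:

import torch

def assert_single_device(model, tensors):
    # Collect every device used by the model parameters and the inputs, and
    # fail early with a readable message instead of inside a CUDA kernel.
    devices = {p.device for p in model.parameters()}
    devices |= {t.device for t in tensors}
    if len(devices) > 1:
        raise RuntimeError("tensors/parameters spread across devices: %s" % devices)

# hypothetical usage before the forward pass:
# assert_single_device(mrcnn, [batch_tensor])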
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi, I met the same issue. I found that nms.cu initializes its kernel on GPU 0 by default and raises this illegal memory access error when the input tensor boxes is on another GPU. Any idea how to fix this?

No. Actually, I tried to compile only nms.cu, with no other code included, and still hit this issue with a single thread and a single process.
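If the cause is indeed that the compiled kernel runs on GPU 0 while boxes lives on another GPU, one possible Python-side workaround is to select the device of boxes before calling the op. This is only a sketch under that assumption, not a confirmed fix; the exact import path is assumed, while the _box_nms alias matches the one visible in the traceback from boxlist_ops.py:

import torch
from maskrcnn_benchmark.layers import nms as _box_nms  # assumed import; boxlist_ops.py aliases the compiled op as _box_nms

def box_nms_on_input_device(boxes, scores, nms_thresh):
    # Make the GPU that holds `boxes` the current device, so the kernel
    # launched by nms.cu runs on the same device as its inputs.
    if boxes.is_cuda:
        with torch.cuda.device(boxes.get_device()):
            return _box_nms(boxes, scores, nms_thresh)
    return _box_nms(boxes, scores, nms_thresh)

Later NMS implementations (for example torchvision's CUDA NMS) address the same problem on the C++ side by placing an at::cuda::CUDAGuard on the input tensor's device at the top of the kernel entry point.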