Race Condition -> RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103
🐛 Bug
When my Mask R-CNN model's parameters have requires_grad set to False, I get an error in the inference stage, inside nms. This did not happen in last month's version of this repo.
To Reproduce
I run the following code on a CityScapes batch:

# freeze all parameters and switch the model to inference mode
for p in mrcnn.parameters():
    p.requires_grad = False
mrcnn.eval()

# forward pass over the batch
predictions = mrcnn(image_list)
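For reference, the more common way to disable gradients for pure inference is a torch.no_grad() block; mrcnn and image_list are the reporter's own objects, so this is only a sketch of the same reproduction, which would presumably hit the same NMS failure:

import torch

mrcnn.eval()  # inference mode: fixes batch-norm statistics and disables dropout
with torch.no_grad():  # no autograd tracking, similar in effect to freezing all parameters
    predictions = mrcnn(image_list)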
The error that I get:
result = self.forward(*input, **kwargs)
File "/mnt/home/issam/Research_Ground/domain_adaptation/models/base_models/adv_mrcnn.py", line 242, in forward
x, result, detector_losses = self.roi_heads(features, proposals, targets)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/home/issam/Research_Ground/domain_adaptation/models/maskrcnn/heads.py", line 26, in forward
x, detections, loss_box = self.box(features, proposals, targets)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 52, in forward
result = self.post_processor((class_logits, box_regression), proposals)
File "/miniconda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py", line 82, in forward
boxlist = self.filter_results(boxlist, num_classes)
File "/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py", line 126, in filter_results
boxlist_for_class, self.nms
File "/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py", line 27, in boxlist_nms
keep = _box_nms(boxes, score, nms_thresh)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py(27)boxlist_nms()
26 score = boxlist.get_field(score_field)
---> 27 keep = _box_nms(boxes, score, nms_thresh)
28 if max_proposals > 0:
Environment
- NumPy: 1.15.4
- CUDA: 9.0.176
- PyTorch: 1.0.1.post2
- cuDNN: 7402
Issue Analytics
- Created: 5 years ago
- Comments: 14 (3 by maintainers)
Top Results From Across the Web
- Cuda illegal memory access when running inference on ...
  After exporting a YoloV5 model to .engine, I receive an error when trying to perform inference on it.
- Code gives cuda runtime error (77) for second iteration
  I'm having the same problem. ..THCStorage.c line=32 error=77 : an illegal memory access was encountered ..luajit: cuda runtime error (77) : an illegal...
- What is a Race Condition? - TechTarget
  In computer memory or storage, a race condition may occur if commands to read and write a large amount of data are received...
- PyTorch RuntimeError: CUDA error: an illegal memory access ...
  I had the same issue before when my code tried to multiply tensors on different devices. torch.mul tried to multiply tensors on CPU... (see the device-check sketch after this list)
- "an illegal memory access was encountered launching kernel ...
  Re: [AMBER] vlimit=10 compromise for Amber 20 error: "an illegal memory access was encountered launching kernel kClearForces"? This message : [ ...
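The StackOverflow snippet above blames a similar error on tensors living on different devices. A generic pre-flight check along those lines (not part of the original issue; the model and input names are placeholders) could look like this:

import torch

def assert_single_device(model, tensors):
    # Collect every device used by the model parameters and the inputs, and
    # fail early with a readable message instead of inside a CUDA kernel.
    devices = {p.device for p in model.parameters()}
    devices |= {t.device for t in tensors}
    if len(devices) > 1:
        raise RuntimeError("tensors/parameters spread across devices: %s" % devices)

# hypothetical usage before the forward pass:
# assert_single_device(mrcnn, [batch_tensor])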
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi, I met the same issue. I found that nms.cu initializes its kernel on GPU 0 by default and raises this illegal memory access error when the input tensor boxes is on another GPU. Any idea how to fix this?

No. Actually, I tried to compile only nms.cu, with no other code included, and still hit this issue with a single thread and a single process.
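If the cause is indeed that the compiled kernel runs on GPU 0 while boxes lives on another GPU, one possible Python-side workaround is to select the device of boxes before calling the op. This is only a sketch under that assumption, not a confirmed fix; the exact import path is assumed, while the _box_nms alias matches the one visible in the traceback from boxlist_ops.py:

import torch
from maskrcnn_benchmark.layers import nms as _box_nms  # assumed import; boxlist_ops.py aliases the compiled op as _box_nms

def box_nms_on_input_device(boxes, scores, nms_thresh):
    # Make the GPU that holds `boxes` the current device, so the kernel
    # launched by nms.cu runs on the same device as its inputs.
    if boxes.is_cuda:
        with torch.cuda.device(boxes.get_device()):
            return _box_nms(boxes, scores, nms_thresh)
    return _box_nms(boxes, scores, nms_thresh)

Later NMS implementations (for example torchvision's CUDA NMS) address the same problem on the C++ side by placing an at::cuda::CUDAGuard on the input tensor's device at the top of the kernel entry point.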