Replace NMS GPU kernel in object detection sample with torchvision's implementation
Make sure that inference speed in eval mode and accuracy are not significantly affected.
If torchvision's NMS implementation turns out to be slower, clean up the existing GPU kernel instead by removing the empty wrapper in nms.cpp.
Issue Analytics
- Created: 3 years ago
- Comments: 8 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
There is no big difference in either inference speed or accuracy. Actually, the PyTorch implementation is even slightly faster and more accurate:

Model   Implementation   Avg. detect per batch   Mean AP
ssd300  PyTorch impl     1.163 s                 0.7831
ssd300  Old FB kernels   1.192 s                 0.7828
ssd512  PyTorch impl     0.819 s                 0.8044
ssd512  Old FB kernels   0.843 s                 0.8026
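For reference, averaged per-batch numbers like these are typically collected with a small timing harness along the following lines. This is a generic sketch, not the harness actually used: the `model.detect(batch)` call and the iteration counts are assumptions, and on GPU the timed region should be bracketed with `torch.cuda.synchronize()` so asynchronous kernel launches are included in the measurement:

```python
import time

def avg_batch_time(fn, iters=50, warmup=5):
    """Average wall-clock seconds per call of `fn` after a warm-up.

    On GPU, call torch.cuda.synchronize() before reading the clock so
    that pending asynchronous kernels are counted in the measurement.
    """
    for _ in range(warmup):      # warm-up: exclude one-time setup costs
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical usage: time a detection step on a fixed batch.
# t = avg_batch_time(lambda: model.detect(batch))
```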
The custom NMS CUDA kernel from our extension uses top-k to cut the number of candidate predictions. This is aligned with the implementation from OpenVINO: https://github.com/openvinotoolkit/openvino/blob/master/inference-engine/src/mkldnn_plugin/nodes/detectionoutput.cpp#L568
The NMS from torchvision doesn't do that, and it seems we can't cut the number of detections before or after the function call to make the results equivalent to the NMS from our extension, because the top-k cut is applied internally, together with the full array of predictions.
Without the top-k parameter the implementations are equivalent, so I believe it's OK to keep the custom NMS GPU kernel.
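To make the effect of a top-k cut concrete, here is a plain-Python sketch of greedy NMS with an optional top-k cut applied to the score-sorted candidate list. This is a simplification for illustration, not the extension's actual CUDA code; the box values are made up:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_thr, top_k=None):
    """Greedy NMS; with top_k set, only the top_k highest-scoring
    candidates are considered at all."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]  # the cut applied inside the custom kernel
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.3]
print(greedy_nms(boxes, scores, 0.5))           # [0, 2]
print(greedy_nms(boxes, scores, 0.5, top_k=2))  # [0] -- box 2 never considered
```

With the cut, low-scoring boxes can be discarded before suppression even runs, so the kept set differs from plain NMS over the full candidate array, which is why a top-k-free NMS such as torchvision's cannot reproduce the same results just by trimming its input or output.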
@vanyalzr, please re-open the issue if you don't agree or have other ideas to discuss.