
Replace NMS GPU kernel in object detection sample with torchvision's implementation


Make sure that inference speed in eval mode and accuracy are not significantly affected.

In case torchvision’s NMS implementation is slower, clean up the existing GPU kernel by removing the empty wrapper in nms.cpp.
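For reference, a minimal sketch of what the torchvision side of the swap could look like, assuming boxes arrive as an [N, 4] tensor in (x1, y1, x2, y2) format with an [N] tensor of scores; the wrapper name and the default threshold below are illustrative, not taken from the sample:

# Minimal sketch, not the sample's actual code; the wrapper name and
# default IoU threshold are illustrative.
import torch
from torchvision.ops import nms

def postprocess(boxes: torch.Tensor, scores: torch.Tensor,
                iou_threshold: float = 0.45) -> torch.Tensor:
    # torchvision.ops.nms returns the indices of the kept boxes,
    # sorted by decreasing score.
    return nms(boxes, scores, iou_threshold)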

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

2 reactions
SKholkin commented, Nov 18, 2020

There is no big difference in either inference speed or accuracy. Actually, the PyTorch implementation is even a little faster and more accurate:

  • ssd300, PyTorch impl: averaged detect for batch: 1.163 s, Mean AP = 0.7831
  • ssd300, old FB kernels: averaged detect for batch: 1.192 s, Mean AP = 0.7828
  • ssd512, PyTorch impl: averaged detect for batch: 0.819 s, Mean AP = 0.8044
  • ssd512, old FB kernels: averaged detect for batch: 0.843 s, Mean AP = 0.8026
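The thread doesn’t show the measurement code; a rough sketch of how an averaged per-batch detection time like the above could be collected (model and batches are placeholders, not names from the sample) might be:

# Rough timing sketch; `model` and `batches` are placeholders, and
# torch.cuda.synchronize() is needed for meaningful GPU timings.
import time
import torch

def averaged_detect_time(model, batches, device="cuda"):
    model.eval().to(device)
    total = 0.0
    with torch.no_grad():
        for images in batches:
            images = images.to(device)
            torch.cuda.synchronize()
            start = time.perf_counter()
            model(images)  # forward pass, including NMS post-processing
            torch.cuda.synchronize()
            total += time.perf_counter() - start
    return total / len(batches)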

0 reactions
ljaljushkin commented, Dec 7, 2020

The custom NMS CUDA kernel from our extension uses top-k to cut the number of predictions. It’s aligned with the implementation from OpenVINO: https://github.com/openvinotoolkit/openvino/blob/master/inference-engine/src/mkldnn_plugin/nodes/detectionoutput.cpp#L568

The NMS from torchvision doesn’t do that, and it seems we can’t cut the number of detections before or after the call to make the results equivalent to the NMS from our extension, because top-k is applied internally, together with the full array of predictions:

// From the extension's CUDA launcher: the number of candidate boxes is
// capped at top_k before the NMS kernel is launched.
int boxes_num = std::min(boxes.size(0), top_k);
scalar_t* boxes_dev = boxes_sorted.data<scalar_t>();
nms_kernel<<<blocks, threads>>>(boxes_num,
                                nms_overlap_thresh,
                                boxes_dev,
                                ...);  // remaining arguments elided in the original excerpt

Without the top-k parameter, the implementations are equivalent, so I believe it’s OK to keep the custom NMS GPU kernel.
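As a toy illustration of why the placement of top-k matters (random data and an arbitrary threshold, not the sample’s pipeline): truncating to top_k before torchvision’s NMS generally keeps a different set of boxes than truncating after it.

import torch
from torchvision.ops import nms

boxes = torch.rand(1000, 4)
boxes[:, 2:] += boxes[:, :2]  # make the (x1, y1, x2, y2) boxes well-formed
scores = torch.rand(1000)
top_k, iou_thr = 200, 0.5

# Variant A: cut to top_k by score first, then run NMS.
order = scores.argsort(descending=True)[:top_k]
keep_a = order[nms(boxes[order], scores[order], iou_thr)]

# Variant B: run NMS on the full array, then cut to top_k.
keep_b = nms(boxes, scores, iou_thr)[:top_k]

# The kept sets generally differ: suppression among the top_k
# highest-scoring boxes is the same in both variants, but variant B can
# also fill the result with lower-scoring survivors from outside the top_k.
print(torch.equal(keep_a.sort().values, keep_b.sort().values))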

@vanyalzr, please re-open the issue if you don’t agree or have other ideas to discuss.
