FastNMS on Ultralytics YOLOv3
@dbolya we’ve had a request from @Zzh-tju to implement FastNMS in https://github.com/ultralytics/yolov3 per https://github.com/ultralytics/yolov3/issues/679. Can you point us to the location in your code where the function is? Can we use it for boxes rather than masks?
We currently use multi-class `torchvision.ops.boxes.batched_nms()` (middle row of the table below) as a compromise between speed and accuracy. We apply it once per image (all classes at once), and see an inference speed of 49 ms/img (inference + NMS) at 608 image size, `conf_thresh=0.001`, on a Tesla T4, giving us about 42.0/62.0 mAP@0.5:0.95/mAP@0.5 on COCO2014. We do not do masks though, only boxes.
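For readers wondering how a single call can cover all classes at once, here is a minimal NumPy sketch of the class-offset idea that `batched_nms` is built on: every box is shifted by an amount that depends on its class, so boxes of different classes can never overlap, and one ordinary NMS pass then acts per class. The helper names `box_iou`, `greedy_nms` and `batched_nms_sketch` are made up for illustration, and NumPy is used only to keep the example dependency-light; this is not the torchvision source.

```python
import numpy as np

def box_iou(a, b):
    """Pairwise IoU between boxes a [N, 4] and b [M, 4] in (x1, y1, x2, y2) format."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def greedy_nms(boxes, scores, iou_thres):
    """Classic sequential NMS; returns kept indices in descending score order."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        # Drop every remaining box that overlaps the current best too much.
        ious = box_iou(boxes[best:best + 1], boxes[order[1:]])[0]
        order = order[1:][ious <= iou_thres]
    return keep

def batched_nms_sketch(boxes, scores, class_ids, iou_thres):
    """Per-class NMS in one pass (hypothetical helper, illustration only)."""
    # Shift each class into its own region so cross-class IoU is always 0.
    offsets = class_ids.astype(boxes.dtype) * (boxes.max() + 1)
    return greedy_nms(boxes + offsets[:, None], scores, iou_thres)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [0, 0, 10, 10]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
class_ids = np.array([0, 0, 1])
print(batched_nms_sketch(boxes, scores, class_ids, 0.5))  # [0, 2]
print(greedy_nms(boxes, scores, 0.5))                     # [0]
```

In the toy input the third box is identical to the first but belongs to another class, so the class-aware pass keeps it while the class-agnostic pass removes it.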
BTW, we also developed the `merge` NMS method below, which is slower simply because it is implemented in Python rather than C, but it may be possible to combine `fast` and `merge` together to get the best of both worlds.
| NMS method | Time per image | Time (mm:ss) | mAP@0.5:0.95 | mAP@0.5 |
|---|---|---|---|---|
| `'vision_batched', multi_cls=False` | 46 ms | 3:50 | 41.2 | 60.8 |
| `'vision_batched', multi_cls=True` | 49 ms | 4:03 | 41.9 | 61.8 |
| `'merge', multi_cls=True` | 120 ms | 9:58 | 42.3 | 62.0 |
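To make the `merge` row concrete, here is a rough NumPy sketch of one way a merging NMS can work: instead of discarding the boxes that overlap the top-scoring box, their coordinates are averaged with score weights. This is an assumption about the general technique, not a copy of the repository code; the function names `iou_one_to_many` and `merge_nms` are made up for illustration, and the sketch assumes positive scores and a single class.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box [4] and many boxes [N, 4] in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def merge_nms(boxes, scores, iou_thres=0.5):
    """Score-weighted box merging (hypothetical helper, illustration only)."""
    order = np.argsort(-scores)
    boxes, scores = boxes[order], scores[order]
    merged_boxes, merged_scores = [], []
    while len(boxes):
        # Cluster = the current top box plus everything overlapping it.
        cluster = iou_one_to_many(boxes[0], boxes) > iou_thres
        cluster[0] = True  # always consume the top box, even if iou_thres >= 1
        weights = scores[cluster][:, None]
        merged_boxes.append((weights * boxes[cluster]).sum(0) / weights.sum())
        merged_scores.append(scores[0])  # the cluster keeps the top score
        boxes, scores = boxes[~cluster], scores[~cluster]
    return np.array(merged_boxes), np.array(merged_scores)

boxes = np.array([[0, 0, 10, 10], [2, 0, 12, 10], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.75, 0.25, 0.5])
out_boxes, out_scores = merge_nms(boxes, scores)
print(out_boxes)   # first row is the weighted mean [0.5, 0, 10.5, 10]
print(out_scores)  # [0.75, 0.5]
```

The Python-level `while` loop is the part that costs time; the per-cluster arithmetic itself is cheap, which is why a vectorised variant looks attractive.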
Top GitHub Comments
Looking at `batched_nms`, it looks like what we call `cross_class` NMS, but I’m not sure what that would make `multi_cls=True`.

Anyway, here’s our implementation of Fast NMS: https://github.com/dbolya/yolact/blob/092554ad707c2749631dfe545c8a953b2b3f4a68/layers/functions/detection.py#L137-L180
It works on boxes, so you can just ignore the mask stuff. The relevant inputs are `boxes ([N, 4])` and `scores ([N, num_classes])`. The inputs and outputs should both be on the GPU (or whatever your fastest device is, and make sure nothing ever touches the CPU in this function). We pass in all detections with > 0.05 confidence, but I don’t think passing everything in will hurt performance much since we take the top 200 anyway. Also, read the big comment about the second threshold.

Most of the code is setup and postprocessing; the core of the algorithm is actually just:
which is what’s in the paper.
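
The snippet itself is not shown above, so here is a hedged single-class sketch of the Fast NMS idea as described in the YOLACT paper: sort by score, build the pairwise IoU matrix, keep only its upper triangle, and drop every box whose maximum IoU with any higher-scoring box exceeds the threshold. It is written in NumPy to stay dependency-light, the names `box_iou` and `fast_nms` are illustrative, and the linked implementation (which works per class on GPU tensors) should be treated as the reference.

```python
import numpy as np

def box_iou(a, b):
    """Pairwise IoU between boxes a [N, 4] and b [M, 4] in (x1, y1, x2, y2) format."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def fast_nms(boxes, scores, iou_thres=0.5, top_k=200):
    """Single-class Fast NMS sketch; returns kept indices in descending score order."""
    order = np.argsort(-scores)[:top_k]  # keep only the top_k candidates
    if order.size == 0:
        return order
    # Row i, column j (i < j): IoU of box j with the higher-scoring box i.
    iou = np.triu(box_iou(boxes[order], boxes[order]), k=1)
    # For each box, its largest overlap with any higher-scoring box.
    max_iou = iou.max(axis=0)
    return order[max_iou <= iou_thres]

# A overlaps B, B overlaps C, A barely overlaps C, D is far away.
boxes = np.array([[0, 0, 10, 10], [3, 0, 13, 10],
                  [6, 0, 16, 10], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.6])
print(fast_nms(boxes, scores).tolist())  # [0, 3]
```

Note that box C is removed even though the box that overlaps it (B) is itself suppressed. Sequential NMS would keep C here; letting already-suppressed boxes suppress others is the relaxation that makes the whole thing a handful of matrix operations with no loop.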
@Gaondong see https://github.com/ultralytics/yolov3/issues/679#issuecomment-604164825
I used this code for Matrix (Soft) NMS: