Ops to convert `masks` to `boxes`
🚀 Feature
A simple `torchvision.ops` op to convert segmentation masks to bounding boxes.
Motivation
This has a few use-cases.
- This makes it easier to use semantic segmentation datasets for object detection, which simplifies the pipeline. Also, bounding boxes are represented as `xyxy` in `torchvision.ops` by convention, so the op should probably convert masks to the `xyxy` format.
- The other use case is to make it easier to compare the performance of a segmentation model against a detection model. Let's say the detection model performs well on a segmentation dataset. Then it would be better to go ahead with the detection model, as it is faster in real-time use cases than training a segmentation model.
New Pipeline
```python
from torch.utils.data import Dataset
from torchvision.ops import masks_to_boxes, box_convert

class SegmentationToDetectionDataset(Dataset):
    def __getitem__(self, idx):
        # (N, H, W) masks for the idx-th sample, loaded however the
        # underlying segmentation dataset provides them.
        segmentation_masks = self.masks[idx]
        boxes_xyxy = masks_to_boxes(segmentation_masks)
        # Optionally convert the boxes to COCO (xywh) format.
        boxes_xywh = box_convert(boxes_xyxy, in_fmt="xyxy", out_fmt="xywh")
        return boxes_xywh
```
Pitch
Port the `masks_to_boxes` function from MDETR. `masks_to_boxes` was also used in DETR.
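For reference, a sketch along the lines of the DETR/MDETR utility (reconstructed from memory; treat it as illustrative rather than the exact upstream code) could look like this:

```python
import torch

def masks_to_boxes(masks: torch.Tensor) -> torch.Tensor:
    """Compute xyxy bounding boxes around masks of shape (N, H, W)."""
    if masks.numel() == 0:
        return torch.zeros((0, 4), device=masks.device)

    h, w = masks.shape[-2:]
    y = torch.arange(0, h, dtype=torch.float, device=masks.device)
    x = torch.arange(0, w, dtype=torch.float, device=masks.device)
    y, x = torch.meshgrid(y, x)  # (H, W) grids of row/column indices

    # Max coordinate per mask comes straight from the masked grid; the min
    # coordinate needs the background filled with a large value first.
    x_mask = masks * x.unsqueeze(0)
    x_max = x_mask.flatten(1).max(-1)[0]
    x_min = x_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]

    y_mask = masks * y.unsqueeze(0)
    y_max = y_mask.flatten(1).max(-1)[0]
    y_min = y_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]

    return torch.stack([x_min, y_min, x_max, y_max], dim=1)
```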
Alternatives
The above function assumes masks of shape `(N, H, W)` -> (num_masks, height, width), as a floating-point tensor.

IIRC, we used a boolean tensor in `draw_segmentation_masks` (after Nicolas refactored). So perhaps we should be using a boolean tensor here too? Though I see no particular use case that would make this util valid only for instance segmentation.
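For example, if the op only accepted floating tensors, boolean masks such as those used with `draw_segmentation_masks` would need a cast first. This is hypothetical usage against the sketch above; the exact accepted dtypes would be decided during review:

```python
import torch

# Two instance masks as boolean tensors, in the (N, H, W) layout used by
# draw_segmentation_masks.
bool_masks = torch.zeros((2, 10, 10), dtype=torch.bool)
bool_masks[0, 2:5, 3:7] = True
bool_masks[1, 6:9, 1:4] = True

# Cast to float before calling the op, if only floating tensors are accepted.
boxes = masks_to_boxes(bool_masks.to(torch.float32))
# -> tensor([[3., 2., 6., 4.],
#            [1., 6., 3., 8.]])
```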
Additional context
I can port this; we would need a few tests to ensure it works fine, especially a test for float16 overflow.
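One possible test for the float16 concern (a sketch only; the test name, shapes, and exact semantics are assumptions, not an agreed design):

```python
import torch

def test_masks_to_boxes_float16():
    # Coordinates above 2048 cannot be represented exactly in float16, so a
    # naive half-precision computation would round the box corners. The op
    # should still return exact corners for float16 masks.
    masks = torch.zeros((1, 4, 3000), dtype=torch.float16)
    masks[0, 1:3, 2500:2900] = 1.0
    boxes = masks_to_boxes(masks)  # masks_to_boxes as sketched above
    expected = torch.tensor([[2500.0, 1.0, 2899.0, 2.0]], dtype=boxes.dtype)
    assert torch.equal(boxes, expected)
```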
@oke-aditya Great! I'll send one this afternoon. I'll include a gallery example.
@syed-javed Yes, I've got one working now. The strategy is to iterate through each `(x, y)` location where there's a positive (i.e. confidence > threshold) prediction. From those locations, iteratively expand outwards as long as each boundary edge has an average confidence greater than the threshold. Ignore points that overlap with a previously created box to speed up the iteration.

With the function below, you can reproduce my desired output. Please note my input tensor is slightly different, specifically `torch.FloatTensor[H, W]` instead of `torch.BoolTensor[N, H, W]`. Also the return is a tuple of `(boxes, scores)`, where `scores` is the average confidence of each region.

The function
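The author's actual function is not included in this excerpt. Purely as an illustration of the strategy described above, a greedy seed-and-grow sketch over a confidence map might look like the following (the name `soft_masks_to_boxes` and every implementation detail here are assumptions, not the author's code):

```python
import torch

def soft_masks_to_boxes(heatmap: torch.Tensor, threshold: float = 0.5):
    """heatmap: FloatTensor[H, W] of per-pixel confidences.
    Returns (boxes, scores), where boxes is FloatTensor[K, 4] in xyxy format
    and scores[k] is the mean confidence inside boxes[k]."""
    H, W = heatmap.shape
    covered = torch.zeros((H, W), dtype=torch.bool)
    boxes, scores = [], []

    # Visit every above-threshold location as a potential seed.
    ys, xs = torch.nonzero(heatmap > threshold, as_tuple=True)
    for y, x in zip(ys.tolist(), xs.tolist()):
        if covered[y, x]:
            continue  # this seed already lies inside a previous box
        x0 = x1 = x
        y0 = y1 = y
        grew = True
        while grew:
            grew = False
            # Expand each edge by one pixel while the new edge's average
            # confidence stays above the threshold.
            if x0 > 0 and heatmap[y0:y1 + 1, x0 - 1].mean() > threshold:
                x0 -= 1
                grew = True
            if x1 < W - 1 and heatmap[y0:y1 + 1, x1 + 1].mean() > threshold:
                x1 += 1
                grew = True
            if y0 > 0 and heatmap[y0 - 1, x0:x1 + 1].mean() > threshold:
                y0 -= 1
                grew = True
            if y1 < H - 1 and heatmap[y1 + 1, x0:x1 + 1].mean() > threshold:
                y1 += 1
                grew = True
        covered[y0:y1 + 1, x0:x1 + 1] = True
        boxes.append([x0, y0, x1, y1])
        scores.append(heatmap[y0:y1 + 1, x0:x1 + 1].mean())

    if not boxes:
        return torch.zeros((0, 4)), torch.zeros((0,))
    return torch.tensor(boxes, dtype=torch.float), torch.stack(scores)
```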