
Ops to convert `masks` to `boxes`


🚀 Feature

A simple torchvision.ops function to convert segmentation masks to bounding boxes.

Motivation

This has a few use-cases.

  1. This makes it easier to use semantic segmentation datasets for object detection, since the data pipeline becomes simpler. Also, torchvision.ops represents bounding boxes in xyxy format by convention, so the masks should probably be converted to xyxy format as well.

  2. The other use case is making it easier to compare the performance of a segmentation model against a detection model. Say a detection model performs well on a segmentation dataset; then it would be better to go ahead with the detection model, since it is faster for real-time use-cases than training a segmentation model.

New Pipeline

from torch.utils.data import Dataset
from torchvision.ops import masks_to_boxes, box_convert

class SegmentationToDetectionDataset(Dataset):
    def __getitem__(self, idx):
        boxes_xyxy = masks_to_boxes(segmentation_masks)

        # Convert the boxes to COCO (xywh) format if needed.
        boxes_xywh = box_convert(boxes_xyxy, in_fmt="xyxy", out_fmt="xywh")
        return boxes_xywh
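
Note that box_convert supports the "xyxy", "xywh", and "cxcywh" formats, so converting to other box conventions follows the same pattern.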

Pitch

Port the masks_to_boxes function from MDETR.

masks_to_boxes was also used in DETR.
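
For reference, a minimal sketch of what the ported op could look like, following the DETR-style approach of taking the min/max nonzero coordinates per mask (the name is illustrative, empty masks are assumed not to occur, and torchvision's final implementation may differ):

import torch

def masks_to_boxes_sketch(masks: torch.Tensor) -> torch.Tensor:
    """Compute an (N, 4) tensor of xyxy boxes from (N, H, W) masks."""
    boxes = torch.zeros((masks.shape[0], 4), dtype=torch.float, device=masks.device)
    for i, mask in enumerate(masks):
        ys, xs = torch.where(mask != 0)  # coordinates of every nonzero pixel
        boxes[i, 0], boxes[i, 1] = xs.min(), ys.min()
        boxes[i, 2], boxes[i, 3] = xs.max(), ys.max()
    return boxes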

Alternatives

The above function assumes masks of shape (N, H, W), i.e. (num_masks, height, width), as a floating-point tensor. IIRC we used a boolean tensor in draw_segmentation_masks (after Nicolas refactored), so perhaps we should use a boolean tensor here too? That said, I see no particular reason for this util to be valid only for instance segmentation.
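
If the op does end up expecting a boolean tensor, float masks are trivial to adapt, e.g. (float_masks is a hypothetical soft-mask tensor):

bool_masks = float_masks > 0.5  # threshold soft masks to boolean
boxes = masks_to_boxes(bool_masks)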

Additional context

I can port this; we probably need a few tests to ensure it works fine, especially a test for float16 overflow.
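
For context on the float16 concern: integers above 2048 are not exactly representable in float16, so an implementation that keeps coordinates in the input dtype could silently round large boxes. A sketch of the kind of test this suggests, assuming the op takes (N, H, W) masks and returns inclusive xyxy boxes:

import torch
from torchvision.ops import masks_to_boxes

def test_masks_to_boxes_float16():
    masks = torch.zeros((1, 3000, 3000), dtype=torch.float16)
    masks[0, 2900:2950, 2800:2900] = 1.0  # 2899 and 2949 don't fit exactly in float16
    boxes = masks_to_boxes(masks)
    assert torch.equal(boxes[0], torch.tensor([2800.0, 2900.0, 2899.0, 2949.0]))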

cc @datumbox @NicolasHug

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 20 (16 by maintainers)

Top GitHub Comments

2 reactions
0x00b1 commented, Aug 18, 2021

@oke-aditya Great! I’ll send one this afternoon. I’ll include a gallery example.

1 reaction
addisonklinke commented, Jan 27, 2022

@syed-javed Yes, I've got one working now. The strategy is to iterate through each (x, y) location with a positive prediction (i.e. confidence > threshold). From each such location, iteratively expand outwards as long as each boundary edge has an average confidence greater than the threshold. To speed up the iteration, points that overlap with a previously created box are skipped.

With the function below, you can reproduce my desired output. Note that my input tensor is slightly different: torch.FloatTensor[H, W] instead of torch.BoolTensor[N, H, W]. Also, the return value is a tuple (boxes, scores), where scores is the average confidence of each region.

boxes, scores = heatmap_to_bboxes(masks.squeeze().float())
# boxes: [[0, 1, 1, 2], [2, 3, 3, 4]]
# scores: [1, 1]
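
The original comment doesn't show how masks was built; one 5×5 input consistent with that output would be two disjoint 2×2 blobs:

masks = torch.zeros(1, 5, 5, dtype=torch.bool)
masks[0, 1:3, 0:2] = True  # yields box [0, 1, 1, 2] with score 1.0
masks[0, 3:5, 2:4] = True  # yields box [2, 3, 3, 4] with score 1.0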

The function

import torch
from torchvision.ops import batched_nms


def heatmap_to_bboxes(heatmap, pos_thres=0.5, nms_thres=0.5, score_thres=0.5):
    """Cluster heatmap into discrete bounding boxes

    :param torch.Tensor[H, W] heatmap: Predicted probabilities
    :param float pos_thres: Threshold for assigning probability to positive class
    :param Optional[float] nms_thres: Threshold for non-max suppression (or ``None`` to skip)
    :param Optional[float] score_thres: Threshold for final bbox scores (or ``None`` to skip)
    :return Tuple[torch.Tensor]: Containing
        * bboxes[N, C=4]: bounding box coordinates in ltrb format
        * scores[N]: confidence scores (averaged across all pixels in the box)
    """

    def get_roi(data, bounds):
        """Extract region of interest from a tensor

        :param torch.Tensor[H, W] data: Original data
        :param dict bounds: With keys for left, right, top, and bottom
        :return torch.Tensor[H', W']: Subset of the original data
        """
        compound_slice = (
            slice(bounds['top'], bounds['bottom']),
            slice(bounds['left'], bounds['right']))
        return data[compound_slice]

    def is_covered(x, y, bbox):
        """Determine whether a point is covered/inside a bounding box

        :param int x: Point x-coordinate
        :param int y: Point y-coordinate
        :param Sequence[int] bbox: Box in ltrb format (length 4)
        :return bool: Whether all boundaries are satisfied
        """
        left, top, right, bottom = bbox
        bounds = [
            x >= left,
            x <= right,
            y >= top,
            y <= bottom]
        return all(bounds)

    # Determine indices of each positive pixel
    heatmap_bin = torch.where(heatmap > pos_thres, 1, 0)
    idxs = torch.flip(torch.nonzero(heatmap_bin), [1])  # (x, y) coords of positive pixels
    heatmap_height, heatmap_width = heatmap.shape

    # Limit potential expansion to the heatmap boundaries
    edge_names = ['left', 'top', 'right', 'bottom']
    limits = {
        'left': 0,
        'top': 0,
        'right': heatmap_width,
        'bottom': heatmap_height}
    bboxes = []
    scores = []

    # Iterate over positive pixels
    for x, y in idxs.tolist():  # plain Python ints simplify the bound arithmetic below

        # Skip if an existing bbox already covers this point
        already_covered = False
        for bbox in bboxes:
            if is_covered(x, y, bbox):
                already_covered = True
                break
        if already_covered:
            continue

        # Start by looking 1 row/column in every direction and iteratively expand the ROI from there
        incrementers = {k: 1 for k in edge_names}
        max_bounds = {
            'left': x,
            'top': y,
            'right': x,
            'bottom': y}
        while True:

            # Extract the new, expanded ROI around the current (x, y) point
            bounds = {
                'left': max(limits['left'], x - incrementers['left']),
                'top': max(limits['top'], y - incrementers['top']),
                'right': min(limits['right'], x + incrementers['right'] + 1),
                'bottom': min(limits['bottom'], y + incrementers['bottom'] + 1)}
            roi = get_roi(heatmap_bin, bounds)

            # Get the vectors along each edge
            edges = {
                'left': roi[:, 0],
                'top': roi[0, :],
                'right': roi[:, -1],
                'bottom': roi[-1, :]}

            # Keep expanding while at least one edge has more than a ``pos_thres``
            # fraction of positive elements and hasn't already hit the heatmap boundary
            keep_going = False
            for k, v in edges.items():
                if v.sum()/v.numel() > pos_thres and limits[k] != max_bounds[k]:
                    keep_going = True
                    max_bounds[k] = bounds[k]
                    incrementers[k] += 1

            # If none of the newly expanded edges were useful
            # Then convert the maximum ROI to bbox and calculate its confidence
            # Single pixel islands are ignored since they have zero width/height
            if not keep_going:
                final_roi = get_roi(heatmap, max_bounds)
                if final_roi.numel() > 0:
                    # Right/bottom bounds are exclusive, so subtract 1 for inclusive ltrb
                    bboxes.append([max_bounds[k] - 1 if i > 1 else max_bounds[k]
                                   for i, k in enumerate(edge_names)])
                    scores.append(final_roi.mean().item())
                break

    # Type conversions and optional NMS + score filtering
    bboxes = torch.tensor(bboxes).type_as(heatmap)
    scores = torch.tensor(scores).type_as(heatmap)
    if nms_thres is not None:
        class_idxs = torch.zeros(bboxes.shape[0])
        keep_idxs = batched_nms(bboxes.float(), scores, class_idxs, iou_threshold=nms_thres)
        bboxes = bboxes[keep_idxs]
        scores = scores[keep_idxs]
    if score_thres is not None:
        high_confid = scores > score_thres
        bboxes = bboxes[high_confid]
        scores = scores[high_confid]
    return bboxes, scores