Method to fill holes in segmentations

Is your feature request related to a problem? Please describe.

We have 3D segmentation masks. No segmented shape is supposed to have holes within its borders, so any holes can be considered potential artifacts. We would like to close such holes, especially in our training data.

For example, the following matrix:

[
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 0, 1, 2, 0, 0 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 0, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 4, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
]

Should result in

[
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 0, 0 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 4, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
]

The only holes that get filled are the ones in the 1 and 3 shapes in the middle slice. The 2 shape is open to the side and the 4 shape is open to the back, so their gaps are not holes and stay as they are. The 0 between the classes should stay untouched.
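As a sanity check of the expected result above, here is a minimal sketch that rebuilds the example volume and fills each label with scipy's binary hole filling. It assumes scipy's default connectivity (face neighbours only) and that anything touching the array border counts as open. The helper name `fill_holes_per_label` is made up for this illustration.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def fill_holes_per_label(img, labels):
    """Fill enclosed holes of each label independently (illustrative helper)."""
    out = img.copy()
    for label in labels:
        # Holes are background regions that cannot be reached from the array border.
        out[binary_fill_holes(img == label)] = label
    return out

# One slice with all four shapes complete.
base = np.array([
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [0, 3, 0, 0, 4, 0],
    [3, 3, 3, 4, 4, 4],
    [0, 3, 0, 0, 4, 0],
])
img = np.stack([base, base, base])
img[0, 4, 4] = 0      # the 4 shape has a gap that reaches the first slice
img[1, 1, 1] = 0      # enclosed hole in the 1 shape
img[1, 1, 4:] = 0     # gap in the 2 shape that reaches the side of the volume
img[1, 4, 1] = 0      # enclosed hole in the 3 shape
img[1, 4, 4] = 0      # continues the gap of the 4 shape

# Only the two enclosed holes are expected to be filled.
expected = img.copy()
expected[1, 1, 1] = 1
expected[1, 4, 1] = 3

result = fill_holes_per_label(img, [1, 2, 3, 4])
print(np.array_equal(result, expected))
```

The gaps in the 2 and 4 shapes stay open because they connect to the border of the volume, which is exactly the behaviour described above.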

Describe the solution you’d like

MONAI is missing a “fill_holes” function / transformation. scipy has an implementation that works for 3D data, but only for binary images (scipy.ndimage.morphology.binary_fill_holes). Using it on a label map requires iterating over all labels of the image, which makes this already slow method unacceptably slow.

I wish that MONAI would implement such a function.

Describe alternatives you’ve considered

I opened a feature request on the scipy project (https://github.com/scipy/scipy/issues/14504) and asked this Stack Overflow question: https://stackoverflow.com/questions/68608749/performant-way-to-fill-holes-for-categorical-data

I implemented 7 versions using the existing scipy.ndimage.morphology.binary_fill_holes function (or its implementation) and numpy. Here are the two best versions so far:

import numpy as np
from scipy.ndimage import binary_dilation, binary_fill_holes

def fill_holes6(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    output = np.zeros_like(img)
    for i in applied_labels:
        # Fill the holes of the binary mask of label i, then write the label back.
        output[binary_fill_holes(img == i)] = i

    return output

def fill_holes7(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    output = np.zeros(img.shape, dtype=int)
    for i in applied_labels:
        # Grow the background inward from the image border through every voxel
        # that is not label i; whatever is never reached is label i or a hole in it.
        tmp = np.zeros(img.shape, dtype=bool)
        binary_dilation(tmp, structure=None, iterations=-1, mask=img != i, origin=0, border_value=1, output=tmp)
        output[np.logical_not(tmp)] = i

    return output

In MONAI this could be implemented something like this:

from typing import Any, Dict, Sequence, Tuple

import torch
from scipy.ndimage import binary_fill_holes

class FillHoles:

    @classmethod
    def _binary_fill_holes(cls, img: torch.Tensor) -> torch.Tensor:
        # scipy works on NumPy arrays, so round-trip through the CPU.
        img_arr = img.detach().cpu().numpy()
        img_arr = binary_fill_holes(img_arr)
        # Return a boolean tensor so it can be used directly as an index mask.
        return torch.as_tensor(img_arr, dtype=torch.bool, device=img.device)

    def __init__(self, applied_labels: Sequence[int]) -> None:
        self.applied_labels = applied_labels

    def __call__(self, img: torch.Tensor,
                 meta_data: Dict[str, Any]) -> Tuple[torch.Tensor, Dict[str, Any]]:
        output = torch.zeros_like(img)
        for i in self.applied_labels:
            output[self._binary_fill_holes(img == i)] = i

        return output, meta_data

I measured the performance the following way (matching my real world data distribution):

import time
import numpy as np

def measure(funs, t):
    res = []
    for _ in range(t):
        ra = np.random.randint(10, 40)
        sh = np.random.randint(200, 400, 3)
        img = np.random.randint(0, ra, sh)

        applied_labels = np.unique(img)[1:]

        fun_res = []
        for fun in funs:
            start = time.time()
            fun(img, applied_labels)
            end = time.time()
            fun_res.append(end - start)
        res.append(fun_res)
    return np.min(res, axis=0), np.max(res, axis=0), np.mean(res, axis=0), np.std(res, axis=0)

print(measure([fill_holes6, fill_holes7], t=10))

For my first implementations I got the following execution times (t=100):

        fill_holes1   fill_holes2   fill_holes3
min     6.4s          6.9s          6.2s
max     83.7s         96.0s         80.4s
mean    32.9s         37.3s         31.6s
std     17.3s         20.1s         16.5s

This is very slow. The last implementation, fill_holes7, is only 1.27 times faster than fill_holes3. I really hope there is a more performant way of doing this.
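One direction that might reduce the cost, offered only as a sketch and not benchmarked here: restrict each per-label fill to that label's bounding box, which scipy.ndimage.find_objects can compute for all labels in one pass. The function name `fill_holes_bbox` is an assumption for this illustration, and the sketch assumes an integer label image with positive applied labels. It should produce the same output as fill_holes6, because every enclosed hole of a label lies inside that label's bounding box.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes, find_objects

def fill_holes_bbox(img: np.ndarray, applied_labels) -> np.ndarray:
    """Per-label hole filling restricted to each label's bounding box.

    Assumes img has an integer dtype and all applied labels are positive.
    """
    output = np.zeros_like(img)
    # boxes[i - 1] is a tuple of slices bounding label i, or None if i is absent.
    boxes = find_objects(img)
    for i in applied_labels:
        if i > len(boxes) or boxes[i - 1] is None:
            continue
        box = boxes[i - 1]
        region = output[box]  # basic slicing gives a view into output
        region[binary_fill_holes(img[box] == i)] = i
    return output

# Small demo: two hollow cubes, each occupying only a corner of the volume.
vol = np.zeros((12, 12, 12), dtype=int)
vol[1:6, 1:6, 1:6] = 1
vol[3, 3, 3] = 0        # enclosed hole in label 1
vol[6:11, 6:11, 6:11] = 2
vol[8, 8, 8] = 0        # enclosed hole in label 2

filled = fill_holes_bbox(vol, [1, 2])
print(filled[3, 3, 3], filled[8, 8, 8])
```

Whether this helps depends on how compact the labels are. For the random benchmark volumes above, where every label is scattered across nearly the whole image, the bounding boxes are close to the full volume and little would be gained. For anatomical segmentations with localized structures, the saving could be larger.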

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 18 (18 by maintainers)

Top GitHub Comments

2 reactions
Spenhouet commented, Aug 2, 2021

@wyli Good catch. I take back my statement with respect to kornia. I had not worked with it before; I just tried its dilation and erosion implementations, and it seems they cannot handle 3D data.

So far I have only worked with the binary_dilation and binary_erosion functions of scipy, which have the limitation of not working for multiple classes. I was about to check out the grey_dilation and grey_erosion methods.

if you are looking for offline preprocessing

Yes, we are. Thanks for the hint. I will take a look.

I’m happy to provide a PR for this feature but I would like to implement something that is performant first.

1 reaction
Spenhouet commented, Aug 4, 2021

I would choose whichever correct implementation is fastest for a PR

Instead of benchmarking with random data, I loaded one of our brain segmentations (256 x 256 x 256) and ran both methods (fill_holes7 and fill_holes8) on it.

fill_holes7   fill_holes8
69s           276s

So even for real world data, growing the background from the edge for every label is the “faster” solution (less slow). But honestly, waiting 70s for a single image…

There is also the option to go into one more dimension by one-hot encoding but this will blow up memory usage too much. That is why I did not consider that solution. It might be an option to do this in a “paging” type of way by creating partial one-hot encoded matrices, remapping labels, … but this highly depends on the real data and then becomes complicated quickly (memory management).
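To put a rough number on the memory concern, here is a back-of-the-envelope sketch. The label count of 30 is a hypothetical value chosen for illustration (the benchmark above draws between 10 and 40 labels), and one byte per voxel assumes a boolean one-hot array.

```python
import math

def one_hot_bytes(shape, n_labels, itemsize=1):
    """Bytes needed for a dense one-hot array of shape (n_labels, *shape)."""
    return n_labels * math.prod(shape) * itemsize

# A 256 x 256 x 256 volume with a hypothetical 30 labels, stored as booleans.
size_mib = one_hot_bytes((256, 256, 256), 30) / 2**20
print(size_mib)  # 480.0 (MiB)
```

That is only the encoded input. Any intermediate array of the same shape created during filling would add the same amount again, and a wider dtype multiplies it by the item size.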

I have now implemented the PR with the fill_holes8 solution since @Nic-Ma preferred it. It probably makes sense to use the fill_holes7 solution instead. But I agree that this needs a faster implementation…
