Scalability of monailabel (OOM errors)
Describe the bug
I have encountered two different situations where MONAI Label uses far more memory than I would expect. Are these user errors, or are they related to my dataset? Has MONAI Label been designed with scalability in mind?
- When I press Train, my entire dataset is loaded into CPU RAM. Our dataset is larger than some of the competition datasets (BTCV or MSD) but not extremely large: roughly 100 CT scans of 512x512xH voxels, where H is usually around 500. Uncompressed, that adds up to nearly 100 GB, which causes the program to crash. Is there an option to avoid loading all data into RAM and instead load it on demand, perhaps with pre-fetching to avoid creating a bottleneck? Since I am using the segmentation model, which trains on patches, perhaps it would be sufficient to load just the patches into RAM rather than the full images? (See the first sketch after this list.)
In case it is relevant, my dataset has 12 foreground labels.
My workaround is to use swap, but obviously that is not ideal.
- After training, clicking Run gives me another OOM error. I tried decreasing the `roi_size` for my model, but even at 64x64x64 I'm still exceeding the 8 GB of GPU VRAM available:
For 128x128x128:
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.25 GiB (GPU 0; 7.92 GiB total capacity; 440.46 MiB already allocated; 6.63 GiB free; 610.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
For 96x96x96:
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.62 GiB (GPU 0; 7.92 GiB total capacity; 1.20 GiB already allocated; 5.46 GiB free; 1.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
For 64x64x64:
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.62 GiB (GPU 0; 7.92 GiB total capacity; 1.20 GiB already allocated; 5.46 GiB free; 1.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
It might be expected behaviour for `deepedit` models to cause an OOM, since they run on the full image. However, I expected a `segmentation` model to scale to arbitrarily sized images, because it analyses the image in patches. Have I misunderstood something, or is the stitching of the patches also carried out on the GPU? (See the second sketch after this list.)
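For reference, here is a minimal sketch of the difference between caching the whole training set in RAM and loading volumes on demand, using plain MONAI APIs. The file paths and the transform chain are placeholders for illustration, not the pipeline MONAI Label actually builds:

```python
# Hedged sketch: lazy loading vs. whole-dataset caching with plain MONAI.
# Paths and transforms are placeholders, not MONAI Label's real pipeline.
import glob
from monai.data import Dataset, CacheDataset, DataLoader
from monai.transforms import Compose, LoadImaged, EnsureChannelFirstd, RandCropByPosNegLabeld

data_dicts = [
    {"image": img, "label": img.replace("imagesTr", "labelsTr")}
    for img in sorted(glob.glob("dataset/imagesTr/*.nii.gz"))
]

transforms = Compose([
    LoadImaged(keys=["image", "label"]),
    EnsureChannelFirstd(keys=["image", "label"]),
    # Crop patches so only 96^3 sub-volumes reach the network, not full scans.
    RandCropByPosNegLabeld(
        keys=["image", "label"], label_key="label",
        spatial_size=(96, 96, 96), pos=1, neg=1, num_samples=4,
    ),
])

# CacheDataset keeps every pre-processed volume in CPU RAM (fast, memory hungry):
# train_ds = CacheDataset(data=data_dicts, transform=transforms, cache_rate=1.0)

# A plain Dataset loads and transforms each volume on demand inside the
# DataLoader workers, so only the items currently in flight occupy RAM:
train_ds = Dataset(data=data_dicts, transform=transforms)
train_loader = DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=4)
```

And a minimal sketch of sliding-window inference that evaluates patches on the GPU but stitches the full-resolution output on the CPU; the model and input names are placeholders:

```python
# Hedged sketch: patches run on the GPU, stitching happens on the CPU.
import torch
from monai.inferers import SlidingWindowInferer

inferer = SlidingWindowInferer(
    roi_size=(96, 96, 96),
    sw_batch_size=1,
    overlap=0.25,
    sw_device=torch.device("cuda"),  # device each patch is evaluated on
    device=torch.device("cpu"),      # device the full output volume is stitched on
)

# model: a trained network on the GPU; image: a (1, C, H, W, D) tensor.
# with torch.no_grad():
#     prediction = inferer(image, model)
```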
To Reproduce
Steps to reproduce the behavior:
1. Get hold of a medium-sized dataset with ground-truth labels and put it in the folder structure expected by MONAI Label. Hold back the ground-truth labels for at least one image, for use in the final step.
2. Make a copy of the `radiology/lib/config/segmentation.py` file (e.g. `segmentation_custom.py`) and modify the foreground classes and `roi_size` (see the sketch after this list).
3. Run the MONAI Label app:
   `monailabel start_server --app radiology --studies relative/path/to/images --conf models segmentation_custom --conf use_pretrained_model false`
4. In Slicer, connect to the server and click Train.
5. If you have enough CPU RAM and training completes, click Next Sample to get an unlabelled image and then Run to automatically generate labels.
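For illustration, the kind of edit made in the copied config might look like the sketch below. The class and attribute names only mirror the general pattern of the radiology app's segmentation config; the real file subclasses MONAI Label's `TaskConfig` and may differ between versions:

```python
# Hedged sketch of the customisation done in segmentation_custom.py.
# Stand-alone class for illustration only; the real config subclasses TaskConfig.
class SegmentationCustomConfig:
    def __init__(self):
        # 12 foreground classes for this dataset (names are placeholders).
        self.labels = {f"organ_{i:02d}": i for i in range(1, 13)}
        # Patch size used for training and sliding-window inference.
        self.roi_size = (64, 64, 64)
```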
Expected behavior
I expected to be able to train a network and run inference on a dataset with an arbitrary number of arbitrarily sized images.
I've used 128x128x128 patches with nnU-Net and been able to run inference on GPUs with only 4 GB of VRAM, so I'm surprised that an 8 GB GPU gets an OOM when trying to run the segmentation network with 64x64x64 patches.
8GB of GPU memory was enough to train the network, so I assumed it would also be enough to run inference.
Screenshots
N/A
Environment
Ensuring you use the relevant python executable, please paste the output of:
python -c 'import monai; monai.config.print_debug_info()'
================================
Printing MONAI config...
================================
MONAI version: 1.0.1
Numpy version: 1.23.4
Pytorch version: 1.13.0+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 8271a193229fe4437026185e218d5b06f7c8ce69
MONAI __file__: /home/chris/Software/monai/venv/lib/python3.8/site-packages/monai/__init__.py
Optional dependencies:
Pytorch Ignite version: 0.4.10
Nibabel version: 4.0.2
scikit-image version: 0.19.3
Pillow version: 9.3.0
Tensorboard version: 2.11.0
gdown version: 4.5.3
TorchVision version: 0.14.0+cu117
tqdm version: 4.64.1
lmdb version: 1.3.0
psutil version: 5.9.4
pandas version: NOT INSTALLED or UNKNOWN VERSION.
einops version: 0.6.0
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
pynrrd version: 0.4.3
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
================================
System: Linux
Linux version: Ubuntu 20.04.5 LTS
Platform: Linux-5.14.0-1054-oem-x86_64-with-glibc2.29
Processor: x86_64
Machine: x86_64
Python version: 3.8.10
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: []
Num physical CPUs: 6
Num logical CPUs: 12
Num usable CPUs: 12
CPU usage (%): [16.5, 22.2, 15.4, 25.0, 20.9, 82.1, 13.4, 10.5, 12.3, 12.3, 13.9, 15.2]
CPU freq. (MHz): 1579
Load avg. in last 1, 5, 15 mins (%): [11.6, 10.4, 26.0]
Disk usage (%): 81.0
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.0
Available memory (GB): 28.3
Used memory (GB): 2.2
================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 11.7
cuDNN enabled: True
cuDNN version: 8500
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
GPU 0 Name: NVIDIA GeForce GTX 1080
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 20
GPU 0 Total memory (GB): 7.9
GPU 0 CUDA capability (maj.min): 6.1
Additional context
N/A
Top GitHub Comments
Ah right, I probably should have thought to look more closely at the error messages. One line above the OOM message was this line:

I managed to follow that back to the `inferer()` method in `radiology/lib/infers/segmentation.py` and added an argument to the call: `SlidingWindowInferer(roi_size=self.roi_size, device=torch.device('cpu'))`.

Unfortunately, that only postponed the OOM error until the post transforms. Again following your advice, I read through the stack trace and discovered that it was the `EnsureType` conversion which was trying to load the full image back into GPU memory. I was able to modify that line as well, and now it runs 😄

Thanks for your help!! This lets me run inference, and with my swap workaround I can run training too. Is it worth turning this into a feature request for more defensive programming? Perhaps use `torch.device('cpu')` for these operations unless the user explicitly enables the GPU, or unless the image size is guaranteed to fit in GPU memory? For nnU-Net there is a command line argument, `--all-in-gpu`, that serves this purpose; having it disabled by default removes one potential source of problems. (The two changes are sketched below.)
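A rough sketch of what those two edits look like, assuming the method layout of `radiology/lib/infers/segmentation.py` (exact signatures and transform keys may differ between MONAI Label versions):

```python
# Hedged sketch of the two changes described above; not the verbatim file.
import torch
from monai.inferers import SlidingWindowInferer
from monai.transforms import EnsureTyped

# Method of the segmentation infer task (shown standalone here for brevity).
def inferer(self, data=None):
    # Evaluate patches as before, but accumulate/stitch the full-size output
    # volume on the CPU to avoid the GPU OOM.
    return SlidingWindowInferer(roi_size=self.roi_size, device=torch.device("cpu"))

# In the post transforms, keep the full-size prediction on the CPU as well,
# e.g. by giving the type-conversion transform an explicit device:
post_transform_example = EnsureTyped(keys="pred", device=torch.device("cpu"))
```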
You can calculate the dump size of `PersistentDataset`… you might notice a difference only if the dump size is reasonable. The persistent cache is saved into your model/xyz/train_xy folder (cache or .cache). You can also dig into the details to see how much is loaded into GPU vs CPU: if your pre-transforms are cached after loading data onto the GPU, the corresponding tensors get saved to disk that way. Some memory profilers can help you learn a bit more.
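As a small illustration (the cache path below is a placeholder; point it at whichever `cache` or `.cache` folder your app's train task creates):

```python
# Hedged sketch: sum the on-disk size of the persistent cache directory.
from pathlib import Path

cache_dir = Path("radiology/model/segmentation_custom/train_01/cache")  # placeholder path
total_bytes = sum(f.stat().st_size for f in cache_dir.rglob("*") if f.is_file())
print(f"Persistent cache size: {total_bytes / 1e9:.2f} GB")
```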
I understand the point about supporting GPU vs non-GPU enforcement in some of the examples; it could be a good config option. And for your segmentation_xxx model, you can do something like this…
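The snippet that followed in the original comment is not reproduced here. As a rough illustration of the idea only, a config-driven device toggle could look something like the following; the conf key name `sw_cpu_stitching` and the assumption that the infer task can read `self.conf` are both invented for this sketch:

```python
# Hypothetical illustration: let a server --conf flag decide whether the
# sliding-window output is stitched on the CPU or the GPU.
import torch
from monai.inferers import SlidingWindowInferer

def inferer(self, data=None):
    # e.g. start the server with: --conf sw_cpu_stitching true
    use_cpu = str(self.conf.get("sw_cpu_stitching", "true")).lower() == "true"
    device = torch.device("cpu") if use_cpu else None  # None keeps the input's device
    return SlidingWindowInferer(roi_size=self.roi_size, device=device)
```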