GPU out of memory when using `SupervisedTrainer` in a loop
**Describe the bug**
Constructing and running a `SupervisedTrainer` in a loop eventually leads to GPU out of memory. See the example below.
**To Reproduce**

```python
import segmentation_models_pytorch as smp
import torch
from torch import optim, nn
from monai.engines import SupervisedTrainer
from monai.data import DataLoader, ArrayDataset
import gc

NETWORK_INPUT_SHAPE = (1, 256, 256)
NUM_IMAGES = 50


def get_xy():
    xs = [256 * torch.rand(NETWORK_INPUT_SHAPE) for _ in range(NUM_IMAGES)]
    ys = [torch.rand(NETWORK_INPUT_SHAPE) for _ in range(NUM_IMAGES)]
    return xs, ys


def get_data_loader():
    x, y = get_xy()
    dataset = ArrayDataset(x, seg=y)
    loader = DataLoader(dataset, batch_size=16)
    return loader


def get_model():
    return smp.Unet(
        encoder_weights="imagenet", in_channels=1, classes=2, activation=None
    )


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader = get_data_loader()
    model = get_model()
    for i in range(50):
        print(f"On iteration {i}")
        model.to(device)
        optimizer = optim.Adam(model.parameters())
        trainer = SupervisedTrainer(
            device=device,
            max_epochs=10,
            train_data_loader=train_loader,
            network=model,
            optimizer=optimizer,
            loss_function=nn.CrossEntropyLoss(),
            prepare_batch=lambda batchdata, device, non_blocking: (
                batchdata[0].to(device),
                batchdata[1].squeeze(1).to(device, dtype=torch.long),
            ),
        )
        trainer.run()
        # gc.collect()
```
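For reference, a small diagnostic helper can make the growth visible (`log_cuda_memory` is just an illustrative name, not part of the script above or of MONAI):

```python
import torch


def log_cuda_memory(tag: str) -> None:
    """Print MiB currently allocated to tensors and MiB reserved by the
    CUDA caching allocator, to spot per-iteration growth."""
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")
```

Calling `log_cuda_memory(f"iteration {i}")` right after `trainer.run()` should show the allocated figure climbing on each pass through the loop until the failure.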
Around the 4th iteration or so, I get `RuntimeError: CUDA out of memory`. If this doesn't happen for anyone trying out the example, try increasing the `NUM_IMAGES` variable or the number of iterations of the loop. I know that there are a few common causes for out-of-memory issues in PyTorch, outlined here, but I can't really find where I'm doing any of these things. I've tried calling `del trainer` and moving the initialization of the model inside the loop and deleting it afterwards. Calling `gc.collect()` works, which makes me think that there is some kind of circular reference holding up the garbage collection. I'm not convinced that this isn't user error, though.
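The workaround that works for me amounts to this per-iteration cleanup, sketched here as a helper (`reclaim_gpu_memory` is just an illustrative name; the `torch.cuda.empty_cache()` call is optional and mainly affects what `nvidia-smi` reports):

```python
import gc

import torch


def reclaim_gpu_memory() -> None:
    """Force a full garbage-collection pass, breaking reference cycles that
    keep GPU tensors alive, then return the allocator's cached blocks."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```

`del trainer, optimizer` followed by `reclaim_gpu_memory()` at the end of each loop iteration should behave like the commented-out `gc.collect()` call in the repro above.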
**Environment**
Ubuntu 18.04, Python 3.8.

Ensuring you use the relevant Python executable, please paste the output of:

```
python -c 'import monai; monai.config.print_debug_info()'
```
```
================================
Printing MONAI config...
================================
MONAI version: 0.8.0
Numpy version: 1.21.2
Pytorch version: 1.10.0+cu102
MONAI flags: HAS_EXT = False, USE_COMPILED = False
MONAI rev id: 714d00dffe6653e21260160666c4c201ab66511b

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
Nibabel version: NOT INSTALLED or UNKNOWN VERSION.
scikit-image version: 0.18.3
Pillow version: 8.4.0
Tensorboard version: NOT INSTALLED or UNKNOWN VERSION.
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.10.1+cu102
tqdm version: 4.62.3
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: NOT INSTALLED or UNKNOWN VERSION.
pandas version: 1.3.3
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

================================
Printing system config...
================================
`psutil` required for `print_system_info`

================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 10.2
cuDNN enabled: True
cuDNN version: 7605
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70']
GPU 0 Name: Quadro T2000
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 16
GPU 0 Total memory (GB): 3.8
GPU 0 CUDA capability (maj.min): 7.5
```
**Additional context**
Originally used for k-fold cross validation.
**Top GitHub Comments**
This is all in the context of k-fold cross validation. So I need a new optimizer for each fold, a new trainer to run the training, and to run the trainer for each fold. Moving the optimizer outside of the loop causes a memory leak anyway.
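For illustration, the per-fold pattern I'm describing looks roughly like this sketch (`build_trainer` is a hypothetical factory returning a fresh `SupervisedTrainer` with its own optimizer, and `sklearn.model_selection.KFold` is used for the splits; neither is part of the original script):

```python
import gc

import numpy as np
import torch
from sklearn.model_selection import KFold


def run_cross_validation(build_trainer, num_samples, n_splits=5):
    """Run one trainer per fold, releasing it before the next fold starts."""
    splits = KFold(n_splits=n_splits).split(np.arange(num_samples))
    for fold, (train_idx, val_idx) in enumerate(splits):
        print(f"Fold {fold}")
        trainer = build_trainer(train_idx, val_idx)  # fresh optimizer + trainer
        trainer.run()
        del trainer               # drop the last reference to the engine
        gc.collect()              # break reference cycles before the next fold
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```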
Hi @tao558,
I think maybe you can leverage something in this tutorial: https://github.com/Project-MONAI/tutorials/blob/master/modules/cross_validation_models_ensemble.ipynb
Thanks.