
GPU usage keeps increasing until OOM error on iSAID dataset.

See original GitHub issue

If you do not know the root cause of the problem / bug, and wish someone to help you, please include:

How To Reproduce the Issue

Run a simple training with any detectron2 backbone on the iSAID dataset (https://captain-whu.github.io/iSAID/). iSAID is an instance segmentation dataset with COCO-style json annotations, 15 object categories, and some images containing a very large number of instances (cars). iSAID is preprocessed by the authors' script, which converts labels, bounding boxes, and metadata to the COCO format while cutting the high-resolution original images into 800x800 patches.
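
The preprocessing itself is done by the iSAID authors' official script; purely as an illustration of the 800x800 patching idea, a minimal, hypothetical tiling helper might look like this:

import numpy as np

def tile_image(img: np.ndarray, patch_size: int = 800):
    """Yield (x0, y0, crop) tiles covering a high-resolution image.

    Hypothetical helper for illustration only; edge tiles may be smaller,
    and annotations would have to be clipped/shifted to each tile.
    """
    h, w = img.shape[:2]
    for y0 in range(0, h, patch_size):
        for x0 in range(0, w, patch_size):
            yield x0, y0, img[y0:y0 + patch_size, x0:x0 + patch_size]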

  1. what changes you made (git diff) or what code you wrote: I used the simple detectron2 Colab tutorial code, calling the register_coco_instances function instead of defining a custom loader, since iSAID is fully compatible with the COCO format. Here is a link to the code for reproducing the error: https://drive.google.com/open?id=1bo0GOhHLlvEyc6E9DOZzlszg59THOT9x

  2. what exact command you run: python3 training_naive.py, which calls register_coco_instances, sets up the cfg, and then runs a simple DefaultTrainer.train() (a minimal sketch of such a script is included after this list).

  3. what you observed (including the full logs):

The GPU memory usage keeps increasing after several iterations, until the training crashes with an out-of-memory error. Using torch.cuda.empty_cache() or the suggested
cfg.MODEL.RPN.PRE_NMS_TOPK_TRAIN = 200
cfg.MODEL.RPN.POST_NMS_TOPK_TRAIN = 200
did not solve it either.
Here is the link to the full output from bash: https://drive.google.com/open?id=1SszOAY9pEBFSsfp7nyc0Gv_mcCiHoAKo
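
For reference, here is a minimal sketch of what such a training script looks like. The dataset name ("isaid_train"), file paths, and backbone config below are placeholders, not the exact values from the linked code:

import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Register the COCO-format iSAID patches (paths are placeholders).
register_coco_instances(
    "isaid_train", {},
    "datasets/isaid/annotations/instances_train.json",
    "datasets/isaid/train/images",
)

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("isaid_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 15  # iSAID has 15 object categories
# Settings that were tried (without success) to limit memory growth:
# cfg.MODEL.RPN.PRE_NMS_TOPK_TRAIN = 200
# cfg.MODEL.RPN.POST_NMS_TOPK_TRAIN = 200

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()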

Expected behavior

If there are no obvious errors in “what you observed” provided above, please tell us the expected behavior.

If you expect the model to work better, note that we do not help you train your model. We will only help with it in one of two conditions: (1) you are unable to reproduce the results in the detectron2 model zoo, or (2) it indicates a detectron2 bug.

Environment

Please paste the output of python -m detectron2.utils.collect_env. If detectron2 hasn’t been successfully installed, use python detectron2/utils/collect_env.py.

(pytorch) paolo@ALCOR-TITANV-WS:~/libriaries/prove_detectron2$ python -m detectron2.utils.collect_env


sys.platform              linux
Python                    3.6.8 (default, Oct 9 2019, 14:04:01) [GCC 5.4.0 20160609]
Numpy                     1.17.4
Detectron2 Compiler       GCC 5.4
Detectron2 CUDA Compiler  10.1
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.3.1
PyTorch Debug Build       False
torchvision               0.4.2
CUDA available            True
GPU 0,1,2,3               TITAN V
CUDA_HOME                 /usr/local/cuda-10.1
NVCC                      Cuda compilation tools, release 10.1, V10.1.105
Pillow                    6.2.1
cv2                       4.1.2


PyTorch built with:

  • GCC 7.3
  • Intel® Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel® 64 architecture applications
  • Intel® MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

(pytorch) paolo@ALCOR-TITANV-WS:~/libriaries/prove_detectron2$

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 6

Top GitHub Comments

2 reactions
engharat commented, Dec 2, 2019

Update: the issue disappears if I remove the 100 images with the highest number of instances. The top 100 images have 700 instances per image, with the top 10 images having 3000 instances. Still, the official PANet implementation, which is heavily based on Detectron 1 code, is able to run on the whole dataset without any issue.
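
For anyone who wants to try the same workaround, here is a hypothetical sketch of filtering the densest images out of a COCO-style json before registering it (the function name is made up; the key layout follows the standard COCO schema):

import json
from collections import Counter

def drop_densest_images(in_json: str, out_json: str, n_drop: int = 100):
    """Remove the n_drop images with the most annotations from a COCO json."""
    with open(in_json) as f:
        coco = json.load(f)
    counts = Counter(ann["image_id"] for ann in coco["annotations"])
    dense = {img_id for img_id, _ in counts.most_common(n_drop)}
    coco["images"] = [im for im in coco["images"] if im["id"] not in dense]
    coco["annotations"] = [ann for ann in coco["annotations"] if ann["image_id"] not in dense]
    with open(out_json, "w") as f:
        json.dump(coco, f)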

0 reactions
maxiuw commented, Jun 28, 2022

Similar problem while using Colab and a custom dataset. Solved by tinkering with the cfg settings:

# imports and cfg creation (implied by the original snippet)
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml"))
cfg.DATASETS.TRAIN = ("unityDF1",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 1
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2  # This is the real "batch size" commonly known to deep learning people
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.STEPS = []        # do not decay learning rate

BATCH_SIZE = 32
cfg.BATCH_SIZE_PER_IMAGE = BATCH_SIZE
cfg.MODEL.BATCH_SIZE_PER_IMAGE = BATCH_SIZE
cfg.MODEL.FPN.BATCH_SIZE_PER_IMAGE = BATCH_SIZE
cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE = BATCH_SIZE
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = BATCH_SIZE
cfg.DATALOADER.BATCH_SIZE_PER_IMAGE = BATCH_SIZE
cfg.SOLVER.BATCH_SIZE_PER_IMAGE = BATCH_SIZE

cfg.MODEL.RPN.PRE_NMS_TOPK_TRAIN = 10
cfg.MODEL.RPN.POST_NMS_TOPK_TRAIN = 10
cfg.DATASETS.PRECOMPUTED_PROPOSAL_TOPK_TRAIN = 10
cfg.INPUT.MAX_SIZE_TRAIN = 32
cfg.INPUT.MAX_SIZE_TEST = 32
# cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 2   # The "RoIHead batch size". 128 is faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3
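
Note that only cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE and cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE correspond to keys in detectron2's default config; the other BATCH_SIZE_PER_IMAGE assignments are not read by the library. The memory savings here most likely come from the small RPN/ROI head batch sizes, the very aggressive PRE/POST_NMS_TOPK_TRAIN limits, and the tiny INPUT.MAX_SIZE values, all of which also reduce accuracy, so treat these numbers as a starting point for tuning rather than recommended settings.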

Top Results From Across the Web

Solving "CUDA out of memory" Error - Kaggle
1) Use this code to see memory usage (it requires internet to install package): ... import cuda def free_gpu_cache(): print("Initial GPU Usage") gpu_usage() ......
GPU OOM when training - Beginners - Hugging Face Forums
The memory usage on training begins at 12Gb, runs a few steps, and keeps growing until OOM error. It seems to be that...
Colab runs out of memory when training - PyTorch Forums
I am trying to train a CNN using PyTorch in Google Colab, however after around 170 batches Colab freezes because all available RAM...
Google Colaboratory: misleading information about its GPU ...
UPDATED: It turns out that I can use GPU normally even when the GPU RAM Free is 504 MB, which I thought as...
Introducing Low-Level GPU Virtual Memory Management
There is a growing need among CUDA applications to manage memory as quickly and as efficiently as possible. Before CUDA 10.2, the number...
