Batch size, GPU number, performance without pretrained model for ADE20k training?
First, I want to thank you for your work! When I try to reproduce the ADE20k results, I run into an out-of-memory error on two RTX 2080Ti GPUs. Could you provide the batch size and the number of GPUs you used for ADE20k training?
I could not find the batch size and GPU count used for the experiments in the arXiv version of the paper, which I think are critical for reproducing and validating the paper’s conclusions.
-
Questions Highlight:
- What’s your batch size setting for the ADE20k and VOC training?
- What GPU did you use for training and testing?
- Could you provide the training time for your experiments?
- Could you provide the final performance without using the pretrained model from the Inplace-ABN repo?
- Could you provide the step-0 performance of MiB on VOC and ADE training with the pretrained model?
-
Exp Command:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained
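Note: --batch_size appears to be per process, so with --nproc_per_node=2 the effective batch is 2 × 12 = 24 (matching the "Total batch size is 24" line in the log below). A possible but untested workaround on 11 GB cards is to lower the per-GPU value, e.g. --batch_size 6 for a total of 12, at the cost of deviating from this setting:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 6 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained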
-
Error Log:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
INFO:rank1: Device: cuda:1
Filtering images...
INFO:rank0: [!] starting logging at directory ./logs/100-50-ade/test_MIB_ade_100_50_lr_0.01_no_pretrained/
INFO:rank0: Device: cuda:0
0/2000 ...
Filtering images...
0/2000 ...
1000/2000 ...
1000/2000 ...
Filtering images...
0/2000 ...
Filtering images...
0/2000 ...
1000/2000 ...
1000/2000 ...
INFO:rank0: Dataset: ade, Train set: 13452, Val set: 2000, Test set: 2000, n_classes 101
INFO:rank0: Total batch size is 24
INFO:rank0: Backbone: resnet101
INFO:rank0: [!] Model made without pre-trained
Selected optimization level O0: Pure FP32 training.
Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
INFO:rank0: [!] Train from scratch
INFO:rank0: tensor([[50]])
INFO:rank1: tensor([[50]])
INFO:rank0: Epoch 0, lr = 0.010000
Traceback (most recent call last):
File "run.py", line 390, in <module>
main(opts)
File "run.py", line 277, in main
train_loader=train_loader, scheduler=scheduler, logger=logger)
File "/home/jovyan/MiB/train.py", line 128, in train
scaled_loss.backward()
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.18 GiB (GPU 1; 10.73 GiB total capacity; 9.31 GiB already allocated; 371.56 MiB free; 243.46 MiB cached)
Traceback (most recent call last):
File "run.py", line 390, in <module>
main(opts)
File "run.py", line 277, in main
train_loader=train_loader, scheduler=scheduler, logger=logger)
File "/home/jovyan/MiB/train.py", line 128, in train
scaled_loss.backward()
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.18 GiB (GPU 0; 10.73 GiB total capacity; 9.31 GiB already allocated; 373.56 MiB free; 241.46 MiB cached)
Traceback (most recent call last):
File "/opt/conda/envs/MiB/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/envs/MiB/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/distributed/launch.py", line 246, in <module>
main()
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/distributed/launch.py", line 242, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/MiB/bin/python', '-u', 'run.py', '--local_rank=1', '--data_root', 'data', '--batch_size', '12', '--dataset', 'ade', '--name', 'test_MIB_ade_100_50_lr_0.01_no_pretrained', '--task', '100-50', '--lr', '0.01', '--epochs', '30', '--method', 'MiB', '--no_pretrained']' returned non-zero exit status 1.
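One possible way to keep the paper’s total batch size of 24 on 11 GB cards is gradient accumulation: use a smaller per-GPU micro-batch and call optimizer.step() only every few iterations. The sketch below is not the repo’s train.py; model, criterion, optimizer, and train_loader are placeholders, it assumes the apex.amp setup shown in the log above (opt_level O0), and it ignores DistributedDataParallel’s per-backward gradient all-reduce (no_sync()) for brevity:

from apex import amp

accum_steps = 2  # e.g. micro-batch of 6 per GPU -> effective 12 per GPU, 24 in total
optimizer.zero_grad()
for step, (images, labels) in enumerate(train_loader):
    images, labels = images.cuda(), labels.cuda()
    # divide by accum_steps so the accumulated gradient matches the full batch
    loss = criterion(model(images), labels) / accum_steps
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()  # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Whether this reproduces the reported numbers exactly is untested, since batch-norm statistics are still computed per micro-batch.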
Maintainer Comments:
Hi @wuyujack! i., ii.) The batch size is the same for all experiments and it is 24 in total. Since I used 2 Titan RTX GPUs, I used a batch size of 12 on each (same setup for train/test).
iii.) Training time is harder to estimate since it depends on the setting. For 100-50 on ADE20K, step 0 took nearly 12 hours, while step 1 took nearly 7 hours. (Looking at your command, I see you used 30 epochs for ADE, but as written in the paper, we used 60 epochs for that dataset.) For Pascal-VOC in the 15-5 setting, the training time was around 6 hours for step 0 and 40 minutes for step 1; here we used 30 epochs.
iv.) I have no results to show regarding a model without the pretrained, sorry.
v.) VOC step 0 was: 19-1: 78.7 ± 0.8 mIoU, 15-5: 80.4 ± 0.8 mIoU.
ADE step 0 was (on Order A): 100-50: 42.6 ± 0.5 mIoU, 50-50-50: 48.5 ± 0.5 mIoU.
Hope it helps.
Hi @wuyujack, these are all the numbers I got. Hope it’s helpful 😃
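For reference, putting the replies above together (total batch size 24 as 12 per GPU on 2 GPUs, 60 epochs for ADE20K, and the pretrained backbone from the Inplace-ABN repo, i.e. without --no_pretrained), a command closer to the paper’s ADE 100-50 setting would presumably look like the following. It is untested and the run name is arbitrary:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name MIB_ade_100_50_lr_0.01 --task 100-50 --lr 0.01 --epochs 60 --method MiB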