Batch size, GPU number, performance without pretrained model for ADE20k training?
First, I want to thank you for your work! When I try to reproduce the ADE20k results, I run into an out-of-memory error on two RTX 2080Ti GPUs. Could you provide the batch size and the number of GPUs you used for ADE20k training?
I could not find the batch size and GPU count used for the experiments in the arXiv version of the paper, which I think are critical for reproducing and validating the paper’s conclusions.
-
Questions Highlight:
- What’s your batch size setting for the ADE20k and VOC training?
- What GPU did you use for training and testing?
- Could you provide the training time for your experiments?
- Could you provide the final performance without using the pretrained model from the Inplace-ABN repo?
- Could you provide the step-0 performance of MiB on VOC and ADE training with the pretrained model?
-
Exp Command:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained
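Note: --batch_size appears to be per process, so with --nproc_per_node=2 the effective batch is 2 × 12 = 24 (matching the "Total batch size is 24" line in the log below). A possible but untested workaround on 11 GB cards is to lower the per-GPU value, e.g. --batch_size 6 for a total of 12, at the cost of deviating from this setting:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 6 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained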
-
Error Log:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name test_MIB_ade_100_50_lr_0.01_no_pretrained --task 100-50 --lr 0.01 --epochs 30 --method MiB --no_pretrained
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
INFO:rank1: Device: cuda:1
Filtering images...
INFO:rank0: [!] starting logging at directory ./logs/100-50-ade/test_MIB_ade_100_50_lr_0.01_no_pretrained/
INFO:rank0: Device: cuda:0
0/2000 ...
Filtering images...
0/2000 ...
1000/2000 ...
1000/2000 ...
Filtering images...
0/2000 ...
Filtering images...
0/2000 ...
1000/2000 ...
1000/2000 ...
INFO:rank0: Dataset: ade, Train set: 13452, Val set: 2000, Test set: 2000, n_classes 101
INFO:rank0: Total batch size is 24
INFO:rank0: Backbone: resnet101
INFO:rank0: [!] Model made without pre-trained
Selected optimization level O0: Pure FP32 training.
Defaults for this optimization level are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O0
cast_model_type : torch.float32
patch_torch_functions : False
keep_batchnorm_fp32 : None
master_weights : False
loss_scale : 1.0
INFO:rank0: [!] Train from scratch
INFO:rank0: tensor([[50]])
INFO:rank1: tensor([[50]])
INFO:rank0: Epoch 0, lr = 0.010000
Traceback (most recent call last):
File "run.py", line 390, in <module>
main(opts)
File "run.py", line 277, in main
train_loader=train_loader, scheduler=scheduler, logger=logger)
File "/home/jovyan/MiB/train.py", line 128, in train
scaled_loss.backward()
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.18 GiB (GPU 1; 10.73 GiB total capacity; 9.31 GiB already allocated; 371.56 MiB free; 243.46 MiB cached)
Traceback (most recent call last):
File "run.py", line 390, in <module>
main(opts)
File "run.py", line 277, in main
train_loader=train_loader, scheduler=scheduler, logger=logger)
File "/home/jovyan/MiB/train.py", line 128, in train
scaled_loss.backward()
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/tensor.py", line 118, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 1.18 GiB (GPU 0; 10.73 GiB total capacity; 9.31 GiB already allocated; 373.56 MiB free; 241.46 MiB cached)
Traceback (most recent call last):
File "/opt/conda/envs/MiB/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/conda/envs/MiB/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/distributed/launch.py", line 246, in <module>
main()
File "/opt/conda/envs/MiB/lib/python3.6/site-packages/torch/distributed/launch.py", line 242, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/envs/MiB/bin/python', '-u', 'run.py', '--local_rank=1', '--data_root', 'data', '--batch_size', '12', '--dataset', 'ade', '--name', 'test_MIB_ade_100_50_lr_0.01_no_pretrained', '--task', '100-50', '--lr', '0.01', '--epochs', '30', '--method', 'MiB', '--no_pretrained']' returned non-zero exit status 1.
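One possible way to keep the paper’s total batch size of 24 on 11 GB cards is gradient accumulation: use a smaller per-GPU micro-batch and call optimizer.step() only every few iterations. The sketch below is not the repo’s train.py; model, criterion, optimizer, and train_loader are placeholders, it assumes the apex.amp setup shown in the log above (opt_level O0), and it ignores DistributedDataParallel’s per-backward gradient all-reduce (no_sync()) for brevity:

from apex import amp

accum_steps = 2  # e.g. micro-batch of 6 per GPU -> effective 12 per GPU, 24 in total
optimizer.zero_grad()
for step, (images, labels) in enumerate(train_loader):
    images, labels = images.cuda(), labels.cuda()
    # divide by accum_steps so the accumulated gradient matches the full batch
    loss = criterion(model(images), labels) / accum_steps
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()  # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Whether this reproduces the reported numbers exactly is untested, since batch-norm statistics are still computed per micro-batch.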
Maintainer Comments:
Hi @wuyujack! i., ii.) The batch size is the same for all experiments and it is 24 in total. Since I used 2 Titan RTX GPUs, I used a batch size of 12 on each (same setup for train/test).
iii.) Training time is harder to estimate since it depends on the setting. For 100-50 on ADE20K, step 0 took nearly 12 hours, while step 1 took nearly 7 hours. (Looking at your command, I see you used 30 epochs for ADE, but as written in the paper, we used 60 epochs for that dataset.) For Pascal-VOC in the 15-5 setting, the training time was around 6 hours for step 0 and 40 minutes for step 1; here we used 30 epochs.
iv.) I have no results to show regarding a model without the pretrained, sorry.
v.) VOC step 0 was: 19-1: 78.7 ± 0.8 mIoU, 15-5: 80.4 ± 0.8 mIoU.
ADE step 0 was (on Order A): 100-50: 42.6 ± 0.5 mIoU, 50-50-50: 48.5 ± 0.5 mIoU.
Hope it helps.
Hi @wuyujack, these are all the numbers I got. Hope it’s helpful 😃
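For reference, putting the replies above together (total batch size 24 as 12 per GPU on 2 GPUs, 60 epochs for ADE20K, and the pretrained backbone from the Inplace-ABN repo, i.e. without --no_pretrained), a command closer to the paper’s ADE 100-50 setting would presumably look like the following. It is untested and the run name is arbitrary:
CUDA_VISIBLE_DEVICES=6,7 python -m torch.distributed.launch --nproc_per_node=2 run.py --data_root data --batch_size 12 --dataset ade --name MIB_ade_100_50_lr_0.01 --task 100-50 --lr 0.01 --epochs 60 --method MiB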