Reproduce 15-1 setup on Pascal VOC

Describe the bug

I ran the provided Pascal VOC script with Apex optimization level O1. Everything matches the script except that I used a single GPU and therefore changed the batch size to 24. I got the following results:

               1-15    16-20   all
Paper          65.12   21.11   54.64
Code results   58.73   21.6    49.7

To Reproduce

start=`date +%s`
START_DATE=$(date '+%Y-%m-%d')

PORT=$((9000 + RANDOM % 1000))
GPU=0
NB_GPU=1
DATA_ROOT=./data
DATASET=voc
TASK=15-5s
NAME=PLOP
METHOD=PLOP
BATCH_SIZE=24
INITIAL_EPOCHS=30
EPOCHS=30
OPTIONS="--checkpoint checkpoints/step/"

RESULTSFILE=results/${START_DATE}${DATASET}${TASK}_${NAME}.csv
rm -f ${RESULTSFILE}

CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} --task ${TASK} --step 0 --lr 0.01 --epochs ${INITIAL_EPOCHS} --method ${METHOD} --opt_level O1 ${OPTIONS}
for step in 1 2 3 4 5
do
    CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} --task ${TASK} --step ${step} --lr 0.001 --epochs ${EPOCHS} --method ${METHOD} --opt_level O1 ${OPTIONS}
done
python3 average_csv.py ${RESULTSFILE}

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
arthurdouillard commented, Mar 1, 2022

Don’t hesitate to reopen this issue if you have new findings. Best,

1 reaction
arthurdouillard commented, Nov 24, 2021

I think the problem comes from one of the following:

  • how gradients are accumulated with multiple GPUs versus a single GPU; you may need to tune the learning rate for a single GPU
  • whether the asyncBN works differently depending on the number of GPUs
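
One way to reason about the learning-rate suggestion is the linear scaling heuristic, a general rule of thumb that is not taken from this repository: when the effective batch size changes, scale the learning rate by the same ratio. The sketch below assumes that --batch_size is a per-process value, so the effective batch size is the per-process batch multiplied by the number of processes. The helper names and the two-GPU reference setup are hypothetical, since the issue does not state the GPU count or batch size used for the paper results.

```python
def effective_batch_size(per_process_batch: int, num_processes: int) -> int:
    """Total samples per optimizer step, assuming one batch per process."""
    if per_process_batch <= 0 or num_processes <= 0:
        raise ValueError("batch size and process count must be positive")
    return per_process_batch * num_processes


def scaled_lr(reference_lr: float, reference_batch: int, new_batch: int) -> float:
    """Linear scaling heuristic: keep lr / batch_size constant."""
    if reference_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return reference_lr * new_batch / reference_batch


# Hypothetical reference run (NOT from the issue): 2 GPUs, 24 samples each.
reference_batch = effective_batch_size(per_process_batch=24, num_processes=2)

# Single-GPU run from the issue: NB_GPU=1, BATCH_SIZE=24.
single_gpu_batch = effective_batch_size(per_process_batch=24, num_processes=1)

# Starting points for the step-0 (--lr 0.01) and later-step (--lr 0.001) rates.
step0_lr = scaled_lr(0.01, reference_batch, single_gpu_batch)
later_lr = scaled_lr(0.001, reference_batch, single_gpu_batch)

print(f"effective batch: {reference_batch} -> {single_gpu_batch}")
print(f"suggested step-0 lr: {step0_lr:g}, later steps: {later_lr:g}")
```

If the reference run turns out to have the same effective batch size as the single-GPU run, the ratio is 1 and the learning rate stays unchanged; in that case the batch-norm hypothesis is the more likely one to investigate. Either way, the scaled value is only a starting point for tuning.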