Reproduce 15-1 setup on Pascal VOC

Describe the bug

I ran the provided Pascal VOC script with Apex optimization level O1. Everything matched the script except that I used a single GPU and therefore changed the batch size to 24. I got the following results:

	1-15	16-20	all
Paper	65.12	21.11	54.64
Code results	58.73	21.6	49.7

To Reproduce

start=`date +%s`

START_DATE=$(date '+%Y-%m-%d')

PORT=$((9000 + RANDOM % 1000))
GPU=0
NB_GPU=1
DATA_ROOT=./data
DATASET=voc
TASK=15-5s
NAME=PLOP
METHOD=PLOP
BATCH_SIZE=24
INITIAL_EPOCHS=30
EPOCHS=30
OPTIONS="--checkpoint checkpoints/step/"

RESULTSFILE=results/${START_DATE}${DATASET}${TASK}_${NAME}.csv
rm -f ${RESULTSFILE}

CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} --task ${TASK} --step 0 --lr 0.01 --epochs ${INITIAL_EPOCHS} --method ${METHOD} --opt_level O1 ${OPTIONS}
for step in 1 2 3 4 5
do
CUDA_VISIBLE_DEVICES=${GPU} python3 -m torch.distributed.launch --master_port ${PORT} --nproc_per_node=${NB_GPU} run.py --date ${START_DATE} --data_root ${DATA_ROOT} --overlap --batch_size ${BATCH_SIZE} --dataset ${DATASET} --name ${NAME} --task ${TASK} --step ${step} --lr 0.001 --epochs ${EPOCHS} --method ${METHOD} --opt_level O1 ${OPTIONS}
done
python3 average_csv.py ${RESULTSFILE}

Top GitHub Comments

arthurdouillard commented, Mar 1, 2022

Don’t hesitate to reopen this issue if you have new findings. Best,

arthurdouillard commented, Nov 24, 2021

I think the problem comes from one of the following:

how gradients are accumulated with multiple GPUs versus a single GPU; you may need to tune the learning rate for a single GPU
whether the asyncBN behaves differently depending on the number of GPUs
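
As a starting point for the learning-rate tuning suggested above, here is a minimal sketch of the common linear scaling heuristic: scale the learning rate in proportion to the total batch size across all GPUs. The helper name scaled_lr and the reference batch size used in the example are hypothetical; the issue does not state the GPU count or per-GPU batch size behind the paper numbers, and whether --batch_size is per process or global in run.py is an assumption to verify.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PLOP repository).
# Linear scaling heuristic: new_lr = ref_lr * new_total_batch / ref_total_batch
# where "total batch" means per-GPU batch size times number of GPUs.
scaled_lr() {
  local ref_lr=$1 ref_batch=$2 new_batch=$3
  awk -v lr="$ref_lr" -v ref="$ref_batch" -v cur="$new_batch" \
    'BEGIN { printf "%g\n", lr * cur / ref }'
}

# Example values only: the reference total batch of 24 is an assumption,
# 12 is a hypothetical smaller single-GPU batch.
NEW_LR=$(scaled_lr 0.01 24 12)
echo "suggested --lr for step 0: ${NEW_LR}"
```

If the total batch size already matches the reference setup, the heuristic leaves the learning rate unchanged, which would point toward the normalization layers rather than the learning rate as the source of the gap.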
