Reproducibility Issue
I have run your code 5 times in the environment below.
- Two V100 GPUs
- Python 3.6.7
- PyTorch 1.0.0
- CUDA 9.0
The command I used is:
python train.py \
--net_type pyramidnet \
--dataset cifar100 \
--depth 200 \
--alpha 240 \
--batch_size 64 \
--lr 0.25 \
--expname PyraNet200 \
--epochs 300 \
--beta 1.0 \
--cutmix_prob 0.5 \
--no-verbose
For the baseline, I set cutmix_prob=0.0 so that CutMix is not used.
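For reference, here is a minimal sketch of how I understand cutmix_prob to gate the augmentation on each batch (my paraphrase of the train.py logic; the rand_bbox helper and the cutmix_step wrapper are my own names, not necessarily the exact repo code):

```python
import numpy as np
import torch


def rand_bbox(size, lam):
    # Sample a random box covering roughly (1 - lam) of the image area.
    W, H = size[3], size[2]
    cut_rat = np.sqrt(1. - lam)
    cut_w, cut_h = int(W * cut_rat), int(H * cut_rat)
    cx, cy = np.random.randint(W), np.random.randint(H)
    x1, x2 = np.clip(cx - cut_w // 2, 0, W), np.clip(cx + cut_w // 2, 0, W)
    y1, y2 = np.clip(cy - cut_h // 2, 0, H), np.clip(cy + cut_h // 2, 0, H)
    return x1, y1, x2, y2


def cutmix_step(model, criterion, inputs, targets, beta=1.0, cutmix_prob=0.5):
    # With probability cutmix_prob, paste a patch from a shuffled batch and mix
    # the labels; with cutmix_prob=0.0 this reduces to plain training (baseline).
    if beta > 0 and np.random.rand() < cutmix_prob:
        lam = np.random.beta(beta, beta)
        perm = torch.randperm(inputs.size(0), device=inputs.device)
        x1, y1, x2, y2 = rand_bbox(inputs.size(), lam)
        inputs[:, :, y1:y2, x1:x2] = inputs[perm, :, y1:y2, x1:x2]
        # Adjust lambda to the exact area ratio of the pasted patch.
        lam = 1 - ((x2 - x1) * (y2 - y1) / (inputs.size(-1) * inputs.size(-2)))
        outputs = model(inputs)
        loss = criterion(outputs, targets) * lam \
             + criterion(outputs, targets[perm]) * (1 - lam)
    else:
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    return outputs, loss
```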
| Model & Augmentations | try1 | try2 | try3 | try4 | try5 | Average – | – | – | – | – | – | – | – cutmix p=0.0 | Pyramid200(Converged) | 17.14 | 16.32 | 16.15 | 16.29 | 16.61 | 16.502 | Pyramid200(Best) | 17.01 | 16.02 | 16.01 | 16.17 | 16.35 | 16.312 cutmix p=0.5 | CutMix(Converged) | 16.27 | 15.55 | 16.18 | 16.19 | 15.38 | 15.914 | CutMix(Best) | 15.29 | 14.66 | 15.28 | 15.04 | 14.52 | 14.958
The baseline reaches a top-1 result similar to the one reported in your paper (16.45), but with CutMix (p=0.5) the result is somewhat worse than the reported value (14.23).
Also, I conducted experiments with ShakeDrop (after bringing the ShakeDrop regularization code from https://github.com/owruby/shake-drop_pytorch).
| | | try1 | try2 | try3 | try4 | try5 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cutmix p=0.5 | ShakeDrop+CutMix (Converged) | 14.06 | 14 | 14.16 | 13.86 | 14 | 14.016 |
| | ShakeDrop+CutMix (Best) | 13.67 | 13.81 | 13.8 | 13.69 | 13.62 | 13.718 |
As you can see, the top-1 result claimed in the paper can be reached only by taking the maximum top-1 validation accuracy during training, not the converged top-1 validation accuracy after training.
So, here are my questions.
- How can I reproduce your result? In particular, with your provided code and sample commands, I should be able to reproduce the reported 14.23% top-1 result with PyramidNet+CutMix. It would be great if you could share the specific environment and command to reproduce the result, or if this report helps you find a problem in this repo.
- Did you report the 'last validation accuracy' after training, or the 'best (peak) validation accuracy' during training? I saw some code tracking the best validation accuracy during training and printing it before terminating, so I assume you used the best (peak) validation accuracy. (See the sketch below for what I mean by the two numbers.)
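To be precise about what I am comparing, this is a minimal sketch of 'best (peak)' vs 'last (converged)' accuracy; train_one_epoch and validate are hypothetical placeholders, not code from this repo:

```python
import torch


def train_and_report(model, train_loader, val_loader, optimizer, epochs,
                     train_one_epoch, validate):
    # train_one_epoch / validate are hypothetical callables, not repo code.
    best_acc, last_acc = 0.0, 0.0
    for epoch in range(epochs):
        train_one_epoch(model, train_loader, optimizer)
        last_acc = validate(model, val_loader)   # accuracy of the current model
        if last_acc > best_acc:
            best_acc = last_acc                  # peak accuracy over the whole run
            torch.save(model.state_dict(), 'best.pth')
    # "best" is the peak validation accuracy; "last" is the converged accuracy.
    return best_acc, last_acc
```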
Thanks. I look forward to hearing from you.
@ildoonet Thank you for your reply. I understand your concerns, but I do not agree that reporting the best performance is cheating. As I said, the best model can reasonably be treated as representing the performance of the method. The difference between the best and the last model comes from the step-decay learning rate schedule. In our case, using a cosine learning rate on CIFAR-100, the best and last models are almost identical (within ±0.1% accuracy), as sketched below. All the experiments we re-implemented were conducted in the same setting, and the best model was selected for every other method as well, so there are no cheating or fair-comparison issues. Our best model's performance is not an instantaneous peak value, because we ran each experiment several times and report the mean of the best performances.
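For instance, a cosine schedule can be set up in PyTorch like this (a minimal sketch; the placeholder model and the SGD hyperparameters here are assumptions, not necessarily the exact settings we used):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(32 * 32 * 3, 100)  # placeholder for PyramidNet-200

optimizer = torch.optim.SGD(model.parameters(), lr=0.25,
                            momentum=0.9, weight_decay=1e-4, nesterov=True)

# Cosine annealing drives the LR smoothly towards zero, so the weights barely
# move during the final epochs and the best and last checkpoints nearly coincide.
# (A step-decay schedule keeps the LR relatively high until the last drop, which
# is where a best-vs-last gap tends to appear.)
scheduler = CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(300):
    # ... train and validate one epoch ...
    scheduler.step()
```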
‘Cheating’ is a rather harsh word. However, comparing peak values does tend to favor methods whose validation accuracy oscillates or is otherwise unstable.