
Reproducibility Issue


I ran your code 5 times in the following environment.

Two V100 GPUs
Python 3.6.7
PyTorch 1.0.0
CUDA 9.0

The command I used is:

python train.py \
--net_type pyramidnet \
--dataset cifar100 \
--depth 200 \
--alpha 240 \
--batch_size 64 \
--lr 0.25 \
--expname PyraNet200 \
--epochs 300 \
--beta 1.0 \
--cutmix_prob 0.5 \
--no-verbose

For the baseline, I set cutmix_prob=0.0 so that CutMix is not used.
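
For context, and to make sure I am reading the flags correctly, this is a condensed sketch of how I understand `cutmix_prob` and `beta` to gate the augmentation in `train.py`; the function names and exact layout here are my own paraphrase, not the repo's code verbatim:

```python
import numpy as np
import torch

def rand_bbox(size, lam):
    # Sample a box covering roughly (1 - lam) of the image area,
    # following the CutMix paper; `size` is (N, C, H, W).
    H, W = size[2], size[3]
    cut_rat = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(H * cut_rat), int(W * cut_rat)
    cy, cx = np.random.randint(H), np.random.randint(W)
    bby1, bby2 = np.clip(cy - cut_h // 2, 0, H), np.clip(cy + cut_h // 2, 0, H)
    bbx1, bbx2 = np.clip(cx - cut_w // 2, 0, W), np.clip(cx + cut_w // 2, 0, W)
    return bbx1, bby1, bbx2, bby2

def cutmix_loss(model, criterion, images, targets, beta=1.0, cutmix_prob=0.5):
    # With probability `cutmix_prob`, paste a patch from a shuffled copy of
    # the batch and mix the two labels by the pasted area; otherwise train
    # normally, so cutmix_prob=0.0 reduces to the plain baseline.
    if beta > 0 and np.random.rand() < cutmix_prob:
        lam = np.random.beta(beta, beta)
        perm = torch.randperm(images.size(0), device=images.device)
        bbx1, bby1, bbx2, bby2 = rand_bbox(images.size(), lam)
        images[:, :, bby1:bby2, bbx1:bbx2] = images[perm, :, bby1:bby2, bbx1:bbx2]
        # Recompute lam from the exact pasted area after clipping.
        lam = 1 - (bbx2 - bbx1) * (bby2 - bby1) / (images.size(-1) * images.size(-2))
        outputs = model(images)
        return lam * criterion(outputs, targets) + (1 - lam) * criterion(outputs, targets[perm])
    return criterion(model(images), targets)
```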

| | Model & Augmentations | try1 | try2 | try3 | try4 | try5 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cutmix p=0.0 | Pyramid200 (Converged) | 17.14 | 16.32 | 16.15 | 16.29 | 16.61 | 16.502 |
| | Pyramid200 (Best) | 17.01 | 16.02 | 16.01 | 16.17 | 16.35 | 16.312 |
| cutmix p=0.5 | CutMix (Converged) | 16.27 | 15.55 | 16.18 | 16.19 | 15.38 | 15.914 |
| | CutMix (Best) | 15.29 | 14.66 | 15.28 | 15.04 | 14.52 | 14.958 |

The baseline's top-1 number is similar to the one reported in your paper (16.45), but with CutMix (p=0.5) my result is noticeably worse than the reported value (14.23).

I also ran experiments with ShakeDrop (the ShakeDrop regularization code was taken from https://github.com/owruby/shake-drop_pytorch); a minimal sketch of that logic follows, and the results are in the table below.
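
For reference, the borrowed ShakeDrop logic boils down to roughly the following training-time-only sketch; the linked repo additionally scales by the expectation at eval time and uses a per-layer, depth-dependent drop rate:

```python
import torch

class ShakeDropFn(torch.autograd.Function):
    # Simplified sketch of ShakeDrop (Yamada et al.): the residual branch
    # is scaled by (b + alpha - b*alpha) on the forward pass and by
    # (b + beta - b*beta) on the backward pass, where b is a Bernoulli
    # gate and alpha, beta are independent uniform noises.
    @staticmethod
    def forward(ctx, x, p_drop=0.5):
        b = torch.bernoulli(torch.full((1,), 1.0 - p_drop, device=x.device))
        alpha = torch.empty(1, device=x.device).uniform_(-1.0, 1.0)
        ctx.save_for_backward(b)
        return x * (b + alpha - b * alpha)

    @staticmethod
    def backward(ctx, grad_out):
        (b,) = ctx.saved_tensors
        beta = torch.rand(1, device=grad_out.device)  # beta ~ U(0, 1)
        return grad_out * (b + beta - b * beta), None

# Applied inside a residual block as:
#   out = shortcut + ShakeDropFn.apply(branch, p_drop)
```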

| | Model & Augmentations | try1 | try2 | try3 | try4 | try5 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cutmix p=0.5 | ShakeDrop+CutMix (Converged) | 14.06 | 14.00 | 14.16 | 13.86 | 14.00 | 14.016 |
| | ShakeDrop+CutMix (Best) | 13.67 | 13.81 | 13.80 | 13.69 | 13.62 | 13.718 |

As the tables show, the top-1 number claimed in the paper can be reached only by taking the maximum top-1 validation accuracy during training, not the converged top-1 validation accuracy after training.

So, here are my questions.

  1. How can I reproduce your result? In particular, with the provided code and sample commands, I should be able to reproduce the 14.23% top-1 figure for PyramidNet+CutMix. It would be great if you could provide the specific environment and command that reproduce it, or perhaps this report helps you find a problem in this repo.

  2. Did you use the ‘last validation accuracy’ after training, or the ‘best (peak) validation accuracy’ seen during training? I saw code that tracks the best validation accuracy during training and prints it before terminating, so I assume you used the best (peak) value; the sketch after this list shows exactly the distinction I mean.
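
To make question 2 concrete, this runnable toy loop shows the two numbers I am distinguishing; `train_one_epoch` and `validate` are stand-ins, not functions from this repo:

```python
import random

def train_one_epoch():
    pass  # stand-in for the real training step

def validate():
    # Stand-in: a top-1 accuracy that keeps oscillating late in training.
    return 84.0 + random.uniform(-0.5, 0.5)

best_acc, last_acc = 0.0, 0.0
for epoch in range(300):
    train_one_epoch()
    acc = validate()
    best_acc = max(best_acc, acc)  # 'best (peak)' accuracy, tracked during training
    last_acc = acc                 # 'converged' accuracy: whatever the final epoch gives

print(f"converged (last-epoch) top-1: {last_acc:.2f}")
print(f"best (peak) top-1:            {best_acc:.2f}")
```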

Thanks. I look forward to hearing from you.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 9

Top GitHub Comments

3 reactions
hellbell commented, Aug 13, 2019

@ildoonet Thank you for your reply. I understand your concerns, but I don’t agree that reporting the best performance is cheating. As I said, the best model can reasonably be taken to represent the performance of the method. The gap between the best and the last model comes from the step-decay learning rate; in our case, using a cosine learning rate on CIFAR-100, the best and last models are almost identical (within ±0.1% accuracy). All the methods we re-implemented were run in the same experimental setting, and the best model was selected for every method, so there is no cheating or unfair comparison. Our best-model number is not an instantaneous lucky peak, because we ran the experiments several times and report the mean of the best performances.
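
To see the scheduler point concretely, here is a small trace of the two kinds of schedule (a sketch; the step milestones are illustrative, not necessarily the exact ones in the CutMix scripts):

```python
import torch

def lr_curve(make_sched, epochs=300):
    # Trace the learning-rate schedule an SGD optimizer would follow.
    opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.25)
    sched = make_sched(opt)
    lrs = []
    for _ in range(epochs):
        lrs.append(opt.param_groups[0]["lr"])
        opt.step()
        sched.step()
    return lrs

# Step decay holds the final lr at 0.25 * 0.01 = 0.0025 for the last epochs,
# so validation accuracy keeps fluctuating and 'best' can beat 'last'; cosine
# annealing decays smoothly toward 0, so the last checkpoint is essentially
# the settled (near-best) model.
step = lr_curve(lambda o: torch.optim.lr_scheduler.MultiStepLR(o, milestones=[150, 225], gamma=0.1))
cosine = lr_curve(lambda o: torch.optim.lr_scheduler.CosineAnnealingLR(o, T_max=300))
print(f"final lr, step decay: {step[-1]:.4f}; cosine: {cosine[-1]:.6f}")
```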

1 reaction
JiyueWang commented, Nov 10, 2020

‘Cheating’ is a rather harsh word. However, comparing the peak value indeed benefits oscillating and risky methods.

