
Reproducing the paper results

See original GitHub issue

Dear authors,

Thank you for your exciting work and very clean code. I am having trouble reproducing the results mentioned in the paper and would appreciate it if you could help me.

1. Reproducing the UNO results from Table 4. I was trying to obtain the scores reported for the novel-class samples of the test split.

I executed the commands for CIFAR10, CIFAR80-20, and CIFAR50-50, using W&B for logging. However, the results on all three datasets did not match the ones in the paper. I took the values from incremental/unlabel/test/acc (see the snippet after the table below).

Dataset      Paper (avg/best)   Reproduced (avg/best)
CIFAR10      93.3 / 93.3        90.8 / 90.8
CIFAR80-20   72.7 / 73.1        65.3 / 65.3
CIFAR50-50   50.6 / 50.7        44.9 / 45.7
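
For reference, this is how I read the metric through the W&B API (the entity/project/run path below is a placeholder, not my actual run):

```python
import wandb

# Placeholder run path; substitute your own entity/project/run id.
api = wandb.Api()
run = api.run("my-entity/uno-repro/cifar80-20-run")

# Final value of the metric I compared against Table 4
print(run.summary["incremental/unlabel/test/acc"])
```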

[screenshot: W&B metric curves for the runs above]

Potential issues:

  • I am not using the exact package versions mentioned in your README. For that reason I ran the CIFAR80-20 experiment twice, manually setting the seed (as in the RankStats repo; see the sketch after this list), and obtained very similar results both times. I also would not expect a ~7% gap on CIFAR80-20 to come just from package versions.
  • I may be using the wrong metric from W&B (I used incremental/unlabel/test/acc). However, as the screenshot shows, for CIFAR80-20 all the other metrics are significantly different as well (no value close to 72.7/73.1 appears anywhere).
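
For completeness, the seeding I used is essentially the following (a minimal sketch assuming the standard PyTorch/NumPy calls; the RankStats repo may differ in the details):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade speed for reproducibility
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```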

2. How exactly was the RankStats algorithm evaluated on CIFAR50-50?

Could you please share whether you performed any hyperparameter tuning for CIFAR50-50 when running the RankStats algorithm on it? I ran multiple experiments and training was very unstable; the algorithm always ends up scoring roughly 20/17 on known/novel classes.

Thanks a lot for your time.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
DonkeyShot21 commented, Sep 5, 2021

Ok, now I understand. You are right, this is a potential problem. However, the assignments are quite stable (they are computed on the whole validation set) and, as you said, the issue never actually happens in practice. I remember I once tried removing the unwanted assignments (the ones that contradicted the labeled head), but the results were exactly the same and the code became more complicated, so I just removed that logic. Also, if I remember correctly, Ranking Statistics uses the same evaluation procedure, so I stuck to that.
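
To illustrate what "computed on the whole validation set" means here: the cluster-to-class assignment is a single Hungarian matching over the accumulated confusion matrix. A minimal sketch using scipy (illustrative names, not the repo's actual code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_assignment(preds: np.ndarray, targets: np.ndarray, num_classes: int):
    # Confusion matrix over the whole validation set:
    # rows = predicted clusters, cols = ground-truth classes.
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, t in zip(preds, targets):
        conf[p, t] += 1
    # Hungarian algorithm: maximize the number of correctly matched samples.
    rows, cols = linear_sum_assignment(conf, maximize=True)
    mapping = dict(zip(rows, cols))
    acc = conf[rows, cols].sum() / len(preds)
    return mapping, acc
```

Because the matching is solved once over all validation samples rather than per batch, the resulting mapping tends to be stable from epoch to epoch.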

1 reaction
DonkeyShot21 commented, Sep 3, 2021

Happy to help! I added a note in the README that warns about package versions.

Regarding the evaluation, I think the procedure I am following is correct: I first concatenate the logits (preds_inc) and then take the max over the concatenated logits. By doing this I lose the information about the task. Then, in the compute() method of the metric class, I compute the best mapping over all classes (not separately per task).
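
Concretely, that step looks like this (a sketch; apart from preds_inc, the names and shapes are illustrative):

```python
import torch

def concat_and_predict(preds_lab: torch.Tensor, preds_unlab: torch.Tensor) -> torch.Tensor:
    """preds_lab: [N, num_labeled] logits, preds_unlab: [N, num_unlabeled] logits."""
    preds_inc = torch.cat([preds_lab, preds_unlab], dim=-1)  # logits over all classes
    return preds_inc.argmax(dim=-1)  # task identity is discarded here

# Toy usage: 80 labeled + 20 novel classes, as in CIFAR80-20
hard_preds = concat_and_predict(torch.randn(8, 80), torch.randn(8, 20))
```

In compute(), these predictions are then matched against the targets with a single Hungarian assignment over all classes, not one assignment per task.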

Read more comments on GitHub >

