Reproducibility issue

I’m having trouble reproducing the results on the CNN/DM dataset.

I downloaded the data and the fine-tuned model provided in the README, and I followed the commands to predict the test set.

Everything runs fine, but at the end I get the following results:

1 ROUGE-1 Average_R: 0.62689 (95%-conf.int. 0.62269 - 0.63111)
1 ROUGE-1 Average_P: 0.13695 (95%-conf.int. 0.13561 - 0.13828)
1 ROUGE-1 Average_F: 0.22101 (95%-conf.int. 0.21918 - 0.22288)

1 ROUGE-2 Average_R: 0.33142 (95%-conf.int. 0.32673 - 0.33603)
1 ROUGE-2 Average_P: 0.06949 (95%-conf.int. 0.06832 - 0.07078)
1 ROUGE-2 Average_F: 0.11266 (95%-conf.int. 0.11089 - 0.11456)

1 ROUGE-L Average_R: 0.52624 (95%-conf.int. 0.52179 - 0.53061)
1 ROUGE-L Average_P: 0.11465 (95%-conf.int. 0.11345 - 0.11598)
1 ROUGE-L Average_F: 0.18509 (95%-conf.int. 0.18333 - 0.18698)

/root/code/unilm/src/cnndm_model/cnndm_model.bin.test.alp1.0 ROUGE-F(1/2/l): 22.10/11.27/18.51 ROUGE-R(1/2/3/l): 62.69/33.14/52.62


It’s weird because I checked the prediction file (cnndm_model.bin.test.alp1.0.post) and compared it with the one provided in the README, and most of the time there are only a few differences.

Here is a comparison of the last few lines of the file (left is the ‘official’ one, right is mine)

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
donglixp commented, Oct 15, 2019

Thanks for spotting the incorrect script path.

-li

0 reactions
astariul commented, Oct 15, 2019

Commit 8c1f46d4e4ab7993665ac2a76406855c471a15df fixed my problem: I was using the wrong script.

Using cnndm/eval.py instead of gigaword/eval.py fixed it.

I could reproduce your results using the official ROUGE script.

Thanks a lot for the help! 👍
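
A note on reading the numbers in the original post: recall is far higher than precision (ROUGE-1 R of 0.627 against P of 0.137). That is the pattern you get when the text being scored is much longer than the reference. The thread does not say why the wrong eval script produced this, so the following is only a toy sketch of the arithmetic, not the official ROUGE script or either eval.py. The function rouge1 and the example sentences are made up for illustration; tokenization is plain whitespace splitting with no stemming.

```python
from collections import Counter


def rouge1(candidate: str, reference: str):
    """Toy unigram-overlap ROUGE-1: returns (precision, recall, f1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "the cat sat on the mat"
short_candidate = "the cat sat on a mat"
# Same summary padded with unrelated words (hypothetical example).
long_candidate = short_candidate + " " + "and then many more unrelated words followed here " * 3

p_short, r_short, f_short = rouge1(short_candidate, reference)
p_long, r_long, f_long = rouge1(long_candidate, reference)

print(f"short: P={p_short:.3f} R={r_short:.3f} F={f_short:.3f}")
print(f"long:  P={p_long:.3f} R={r_long:.3f} F={f_long:.3f}")
```

Padding the candidate leaves recall unchanged but drags precision, and with it the F-score, down. If your own run shows the same imbalance, checking which eval script and post-processing you ran (here, cnndm/eval.py rather than gigaword/eval.py) is a reasonable first step.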
