Reproducibility issue

I’m having trouble reproducing the results on the CNN/DM dataset.

I downloaded the data and the fine-tuned model provided in the README, and I followed the commands to predict the test set.

Everything runs fine, but at the end I get the following results:

1 ROUGE-1 Average_R: 0.62689 (95%-conf.int. 0.62269 - 0.63111)
1 ROUGE-1 Average_P: 0.13695 (95%-conf.int. 0.13561 - 0.13828)
1 ROUGE-1 Average_F: 0.22101 (95%-conf.int. 0.21918 - 0.22288)

1 ROUGE-2 Average_R: 0.33142 (95%-conf.int. 0.32673 - 0.33603)
1 ROUGE-2 Average_P: 0.06949 (95%-conf.int. 0.06832 - 0.07078)
1 ROUGE-2 Average_F: 0.11266 (95%-conf.int. 0.11089 - 0.11456)

1 ROUGE-L Average_R: 0.52624 (95%-conf.int. 0.52179 - 0.53061)
1 ROUGE-L Average_P: 0.11465 (95%-conf.int. 0.11345 - 0.11598)
1 ROUGE-L Average_F: 0.18509 (95%-conf.int. 0.18333 - 0.18698)

/root/code/unilm/src/cnndm_model/cnndm_model.bin.test.alp1.0
ROUGE-F(1/2/l): 22.10/11.27/18.51
ROUGE-R(1/2/3/l): 62.69/33.14/52.62
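As a rough sanity check on the numbers above, the sketch below recomputes an F-score from the reported average ROUGE-1 precision and recall. The helper names `f_score` and `recall_precision_ratio` are made up for illustration and are not part of any ROUGE script. The reported Average_F is presumably averaged per document, so it need not match this value exactly, but the large gap between recall and precision is the kind of signal that suggests the evaluated text is much longer than the references.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); returns 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_precision_ratio(precision: float, recall: float) -> float:
    """How many times larger recall is than precision (hypothetical diagnostic)."""
    return recall / precision


# Averages copied from the ROUGE-1 lines in the output above.
rouge1_p = 0.13695
rouge1_r = 0.62689

print(f"F1 from averaged P and R: {f_score(rouge1_p, rouge1_r):.4f}")
print(f"recall / precision: {recall_precision_ratio(rouge1_p, rouge1_r):.2f}")
```

A ratio this far from 1 is worth investigating before suspecting the model itself, for example by checking which evaluation script and post-processing were applied.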

It’s weird because I checked the prediction file (cnndm_model.bin.test.alp1.0.post) and compared it with the one provided in the README, and most of the time there are only a few differences.
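A comparison like this can be quantified with a few lines of Python. This is only a minimal sketch: the function name `count_differing_lines` and the file name `official.post` are placeholders for illustration, and it assumes both files hold one prediction per line in the same order.

```python
from itertools import zip_longest
from typing import Iterable, Tuple


def count_differing_lines(lines_a: Iterable[str], lines_b: Iterable[str]) -> Tuple[int, int]:
    """Return (differing, total) over two line sequences, ignoring surrounding whitespace.

    A line present in only one of the inputs counts as a difference.
    """
    differing = 0
    total = 0
    for a, b in zip_longest(lines_a, lines_b, fillvalue=None):
        total += 1
        if a is None or b is None or a.strip() != b.strip():
            differing += 1
    return differing, total


# Example usage (file names are placeholders):
# with open("official.post", encoding="utf-8") as ref, \
#      open("cnndm_model.bin.test.alp1.0.post", encoding="utf-8") as mine:
#     differing, total = count_differing_lines(ref, mine)
#     print(f"{differing} of {total} lines differ")
```

If only a small fraction of lines differ while the scores are far apart, the evaluation step is a more likely culprit than the predictions.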

Here is a comparison of the last few lines of the file (left is the ‘official’ one, right is mine)

Issue Analytics

Created 4 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

donglixp commented, Oct 15, 2019

Thanks for spotting the incorrect script path.

-li

astariul commented, Oct 15, 2019

Commit 8c1f46d4e4ab7993665ac2a76406855c471a15df fixed my problem: I was using the wrong script.

Using cnndm/eval.py instead of gigaword/eval.py fixed it.

I could reproduce your results using the official ROUGE script.

Thanks a lot for the help!! 👍
