Reproduction of experimental results
First of all, thanks for sharing this clean, object-oriented code! I have learned a lot from this repo. I even want to say: wow, you can really code!
^_^
I trained the model on the CoNLL04 dataset with the default configuration, following the README, and got the following test results:
--- Entities (NER) ---
type    precision  recall  f1-score  support
Org         79.43   83.84     81.57      198
Loc         91.51   90.87     91.19      427
Other       76.61   71.43     73.93      133
Peop        92.17   95.33     93.72      321
micro       87.70   88.51     88.10     1079
macro       84.93   85.37     85.10     1079

--- Relations ---
Without NER
type    precision  recall  f1-score  support
Kill        84.78   82.98     83.87       47
OrgBI       73.86   61.90     67.36      105
Work        61.54   63.16     62.34       76
LocIn       74.36   61.70     67.44       94
Live        74.04   77.00     75.49      100
micro       72.84   68.01     70.34      422
macro       73.72   69.35     71.30      422

With NER
type    precision  recall  f1-score  support
Kill        84.78   82.98     83.87       47
OrgBI       73.86   61.90     67.36      105
Work        61.54   63.16     62.34       76
LocIn       73.08   60.64     66.28       94
Live        74.04   77.00     75.49      100
micro       72.59   67.77     70.10      422
macro       73.46   69.14     71.07      422
The test results are worse than those reported in the original paper, especially for the macro-averaged metrics.
Is it possible that the random seed is different? I just set seed=42 in example_train.conf.
Thanks!
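As a quick sanity check on the numbers above, the macro rows can be recomputed from the per-type rows. The sketch below assumes that macro F1 is the unweighted mean of the per-type F1 scores (the usual definition; whether the repo's evaluation code computes it exactly this way is an assumption). The helper name `macro_f1` is made up for illustration; the scores are copied from the tables.

```python
# Per-type F1 scores copied from the result tables in this issue.
entity_f1 = {"Org": 81.57, "Loc": 91.19, "Other": 73.93, "Peop": 93.72}
rel_f1_without_ner = {
    "Kill": 83.87, "OrgBI": 67.36, "Work": 62.34, "LocIn": 67.44, "Live": 75.49,
}
rel_f1_with_ner = {
    "Kill": 83.87, "OrgBI": 67.36, "Work": 62.34, "LocIn": 66.28, "Live": 75.49,
}


def macro_f1(per_type_f1):
    """Unweighted mean of per-type F1 scores, rounded to two decimals.

    Hypothetical helper: assumes 'macro' means a plain average over types,
    ignoring the support column.
    """
    return round(sum(per_type_f1.values()) / len(per_type_f1), 2)


for name, scores in [
    ("Entities", entity_f1),
    ("Relations (without NER)", rel_f1_without_ner),
    ("Relations (with NER)", rel_f1_with_ner),
]:
    print(f"{name}: macro F1 = {macro_f1(scores):.2f}")
```

Under that assumption the recomputed values agree with the reported macro rows (85.10, 71.30, 71.07). Because every type counts equally, a small class such as Kill (47 examples) moves the macro score as much as a large one, which is one reason macro metrics can swing more between runs than micro metrics.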
Issue Analytics
- Created 4 years ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
Yes, you should evaluate the provided model on the test set. However, the provided model is the best out of 5 runs, whereas we report the average of 5 runs in our paper (…and due to random weight initialization and sampling the performance varies between runs). That’s why you get a better performance compared to the results we reported in our paper.
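To make the difference between "best of 5 runs" and "average of 5 runs" concrete, here is a minimal sketch that summarizes a set of per-run scores. The function name `summarize_runs` and the scores in `runs` are hypothetical placeholders, not results from the paper or from this repo; in practice each value would come from one training run with a different seed.

```python
from statistics import mean, stdev


def summarize_runs(f1_scores):
    """Return the mean, sample standard deviation, and best score over runs."""
    return {
        "mean": mean(f1_scores),
        "std": stdev(f1_scores),
        "best": max(f1_scores),
    }


# Hypothetical F1 scores from five runs with different seeds
# (placeholder values for illustration only).
runs = [70.1, 71.3, 69.8, 72.0, 70.6]

summary = summarize_runs(runs)
print(f"mean = {summary['mean']:.2f} +/- {summary['std']:.2f}")
print(f"best = {summary['best']:.2f}")
```

The best run can never score below the mean, so a single checkpoint chosen as the best of several runs will look at least as good as the averaged figure, and any single run (for example, one trained with seed=42) can land on either side of that average.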
Thanks 😃!
I understand. Thanks a lot.