
Reproduction of experimental results

See original GitHub issue

First of all, thanks for sharing this clean, object-oriented code! I have learned a lot from this repo. I even want to say: wow, you can really code! ^_^

I trained the model on the CoNLL04 dataset with the default configuration, following the README, and got the test results below:

--- Entities (NER) ---

                type    precision       recall     f1-score      support
                 Org        79.43        83.84        81.57          198
                 Loc        91.51        90.87        91.19          427
               Other        76.61        71.43        73.93          133
                Peop        92.17        95.33        93.72          321

               micro        87.70        88.51        88.10         1079
               macro        84.93        85.37        85.10         1079

--- Relations ---

Without NER
                type    precision       recall     f1-score      support
                Kill        84.78        82.98        83.87           47
               OrgBI        73.86        61.90        67.36          105
                Work        61.54        63.16        62.34           76
               LocIn        74.36        61.70        67.44           94
                Live        74.04        77.00        75.49          100

               micro        72.84        68.01        70.34          422
               macro        73.72        69.35        71.30          422

With NER
                type    precision       recall     f1-score      support
                Kill        84.78        82.98        83.87           47
               OrgBI        73.86        61.90        67.36          105
                Work        61.54        63.16        62.34           76
               LocIn        73.08        60.64        66.28           94
                Live        74.04        77.00        75.49          100

               micro        72.59        67.77        70.10          422
               macro        73.46        69.14        71.07          422

These test results are worse than those reported in the original paper, especially the macro-averaged metrics.
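
As a sanity check on the numbers above: the macro score is the unweighted mean of the per-type F1 values, so a single weak type (Work at 62.34) drags it down regardless of how few examples it has, while micro-F1 pools all individual decisions and is dominated by frequent types. A minimal Python sketch using the "Without NER" figures:

    # Per-type F1 values copied from the "Without NER" table above.
    f1_by_type = {
        "Kill": 83.87, "OrgBI": 67.36, "Work": 62.34,
        "LocIn": 67.44, "Live": 75.49,
    }
    # Macro-F1: unweighted mean over types, ignoring support counts.
    macro_f1 = sum(f1_by_type.values()) / len(f1_by_type)
    print(f"macro F1 = {macro_f1:.2f}")  # -> 71.30, matching the table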

Is it possible that the random seed is different? I just set seed = 42 in example_train.conf.
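
For reference, run-to-run variation usually comes from unseeded randomness in weight initialization, sampling, and data shuffling. A minimal sketch of pinning the usual RNG sources in a PyTorch project (illustrative only, not this repo's actual startup code):

    import random
    import numpy as np
    import torch

    def set_seed(seed: int = 42) -> None:
        # Seed every RNG source that affects initialization and sampling.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines

    set_seed(42)  # mirrors seed = 42 from example_train.conf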

Thanks!

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
markus-eberts commented, Jul 15, 2020

Yes, you should evaluate the provided model on the test set. However, the provided model is the best out of 5 runs, whereas we report the average of 5 runs in our paper (…and due to random weight initialization and sampling, performance varies between runs). That’s why you get better performance compared to the results we reported in our paper.

Thanks 😃!
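
To reproduce the paper's reporting protocol rather than the single best checkpoint, one would train five independently seeded models and average their test metrics. A minimal sketch, where train_and_eval is a hypothetical stand-in for one full train-plus-test cycle:

    from statistics import mean, stdev

    def train_and_eval(seed: int) -> float:
        """Hypothetical placeholder: train one model with this seed
        and return its test macro-F1."""
        raise NotImplementedError

    seeds = [1, 2, 3, 4, 5]  # five independent runs, as in the paper
    scores = [train_and_eval(s) for s in seeds]
    print(f"macro F1: {mean(scores):.2f} +/- {stdev(scores):.2f} "
          f"over {len(scores)} runs")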

0 reactions
JackySnake commented, Jul 15, 2020

> Yes, you should evaluate the provided model on the test set. However, the provided model is the best out of 5 runs, whereas we report the average of 5 runs in our paper (…and due to random weight initialization and sampling, performance varies between runs). That’s why you get better performance compared to the results we reported in our paper.
>
> Thanks 😃!

I understand. Thanks a lot.
