Reproducing the paper's best results
I’ve tried to replicate the paper. For bert-base-nli-mean-tokens, the model trained from scratch with your code reached a cosine-similarity score of 74.71 on the STS test set, which is far below the score reported in the paper. Any thoughts?
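For context, the STS benchmark score is usually the correlation between the gold similarity labels and the cosine similarity of the two sentence embeddings. A minimal sketch of such an evaluation, assuming an `sts_test` list of (sentence1, sentence2, gold_score) tuples and Spearman correlation as the metric (both are assumptions, not stated in the original report):

```python
from scipy.stats import spearmanr
from sklearn.metrics.pairwise import paired_cosine_distances
from sentence_transformers import SentenceTransformer

# Load the model under test (pretrained or trained from scratch)
model = SentenceTransformer('bert-base-nli-mean-tokens')

# sts_test is assumed to be a list of (sentence1, sentence2, gold_score) tuples
emb1 = model.encode([s1 for s1, _, _ in sts_test])
emb2 = model.encode([s2 for _, s2, _ in sts_test])
gold = [score for _, _, score in sts_test]

# Cosine similarity per pair, then correlation against the gold scores
cos_sim = 1 - paired_cosine_distances(emb1, emb2)
print('Spearman x100:', spearmanr(gold, cos_sim).correlation * 100)
```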
Issue Analytics
- Created 4 years ago
- Comments:8 (4 by maintainers)
Hi, the problem appeared when I updated huggingface pytorch-transformers to version 1.x (which happened with version 2.0 of sentence-transformers): performance suddenly dropped, even though the setup was the same as before.
I did extensive debugging, even copying over old code from huggingface, but sadly never found a way to fix it. Interestingly, when loading weights trained with the old huggingface code, the same performance was still achieved. So something must have changed in the huggingface training procedure that leads to this inferior performance with version 1 of pytorch-transformers. Maybe the optimizer code is a bit different?
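Nothing in the thread confirms this, but one concrete difference between the two library generations is how the optimizer and warmup schedule are wired up. A rough sketch of the two setups, with `model` and `num_train_steps` as placeholder names:

```python
# Old setup (pytorch-pretrained-bert 0.x): warmup and schedule live inside the optimizer
from pytorch_pretrained_bert.optimization import BertAdam
optimizer = BertAdam(model.parameters(), lr=2e-5,
                     warmup=0.1, t_total=num_train_steps)

# New setup (pytorch-transformers 1.x): plain AdamW plus an explicit scheduler.
# BertAdam skipped Adam's bias correction, so correct_bias=False is needed to
# mimic the old behaviour; missing this is one possible source of drift.
from pytorch_transformers import AdamW, WarmupLinearSchedule
optimizer = AdamW(model.parameters(), lr=2e-5, correct_bias=False)
scheduler = WarmupLinearSchedule(optimizer,
                                 warmup_steps=int(0.1 * num_train_steps),
                                 t_total=num_train_steps)
# scheduler.step() must now be called explicitly after each optimizer.step()
```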
I was not the only one affected by this; several people mentioned in the huggingface repo (see https://github.com/huggingface/transformers/issues/938) that they now achieve slightly worse performance. The reason is unclear.
I will soon be able to update pytorch-transformers to version 2. Maybe the issue is resolved in that version? Who knows.
If you would like to reproduce the old STS experiment scores, I recommend using an older version of this repository, one that uses the pytorch-transformers 0.x version.
Best regards,
Nils Reimers
Hi @K-Mike, in the paper I used bert-as-a-service with mean pooling. Here is the code I used:
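As a minimal sketch of typical bert-as-a-service usage with the REDUCE_MEAN pooling strategy discussed below (not necessarily the exact snippet used for the paper; the model directory and server flags are assumptions):

```python
from bert_serving.client import BertClient

# A bert-serving server is assumed to be running, started along the lines of:
#   bert-serving-start -model_dir /path/to/uncased_L-12_H-768_A-12 \
#                      -pooling_strategy REDUCE_MEAN
# (the model directory is a placeholder)

bc = BertClient()
embeddings = bc.encode(['A man is playing a guitar.',
                        'Someone plays an instrument.'])
print(embeddings.shape)  # (2, 768) for BERT-base
```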
As I learned later (it was pointed out in one of the issues here): bert-as-a-service interprets mean-pooling a bit differently.
This might be the cause of the differences. Maybe taking only the last layer and performing mean pooling is better than the REDUCE_MEAN pooling from bert-as-a-service? It would be interesting to see which works better.
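For comparison, a minimal sketch of last-layer mean pooling with the attention mask, written against the current huggingface transformers API rather than the pytorch-transformers versions discussed above:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

def mean_pool_last_layer(sentences):
    # Mean over the token embeddings of the last layer only,
    # ignoring padding positions via the attention mask.
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        out = model(**enc)
    mask = enc['attention_mask'].unsqueeze(-1).float()
    return (out.last_hidden_state * mask).sum(1) / mask.sum(1)

emb = mean_pool_last_layer(['A man is playing a guitar.',
                            'Someone plays an instrument.'])
```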
Best,
Nils Reimers