MS MARCO document retrieval / bi-encoder evaluation
Hi! Thanks for your work on this repo! I was trying to reproduce the bi-/cross-encoder results on the MS MARCO dataset, but a few things confuse me:
- The description mentions that MRR@10 on the MS MARCO passage-retrieval task is around 30-36, but when I train distilroberta-base I easily reach 37-39, even with 1,000 negatives per query. Where does the discrepancy come from? Or should I use eval_msmarco.py from the examples?
- Have you run bi-/cross-encoder experiments on document-level tasks? If so, could you share the results, so I can verify that my changes are correct? Also, do you have a recommended approach for document-level tasks?
- When I train the bi-encoder, the provided example easily reaches an MRR of 42+ or even 70+ (passage retrieval), but when I switch to document retrieval it quickly reaches 20+ and then oscillates around 20-22. Are these two results expected? If I want to evaluate the bi-encoder's performance separately, what should I do?
All the code I use comes from sentence-transformers/examples/training/ms_marco
Have a look here: https://www.sbert.net/docs/pretrained-models/msmarco-v3.html
I will soon add the v3 training scripts: https://www.sbert.net/examples/training/ms_marco/README.html#bi-encoder
Hi,
Yes, you should use eval_msmarco.py. During training, scores are computed on a small subset with a much smaller corpus (so that they can be computed quickly). For comparable numbers, you must run the evaluation against the full corpus of roughly 8.8 million passages.
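For reference, here is a minimal sketch of what such a full-corpus evaluation looks like. This is not the eval_msmarco.py script itself; the file names follow the standard MS MARCO layout, and the model name is only a placeholder:

```python
# Minimal full-corpus MRR@10 sketch; assumes collection.tsv,
# queries.dev.small.tsv and qrels.dev.tsv live in ./msmarco-data.
import os
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("msmarco-distilbert-base-v3")  # placeholder model
data_dir = "msmarco-data"

# Full passage collection (~8.8M passages): pid \t passage
corpus = {}
with open(os.path.join(data_dir, "collection.tsv"), encoding="utf8") as f:
    for line in f:
        pid, passage = line.rstrip("\n").split("\t", 1)
        corpus[pid] = passage

# Dev queries: qid \t query
dev_queries = {}
with open(os.path.join(data_dir, "queries.dev.small.tsv"), encoding="utf8") as f:
    for line in f:
        qid, query = line.rstrip("\n").split("\t", 1)
        dev_queries[qid] = query

# Relevance judgements (TREC qrels format): qid \t 0 \t pid \t 1
dev_rel = {}
with open(os.path.join(data_dir, "qrels.dev.tsv"), encoding="utf8") as f:
    for line in f:
        qid, _, pid, _ = line.strip().split("\t")
        dev_rel.setdefault(qid, set()).add(pid)

# Encoding all ~8.8M passages needs a GPU and a lot of memory; this is
# exactly the step the small training-time evaluation avoids.
pids = list(corpus.keys())
corpus_emb = model.encode([corpus[pid] for pid in pids], batch_size=128,
                          convert_to_tensor=True, show_progress_bar=True)

mrr_sum = 0.0
for qid, query in dev_queries.items():
    q_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_emb, top_k=10)[0]
    for rank, hit in enumerate(hits):
        if pids[hit["corpus_id"]] in dev_rel.get(qid, set()):
            mrr_sum += 1.0 / (rank + 1)
            break

print(f"MRR@10: {mrr_sum / len(dev_queries):.4f}")
```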
I have not done document-level experiments yet. What works well is to split your documents into passages, either by identifying paragraphs (e.g., one or two blank lines) or by splitting each document into e.g. 100-word chunks, as in the sketch below. Then encode the chunks individually and do standard passage retrieval. Afterwards you map each retrieved passage back to its document, i.e., determine which document the passage came from and show that document to the user.
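Roughly like this (a hedged sketch of the chunk-and-retrieve idea; the 100-word window, the model name, and the toy documents are just illustrations):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("msmarco-distilbert-base-v3")  # placeholder model

def chunk_document(text, words_per_chunk=100):
    """Split a document into consecutive ~100-word passages."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

documents = {
    "doc1": "First long document text ...",
    "doc2": "Second long document text ...",
}

# Flatten into passages while remembering which document each came from
passages, passage_to_doc = [], []
for doc_id, text in documents.items():
    for chunk in chunk_document(text):
        passages.append(chunk)
        passage_to_doc.append(doc_id)

passage_emb = model.encode(passages, convert_to_tensor=True)

# Standard passage retrieval, then map the top passages back to documents
query_emb = model.encode("example query", convert_to_tensor=True)
for hit in util.semantic_search(query_emb, passage_emb, top_k=10)[0]:
    print(passage_to_doc[hit["corpus_id"]], round(hit["score"], 3))
```

A common way to turn passage scores into document scores is to score each document by its best-matching chunk (a MaxP-style aggregation).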
I have no experience with document retrieval. Regarding the MRR of 70+: yes, this is normal, since during training a tiny corpus is used for quicker evaluation.