MS MARCO document retrieval / bi-encoder evaluation
Hi! Thanks for your work on this repo! I was trying to reproduce the bi-/cross-encoder results on the MS MARCO dataset, but a few things confuse me:
- The description mentions that MRR@10 on the MS MARCO passage-retrieval task is around 30-36, but when I train distilroberta-base I easily reach 37-39, even with 1,000 negatives per query. Where does the discrepancy come from? Or should I use eval_msmarco.py from the examples?
- Have you run bi-/cross-encoder experiments on document-level tasks? If so, could you share the results, so I can verify that my changes are correct? Also, do you have a recommended approach for document-level tasks?
- When I train the bi-encoder, the provided example easily reaches an MRR of 42+ or even 70+ (passage retrieval), but when I switch to document retrieval it quickly reaches 20+ and then oscillates around 20-22. Are these two results expected? If I want to evaluate the bi-encoder's performance separately, what should I do?
All the code I use comes from sentence-transformers/examples/training/ms_marco
Have a look here: https://www.sbert.net/docs/pretrained-models/msmarco-v3.html
I will soon add the v3 training scripts: https://www.sbert.net/examples/training/ms_marco/README.html#bi-encoder
Hi,
Yes, you should use eval_msmarco.py. During training, scores are computed on a small subset with a much smaller corpus (so that they can be computed quickly). For comparable numbers, you must run the evaluation against the full corpus of roughly 8.8 million passages.
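For reference, here is a minimal sketch of what such a full-corpus evaluation looks like. This is not the eval_msmarco.py script itself; the file names follow the standard MS MARCO layout, and the model name is only a placeholder:

```python
# Minimal full-corpus MRR@10 sketch; assumes collection.tsv,
# queries.dev.small.tsv and qrels.dev.tsv live in ./msmarco-data.
import os
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("msmarco-distilbert-base-v3")  # placeholder model
data_dir = "msmarco-data"

# Full passage collection (~8.8M passages): pid \t passage
corpus = {}
with open(os.path.join(data_dir, "collection.tsv"), encoding="utf8") as f:
    for line in f:
        pid, passage = line.rstrip("\n").split("\t", 1)
        corpus[pid] = passage

# Dev queries: qid \t query
dev_queries = {}
with open(os.path.join(data_dir, "queries.dev.small.tsv"), encoding="utf8") as f:
    for line in f:
        qid, query = line.rstrip("\n").split("\t", 1)
        dev_queries[qid] = query

# Relevance judgements (TREC qrels format): qid \t 0 \t pid \t 1
dev_rel = {}
with open(os.path.join(data_dir, "qrels.dev.tsv"), encoding="utf8") as f:
    for line in f:
        qid, _, pid, _ = line.strip().split("\t")
        dev_rel.setdefault(qid, set()).add(pid)

# Encoding all ~8.8M passages needs a GPU and a lot of memory; this is
# exactly the step the small training-time evaluation avoids.
pids = list(corpus.keys())
corpus_emb = model.encode([corpus[pid] for pid in pids], batch_size=128,
                          convert_to_tensor=True, show_progress_bar=True)

mrr_sum = 0.0
for qid, query in dev_queries.items():
    q_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_emb, top_k=10)[0]
    for rank, hit in enumerate(hits):
        if pids[hit["corpus_id"]] in dev_rel.get(qid, set()):
            mrr_sum += 1.0 / (rank + 1)
            break

print(f"MRR@10: {mrr_sum / len(dev_queries):.4f}")
```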
I have not done document-level experiments yet. What works well is to split your documents into passages, either by identifying paragraphs (e.g., one or two blank lines) or by splitting each document into e.g. 100-word chunks, as in the sketch below. Then encode the chunks individually and do standard passage retrieval. Afterwards you map each retrieved passage back to its document, i.e., determine which document the passage came from and show that document to the user.
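Roughly like this (a hedged sketch of the chunk-and-retrieve idea; the 100-word window, the model name, and the toy documents are just illustrations):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("msmarco-distilbert-base-v3")  # placeholder model

def chunk_document(text, words_per_chunk=100):
    """Split a document into consecutive ~100-word passages."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

documents = {
    "doc1": "First long document text ...",
    "doc2": "Second long document text ...",
}

# Flatten into passages while remembering which document each came from
passages, passage_to_doc = [], []
for doc_id, text in documents.items():
    for chunk in chunk_document(text):
        passages.append(chunk)
        passage_to_doc.append(doc_id)

passage_emb = model.encode(passages, convert_to_tensor=True)

# Standard passage retrieval, then map the top passages back to documents
query_emb = model.encode("example query", convert_to_tensor=True)
for hit in util.semantic_search(query_emb, passage_emb, top_k=10)[0]:
    print(passage_to_doc[hit["corpus_id"]], round(hit["score"], 3))
```

A common way to turn passage scores into document scores is to score each document by its best-matching chunk (a MaxP-style aggregation).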
I have no experience with document retrieval. Regarding the MRR of 70+: yes, this is normal, since during training a tiny corpus is used for quicker evaluation.