Add common metrics for information retrieval
There's a nice summary of common metrics for IR here: https://www.pinecone.io/learn/offline-evaluation/
Although we have many of these as part of trec_eval, it could make sense to have separate metrics like MRR and NDCG@K to give them more visibility.
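For context, here is a minimal sketch of how MRR and NDCG@K could be computed from ranked relevance judgments. The function names and the list-of-lists input format are purely illustrative, not the evaluate API:

```python
import math
from typing import Sequence


def mrr(ranked_relevance: Sequence[Sequence[int]]) -> float:
    """Mean Reciprocal Rank over queries.

    Each inner sequence holds binary relevance labels (1 = relevant) for the
    documents a retriever returned, in ranked order.
    """
    total = 0.0
    for labels in ranked_relevance:
        for rank, rel in enumerate(labels, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)


def ndcg_at_k(ranked_gains: Sequence[Sequence[float]], k: int) -> float:
    """NDCG@K over queries, with (possibly graded) relevance gains in ranked order."""
    scores = []
    for gains in ranked_gains:
        dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
        ideal = sorted(gains, reverse=True)
        idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal[:k]))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores)
```

For example, `mrr([[0, 1, 0], [1, 0, 0]])` returns 0.75 (reciprocal ranks 0.5 and 1.0 averaged).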
Top Results From Across the Web

Evaluation Metrics For Information Retrieval - Amit Chaudhary
Learn about common metrics used to evaluate the performance of information retrieval systems.

Evaluation Measures in Information Retrieval - Pinecone
How to measure retrieval performance with offline metrics like recall@K, MRR, MAP@K, and NDCG@K.

Evaluation measures (information retrieval) - Wikipedia
Evaluation measures for an information retrieval (IR) system assess how well an index … Offline metrics are generally created from relevance judgment sessions where judges score the quality of the search results, using both binary (relevant/non-relevant) and graded judgments.

Evaluation in information retrieval - Stanford NLP Group
R-precision adjusts for the size of the set of relevant documents: a perfect system could score 1 on this metric for each query.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
ICT - Inverse Cloze Task, e.g. as defined here: https://arxiv.org/pdf/1906.00300.pdf
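Roughly, ICT builds pseudo (query, context) training pairs by treating one sentence of a passage as the query and the remaining sentences as the context it should retrieve. A minimal sketch with a hypothetical helper name (the paper also leaves the query sentence in the context for a fraction of examples, which is skipped here):

```python
import random


def make_ict_example(sentences: list[str], rng: random.Random) -> tuple[str, str]:
    """Build one Inverse Cloze Task pair from a passage split into sentences.

    One sentence is drawn as the pseudo-query; the remaining sentences form
    the pseudo-evidence context the retriever should match it to.
    """
    idx = rng.randrange(len(sentences))
    query = sentences[idx]
    context = " ".join(sentences[:idx] + sentences[idx + 1:])
    return query, context
```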
Yes, I think what would be easier for my use case would be things like Recall / Precision @ n etc. I can take a stab at those then 😃
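For concreteness, a minimal sketch of what per-query Precision@n / Recall@n could look like (illustrative only, not the eventual evaluate module; the denominator convention for precision when fewer than n documents are returned varies between implementations):

```python
def precision_at_n(retrieved_ids: list, relevant_ids: set, n: int) -> float:
    """Fraction of the top-n retrieved documents that are relevant."""
    top_n = retrieved_ids[:n]
    return sum(1 for doc_id in top_n if doc_id in relevant_ids) / n


def recall_at_n(retrieved_ids: list, relevant_ids: set, n: int) -> float:
    """Fraction of all relevant documents that appear in the top-n results."""
    if not relevant_ids:
        return 0.0
    top_n = retrieved_ids[:n]
    return sum(1 for doc_id in top_n if doc_id in relevant_ids) / len(relevant_ids)
```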
Hey @cakiki are you still interested in working on this?
I'm thinking about IR metrics in the context of building the retriever for ROOTS. I want a simple way to test the quality of the retrievers I'm building, and I'm thinking about the following approach right now:
Then I could evaluate using huggingface/evaluate IR metrics. What do you think? I'm also looking for better ideas for creating test sets for ROOTS retrieval 😃