Ranking results using weighted documents are lower than in the paper
Hi, thanks for sharing the data and source code!
I tried to reproduce the results using the shared Virtual Appendix/weighted_documents. Here are the steps I followed for the Anserini experiments on MS MARCO:
- unzip the weighted documents:

```sh
unzip sqrt_sample_100_jsonl.zip
```

- build the Anserini index:

```sh
sh ../anserini/target/appassembler/bin/IndexCollection \
  -collection JsonCollection \
  -input sqrt_sample_100_jsonl \
  -index lucene-index.msmarco.deepct \
  -generator LuceneDocumentGenerator \
  -threads 10 -storePositions \
  -storeRawDocs > log.msmarco.deepct
```

- search the index:

```sh
../anserini/target/appassembler/bin/SearchMsmarco -hits 1000 -threads 10 \
  -index lucene-index.msmarco.deepct -qid_queries msmarco/queries.dev.small.tsv \
  -output output/run.dev.small.tsv
```

- evaluate the result:

```sh
python ../anserini/src/main/python/msmarco/msmarco_eval.py \
  msmarco/qrels.dev.small.tsv output/run.dev.small.tsv
```
However, I could only get about 0.22 on sample_100_jsonl.zip:

```
#####################
MRR @10: 0.22490426160913188
QueriesRanked: 6980
#####################
```
The result on sqrt_sample_100_jsonl.zip is about 0.20:

```
#####################
MRR @10: 0.20240204438986623
QueriesRanked: 6980
#####################
```
But the paper reports 0.24 on the dev set. Is there anything wrong with my process?
Thanks!
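For reference, the MRR@10 figure that msmarco_eval.py prints is the mean, over all ranked queries, of the reciprocal rank of the first relevant passage in the top 10 (0 if none appears there). A simplified sketch of that computation, using hypothetical in-memory inputs in place of the script's file parsing:

```python
# Simplified sketch of MRR@10 as reported by msmarco_eval.py.
# qrels/run here are hypothetical in-memory structures; the official
# script parses them from the TSV files passed on the command line.

def mrr_at_10(qrels, run):
    """qrels: {qid: set of relevant pids}; run: {qid: ranked list of pids}."""
    total = 0.0
    for qid, ranking in run.items():
        for rank, pid in enumerate(ranking[:10], start=1):
            if pid in qrels.get(qid, set()):
                total += 1.0 / rank  # reciprocal rank of first relevant hit
                break
    return total / len(run)  # averaged over QueriesRanked

# Example: one query whose first relevant passage is at rank 4 -> MRR@10 = 0.25
print(mrr_at_10({"q1": {"p9"}}, {"q1": ["p1", "p2", "p3", "p9"]}))
```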
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
It is critical to fine-tune the k1 and b parameters of BM25. The optimal k1 should be around 9-13, and b around 0.7-0.9.
Thank you for your kind explanation! Best wishes.

> On 2020-04-29 23:25:23, Dai Zhuyun (戴竹韵) notifications@github.com wrote:
>
> Hi, thanks for checking out my work! You should tune the parameters when running the retrieval, i.e., `python anserini/src/main/python/msmarco/retrieve.py -k 8.0 -b 0.9 …`
>
> Best, Zhuyun
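Both replies point to the same fix: sweep k1 and b on the dev set instead of using Anserini's defaults. Below is a minimal sketch of such a grid search over the suggested ranges, reusing the index, query, and qrels paths from the steps above. It assumes the SearchMsmarco build accepts -k1 and -b flags (recent Anserini versions do; older builds, or the retrieve.py path mentioned in the comments, may spell them differently):

```python
# Hypothetical grid search over BM25 parameters, reusing the paths from
# the reproduction steps above. Flag names (-k1, -b) are assumed to match
# the Anserini SearchMsmarco build; adjust if your version differs.
import itertools
import subprocess

for k1, b in itertools.product([9.0, 10.0, 11.0, 12.0, 13.0], [0.7, 0.8, 0.9]):
    run_file = f"output/run.dev.small.k1_{k1}.b_{b}.tsv"
    # Retrieve with the candidate parameters.
    subprocess.run([
        "../anserini/target/appassembler/bin/SearchMsmarco",
        "-hits", "1000", "-threads", "10",
        "-index", "lucene-index.msmarco.deepct",
        "-qid_queries", "msmarco/queries.dev.small.tsv",
        "-output", run_file,
        "-k1", str(k1), "-b", str(b),
    ], check=True)
    # Score the run with the official MS MARCO eval script.
    print(f"k1={k1}, b={b}")
    subprocess.run([
        "python", "../anserini/src/main/python/msmarco/msmarco_eval.py",
        "msmarco/qrels.dev.small.tsv", run_file,
    ], check=True)
```

Whichever (k1, b) pair maximizes dev MRR@10 can then be compared against the 0.24 reported in the paper.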