
Ranking results using weighted documents are lower than in the paper

See original GitHub issue

Hi, thanks for sharing the data and source code!

I tried to reproduce the results using the shared Virtual Appendix/weighted_documents (http://boston.lti.cs.cmu.edu/appendices/arXiv2019-DeepCT-Zhuyun-Dai/weighted_documents/). Here are the steps I followed, based on the Anserini MS MARCO passage experiments (https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md):

  1. Unzip sqrt_sample_100_jsonl.zip.
  2. Build the Anserini index: sh ../anserini/target/appassembler/bin/IndexCollection -collection JsonCollection -input sqrt_sample_100_jsonl -index lucene-index.msmarco.deepct -generator LuceneDocumentGenerator -threads 10 -storePositions -storeRawDocs > log.msmarco.deepct
  3. Search the index: ../anserini/target/appassembler/bin/SearchMsmarco -hits 1000 -threads 10 -index lucene-index.msmarco.deepct -qid_queries msmarco/queries.dev.small.tsv -output output/run.dev.small.tsv
  4. Evaluate the run (MRR@10; see the sketch after this list): python ../anserini/src/main/python/msmarco/msmarco_eval.py msmarco/qrels.dev.small.tsv output/run.dev.small.tsv
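
For reference, here is a minimal sketch of the MRR@10 metric that msmarco_eval.py reports, written against the tab-separated qid/docid/rank run format from the Anserini MS MARCO guide. It illustrates the metric only; it is not the evaluation script itself:

    # Minimal MRR@10: mean over queries of 1/rank of the first relevant
    # passage within the top 10 results; a query with no relevant hit scores 0.
    def mrr_at_10(qrels_path, run_path):
        relevant = {}  # qid -> set of relevant docids
        with open(qrels_path) as f:
            for line in f:
                qid, _, docid, _ = line.split()
                relevant.setdefault(qid, set()).add(docid)

        best_rank = {}  # qid -> highest-ranked relevant hit (rank <= 10)
        with open(run_path) as f:
            for line in f:
                qid, docid, rank = line.split()
                if int(rank) <= 10 and docid in relevant.get(qid, ()):
                    best_rank[qid] = min(int(rank), best_rank.get(qid, 10))

        # Average over all queries that have judgments.
        return sum(1.0 / r for r in best_rank.values()) / len(relevant)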

However, I could only get about 0.22 on sample_100_jsonl.zip.

#####################
MRR @10: 0.22490426160913188
QueriesRanked: 6980
#####################

The result on sqrt_sample_100_jsonl.zip is about 0.20.

#####################
MRR @10: 0.20240204438986623
QueriesRanked: 6980
#####################

But the paper reports about 0.24 on the dev set. Is there anything wrong with my processing?

Thanks!

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:5 (2 by maintainers)

Top GitHub Comments

2 reactions
AdeDZY commented, Dec 23, 2019

It is critical to fine-tune the k1 and b parameters of BM25. The optimal k1 should be around 9-13, and b around 0.7-0.9.
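
For illustration, a minimal sketch of such a parameter sweep. It assumes the retrieve.py flags (-k, -b) quoted later in this thread and parses the "MRR @10: ..." line that msmarco_eval.py prints; the paths, flag names, and output parsing are assumptions, not a confirmed interface:

    # Hypothetical grid search over BM25 parameters; flag names follow the
    # retrieve.py invocation quoted later in this thread and may differ by version.
    import itertools, re, subprocess

    best = (0.0, None)
    for k1, b in itertools.product([9.0, 11.0, 13.0], [0.7, 0.8, 0.9]):
        run = f"output/run.k1_{k1}.b_{b}.tsv"
        subprocess.run(["python", "anserini/src/main/python/msmarco/retrieve.py",
                        "-index", "lucene-index.msmarco.deepct",
                        "-qid_queries", "msmarco/queries.dev.small.tsv",
                        "-output", run, "-k", str(k1), "-b", str(b)], check=True)
        out = subprocess.run(["python", "anserini/src/main/python/msmarco/msmarco_eval.py",
                              "msmarco/qrels.dev.small.tsv", run],
                             capture_output=True, text=True, check=True).stdout
        mrr = float(re.search(r"MRR @10:\s*([0-9.]+)", out).group(1))
        print(f"k1={k1} b={b} MRR@10={mrr:.4f}")
        if mrr > best[0]:
            best = (mrr, (k1, b))
    print("best:", best)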


0 reactions
LAW991224 commented, May 1, 2020

Thank you for your kind explanation! Best wishes

On 2020-04-29 23:25:23, Dai Zhuyun (戴竹韵) notifications@github.com wrote:

Hi, thanks for checking out my work! You should tune the parameters when running the retrieval, i.e., python anserini/src/main/python/msmarco/retrieve.py -k 8.0 -b 0.9 …

Best, Zhuyun

On Wed, Apr 29, 2020 at 4:03 AM LAW991224 notifications@github.com wrote:

Hi, thank you for sharing the data and source code! I wonder whether there is a way to fine-tune k1 and b without rebuilding the index (using the Anserini method)? At the moment, every time I want to change the values of k1 and b, I have to delete the existing index and build another one (because the values of k1 and b are specified when building the index), which is time-consuming.

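On the question of re-tuning without rebuilding: Lucene applies BM25 scoring at search time, so k1 and b are query-time settings and a single index suffices, which is what the reply above suggests. As a hedged sketch, assuming a Pyserini-style API (the LuceneSearcher class and its set_bm25 method are an assumption about that library, not something confirmed in this thread):

    # Search-time BM25 tuning over one prebuilt index; no re-indexing needed.
    # Assumes Pyserini's LuceneSearcher API; names may differ by version.
    from pyserini.search.lucene import LuceneSearcher

    searcher = LuceneSearcher("lucene-index.msmarco.deepct")
    for k1, b in [(9.0, 0.7), (11.0, 0.8), (13.0, 0.9)]:
        searcher.set_bm25(k1=k1, b=b)  # takes effect immediately, per query
        hits = searcher.search("what is the definition of bm25", k=10)
        print(f"k1={k1}, b={b}: top hit {hits[0].docid} ({hits[0].score:.3f})")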
