Support BM25 parameters customization
Would you consider supporting customization of the BM25 parameters? It would be very helpful for optimizing search relevance.
var k = 1.2; // Term frequency saturation point. Recommended values are between 1.2 and 2.
var b = 0.7; // Length normalization impact. Recommended values are around 0.75.
var d = 0.5; // BM25+ frequency normalization lower bound. Recommended values are between 0.5 and 1.
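To illustrate where each of the three parameters enters the score, here is a minimal sketch of the textbook BM25+ formula for a single term in a single field. The function name `bm25PlusScore`, its argument names, and the `params` object shape are hypothetical and chosen for illustration; this is not confirmed to be the library's actual implementation.

```javascript
// Hypothetical helper (not a library API): textbook BM25+ score of one term
// in one document field.
//   termFreq       - occurrences of the term in the field
//   docsWithTerm   - number of documents containing the term
//   totalDocs      - number of documents in the index
//   fieldLength    - length of this field, in terms
//   avgFieldLength - average length of this field across the index
//   params         - { k, b, d } as described in the snippet above
function bm25PlusScore(termFreq, docsWithTerm, totalDocs, fieldLength, avgFieldLength, params) {
  var k = params.k;
  var b = params.b;
  var d = params.d;

  // Inverse document frequency: rarer terms weigh more.
  var idf = Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));

  // b controls how strongly the field length (relative to the average) matters.
  var lengthNorm = 1 - b + b * (fieldLength / avgFieldLength);

  // k controls how quickly repeated occurrences stop adding to the score.
  var tfPart = (termFreq * (k + 1)) / (termFreq + k * lengthNorm);

  // d is the BM25+ lower bound added for every matching term.
  return idf * (tfPart + d);
}

// Illustrative values only: the same term in a short and in a long field.
var params = { k: 1.2, b: 0.75, d: 0.5 };
console.log(bm25PlusScore(3, 10, 100, 50, 100, params));
console.log(bm25PlusScore(3, 10, 100, 500, 100, params));
```

With `b` set to `0` the two calls return the same score, because field length no longer enters the formula; raising `k` lets repeated occurrences keep increasing the score for longer before it saturates.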
Issue Analytics
- State:
- Created a year ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
Here are some of my notes that may help in documenting the parameters, if they’re exposed. May need a bit of a rewrite 😃
This article is also helpful for understanding k and b: https://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/

- k is the BM25 term frequency saturation point. Default value is 1.2. Recommended values are between 1.2 and 2. 0 or a negative number is invalid (could be validated automatically?).
- b is the BM25 length normalization impact. 0 disables the field length having an effect on scoring altogether (not recommended). Default value is 0.7. Recommended values are around 0.75.
- d (actually δ) is the BM25+ frequency normalization lower bound. Default value is 0.5. Recommended values are between 0.5 and 1.0. 0 disables this feature (not recommended).

@lucaong Thank you for planning this request. While working on the search function for my dataset, I found it very flexible to be able to add a language-specific tokenizer. I don’t have any other recommendations at this point, since it already meets my needs.
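On the validation question raised in the notes above (0 or a negative k being invalid), a minimal sketch of an automatic check could look like the following. The function name `validateBM25Params` and the error messages are hypothetical. The upper bound of 1 on b is an assumption taken from the standard BM25 formulation, not something stated in this thread.

```javascript
// Hypothetical validator (not a library API) for user-supplied BM25+ parameters.
// Returns a list of human-readable problems; an empty list means the values are usable.
function validateBM25Params(params) {
  var errors = [];

  // k: 0 or a negative number is invalid, per the notes above.
  if (!(typeof params.k === 'number' && isFinite(params.k) && params.k > 0)) {
    errors.push('k must be a finite number greater than 0');
  }

  // b: 0 is allowed (it disables length normalization).
  // Assumption: the upper bound of 1 follows the standard BM25 definition.
  if (!(typeof params.b === 'number' && params.b >= 0 && params.b <= 1)) {
    errors.push('b must be a number between 0 and 1');
  }

  // d: 0 is allowed (it disables the BM25+ lower bound); negatives are rejected.
  if (!(typeof params.d === 'number' && isFinite(params.d) && params.d >= 0)) {
    errors.push('d must be a finite number greater than or equal to 0');
  }

  return errors;
}

// Illustrative usage with one valid and one invalid set of values.
console.log(validateBM25Params({ k: 1.2, b: 0.75, d: 0.5 }));
console.log(validateBM25Params({ k: 0, b: 0.75, d: 0.5 }));
```

Returning a list of messages rather than throwing on the first problem lets a caller report every invalid parameter at once; whether the not-recommended value 0 for b or d should also produce a warning is left open here.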