Search Score Decreases with More Accurate Query
The following are steps to reproduce an issue I am experiencing. First, create an index and add a two-word string:
var index = lunr(function () {
  this.ref('ref')
  this.field('text')
})
index.add({
  ref: 1,
  text: 'yes funny'
})
This first query is a portion of the first word followed by the complete second word:
console.log(index.search("ye funny"))
// => [ { ref: '1', score: 0.773957299203321 } ]
This second query also begins with a portion of the first word (can be identical to the term in the previous query, or not) followed by only a portion of the second word:
console.log(index.search("ye fun"))
// => [ { ref: '1', score: 1 } ]
Issue: Why does the second, less accurate query return a higher score than the first, more accurate query?
Note: For this particular example, the issue occurs only when the stemmer is disabled. See my comment below for better examples.
Thank you very much.
Issue Analytics
- State:
- Created 10 years ago
- Reactions:1
- Comments:8 (4 by maintainers)
Top GitHub Comments
Thanks again for your investigation into this issue.
I’ve taken a look at the photo example you posted. This again looks like an issue with the automatic wildcard that is currently applied when you do a search.

You can see this for yourself at lib/index.js:301, where it expands the query term. In the example index, 'photo' expands to ['photo', 'photograph']. So you get the following vectors:

And so the queryVector does not exactly match the photo vector, hence the score of less than 1. When doing a search for photograph this doesn’t happen, because photograph doesn’t expand to anything, so you get the following vectors:

Hence you get a score of 1 for the photograph search. Without the automatic wildcarding, photo would not be expanded and you would get the result you expect.

I’m not sure of the best way to progress this issue. I have opened a separate issue, #37, to discuss a feature to add a lower-level query interface that would not have automatic wildcarding. I still think that in general use having the automatic wildcard at the end of each query term is useful, but perhaps there could be a way to disable this? E.g. idx.search('photo', false) or idx.search('photo', { autoWildcard: false }).

I am experiencing a very similar, yet new issue in v0.4 (the issue above persists). The new issue is demonstrated here: http://jsfiddle.net/7eGpQ
As shown, my index has the documents 'photo' and 'photograph'. When searching with the queries 'p', 'ph', 'pho', 'phot', 'photo', and 'photogr', I receive inconsistent and unexpected scores (which I describe in my comments in the Fiddle). Finally, when searching for 'photo', not only do I not receive a perfect score, I receive a score lower than all previous queries.

Thank you very much for your excellent work. Lunr is fantastic.
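
The maintainer's explanation of the wildcard expansion can be sketched with a toy model. The snippet below is a simplified illustration and not lunr's actual scoring code: the helper names (expand, toVector, cosine, score) are hypothetical, term weights are plain 0/1 rather than tf-idf, and the vocabulary is assumed to contain only 'photo' and 'photograph'. It only shows how prefix expansion by itself can pull the score of an exact match below 1.

```javascript
// Toy model of prefix expansion plus cosine scoring.
// Assumption: NOT lunr's real implementation (no stemming, no tf-idf,
// no per-term boosts). All names here are hypothetical.
const vocabulary = ['photo', 'photograph'];

// Expand a query term to every indexed term that starts with it,
// mimicking an automatic trailing wildcard.
function expand(term) {
  return vocabulary.filter((t) => t.startsWith(term));
}

// Build a 0/1 vector over the vocabulary for a list of terms.
function toVector(terms) {
  return vocabulary.map((t) => (terms.includes(t) ? 1 : 0));
}

// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Score a single-term query against a document given as a list of terms.
function score(queryTerm, docTerms) {
  return cosine(toVector(expand(queryTerm)), toVector(docTerms));
}

const photoScore = score('photo', ['photo']);
const photographScore = score('photograph', ['photograph']);

console.log(photoScore);      // below 1: the query vector also covers 'photograph'
console.log(photographScore); // 1: 'photograph' expands only to itself
```

Under these assumptions, the 'photo' query scores about 0.707 against the 'photo' document because the expanded query vector also carries weight on 'photograph', while the 'photograph' query scores exactly 1. Real scores in lunr will differ, since its term weighting is more involved than this sketch.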