Search Score Decreases with More Accurate Query

The following steps reproduce an issue I am experiencing. First, create an index and add a document whose text is a two-word string:

var index = lunr(function () {
  this.ref('ref')
  this.field('text')
})
index.add({
  ref: 1,
  text: 'yes funny'
})

This first query is a portion of the first word followed by the complete second word:

console.log(index.search("ye funny"))
// => [ { ref: '1', score: 0.773957299203321 } ]

This second query also begins with a portion of the first word (it may or may not be identical to the term in the previous query), followed by only a portion of the second word:

console.log(index.search("ye fun"))
// => [ { ref: '1', score: 1 } ]

Issue: Why does the second, less accurate query return a higher score than the first, more accurate query?

Note: For this particular example, the issue occurs only when the stemmer is disabled. See my comment below for better examples.

Thank you very much.

Issue Analytics

  • State: open
  • Created 10 years ago
  • Reactions: 1
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
olivernn commented, Jun 17, 2013

Thanks again for your investigation into this issue.

I’ve taken a look at the photo example you posted. This again looks like an issue with the automatic wildcard that is currently applied when you run a search.

You can see this for yourself at lib/index.js:301 where it expands the query term. In the example index 'photo' expands to ['photo', 'photograph'].
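
The expansion step can be pictured with a toy prefix match. This is only a sketch of the idea: `expandTerm` and the `tokens` array are hypothetical names made up for illustration, and the code is not taken from lunr's source.

```javascript
// Hypothetical helper (not lunr's implementation): return every indexed
// token that starts with the query term, mimicking a trailing wildcard.
function expandTerm (term, indexedTokens) {
  return indexedTokens.filter(function (token) {
    return token.indexOf(term) === 0
  })
}

// Assumed contents of the example index: one token per document.
var tokens = ['photo', 'photograph']

console.log(expandTerm('photo', tokens))      // [ 'photo', 'photograph' ]
console.log(expandTerm('photograph', tokens)) // [ 'photograph' ]
```

Under this reading, the shorter term picks up every longer token that shares its prefix, while the longer term matches only itself.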

So you get the following vectors:

var queryVector      = [1.6931471805599454, 1.052011492633005],
    photoVector      = [1.6931471805599454, 0],
    photographVector = [0, 1.6931471805599454]

The queryVector therefore does not exactly match photoVector, which is why the score is less than 1. When searching for photograph this doesn’t happen, because photograph doesn’t expand to anything, so you get the following vectors:

var queryVector      = [0, 1.6931471805599454],
    photoVector      = [1.6931471805599454, 0],
    photographVector = [0, 1.6931471805599454]

Hence you get a score of 1 for the photograph search.
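
To make the arithmetic concrete, here is a minimal sketch that assumes the score is the cosine similarity between the query vector and a document vector. The `cosineSimilarity` helper is written for illustration and is not copied from lunr's source; the vectors are the ones listed above.

```javascript
// Illustrative helper (an assumption, not lunr's code):
// cosine similarity of two equal-length numeric vectors.
function cosineSimilarity (a, b) {
  var dot = 0, magA = 0, magB = 0
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    magA += a[i] * a[i]
    magB += b[i] * b[i]
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB))
}

var photoVector      = [1.6931471805599454, 0],
    photographVector = [0, 1.6931471805599454]

// Query 'photo' after expansion to both terms
var expandedQuery = [1.6931471805599454, 1.052011492633005]
console.log(cosineSimilarity(expandedQuery, photoVector))      // roughly 0.85, below 1
console.log(cosineSimilarity(expandedQuery, photographVector)) // roughly 0.53

// Query 'photograph', which does not expand
var exactQuery = [0, 1.6931471805599454]
console.log(cosineSimilarity(exactQuery, photographVector))    // 1, up to floating-point rounding
console.log(cosineSimilarity(exactQuery, photoVector))         // 0
```

The nonzero second component in the expanded query is what pulls the 'photo' score below 1: the query points partly toward the photograph document, so it can no longer line up exactly with the photo document.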

Without the automatic wildcarding, photo would not be expanded, and you would get the result you expect.

I’m not sure of the best way to progress this issue. I have opened a separate issue, #37, to discuss adding a lower-level query interface that would not have automatic wildcarding. I still think that in general use having the automatic wildcard at the end of each query term is useful, but perhaps there could be a way to disable it? E.g. idx.search('photo', false) or idx.search('photo', { autoWildcard: false }).

1 reaction
DannyNemer commented, Oct 20, 2013

I am experiencing a very similar, yet new issue in v0.4 (the issue above persists). The new issue is demonstrated here: http://jsfiddle.net/7eGpQ

As shown, my index has the documents 'photo' and 'photograph'. When searching with the queries 'p', 'ph', 'pho', 'phot', 'photo', and 'photogr', I receive inconsistent and unexpected scores (which I describe in my comments in the Fiddle). Finally, when searching for 'photo', not only do I not receive a perfect score, I receive a score lower than all previous queries.

Thank you very much for your excellent work. Lunr is fantastic.
