
Search Score Decreases with More Accurate Query


The following steps reproduce an issue I am experiencing. First, create an index and add a document containing a two-word string:

// lunr 0.x API: configure the fields, then add documents to the returned index
var index = lunr(function () {
  this.ref('ref')
  this.field('text')
})
index.add({
  ref: 1,
  text: 'yes funny'
})

The first query consists of part of the first word followed by the complete second word:

console.log(index.search("ye funny"))
// => [ { ref: '1', score: 0.773957299203321 } ]

The second query also begins with part of the first word (which may or may not be the same prefix as in the previous query), followed by only part of the second word:

console.log(index.search("ye fun"))
// => [ { ref: '1', score: 1 } ]

Issue: Why does the second, less accurate query return a higher score than the first, more accurate query?

Note: For this particular example, the issue occurs only when the stemmer is disabled. See my comment below for better examples.

Thank you very much.

Issue Analytics: created 10 years ago; 1 reaction; 8 comments (4 by maintainers)

Top GitHub Comments

1 reaction

olivernn commented, Jun 17, 2013

Thanks again for your investigation into this issue.

I’ve taken a look at the photo example you posted. This again looks like an issue with the automatic wildcard that is currently applied whenever you do a search.

You can see this for yourself at lib/index.js:301, where the query term is expanded. In the example index, 'photo' expands to ['photo', 'photograph'].
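To make the expansion step concrete, here is a minimal sketch of prefix expansion over a plain array of indexed tokens. `expandToken` is a hypothetical helper, not part of lunr's API; lunr keeps its tokens in its own token store rather than scanning an array, so this only illustrates what "'photo' expands to ['photo', 'photograph']" means.

```javascript
// Hypothetical stand-in for query-term expansion (not lunr's code):
// every indexed token that starts with the query term counts as a match.
function expandToken(term, corpusTokens) {
  return corpusTokens.filter(function (token) {
    return token.startsWith(term)
  })
}

const corpusTokens = ['photo', 'photograph']

console.log(expandToken('photo', corpusTokens))      // [ 'photo', 'photograph' ]
console.log(expandToken('photograph', corpusTokens)) // [ 'photograph' ]
```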

So you get the following vectors:

// dimensions: [photo, photograph]
var queryVector      = [1.6931471805599454, 1.052011492633005],
    photoVector      = [1.6931471805599454, 0],
    photographVector = [0, 1.6931471805599454]

So the queryVector does not exactly match the photoVector, hence the score of less than 1. Searching for photograph doesn’t have this problem because photograph doesn’t expand to any other term, so you get the following vectors:

var queryVector      = [0, 1.6931471805599454],
    photoVector      = [1.6931471805599454, 0],
    photographVector = [0, 1.6931471805599454]

Hence you get a score of 1 for the photograph search.
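Both results can be checked by hand with the vectors quoted above, assuming the score is the cosine similarity between the query vector and the document vector (the comment gives the vectors but not the formula). The sketch below is that check, not lunr's own code:

```javascript
// Cosine similarity: dot product divided by the product of the magnitudes.
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    magA += a[i] * a[i]
    magB += b[i] * b[i]
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB))
}

// Dimensions: [photo, photograph]
const photoVector      = [1.6931471805599454, 0]
const photographVector = [0, 1.6931471805599454]

// Search for 'photo': the wildcard adds a weighted 'photograph' component,
// so the query no longer points exactly at the 'photo' document.
const photoQuery = [1.6931471805599454, 1.052011492633005]
console.log(cosineSimilarity(photoQuery, photoVector).toFixed(4)) // 0.8494

// Search for 'photograph': nothing to expand, so query and document align.
const photographQuery = [0, 1.6931471805599454]
console.log(cosineSimilarity(photographQuery, photographVector).toFixed(4)) // 1.0000
```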

Without the automatic wildcarding, photo would not be expanded, and you would get the result you expect.

I’m not sure of the best way to progress this issue. I have opened a separate issue, #37, to discuss adding a lower-level query interface that would not use automatic wildcarding. I still think that, in general use, having the automatic wildcard at the end of each query term is useful, but perhaps there could be a way to disable it, e.g. idx.search('photo', false) or idx.search('photo', { autoWildcard: false }).
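A toy model of the proposed `{ autoWildcard: false }` option might look like the following. Everything in it is hypothetical and is not lunr's implementation: `toySearch`, the `autoWildcard` flag, the idf formula `1 + ln(N / df)` and the fixed `EXPANSION_WEIGHT` penalty for expanded terms are assumptions chosen only to show that, with expansion switched off, an exact search for 'photo' scores 1.

```javascript
// Toy corpus mirroring the example: one single-token document per ref.
const docs = { 1: 'photo', 2: 'photograph' }
const corpusTokens = ['photo', 'photograph']

// Assumed idf of 1 + ln(N / df); with N = 2 and df = 1 this is about 1.693.
// Every token here has the same idf, so it does not change the scores.
const IDF = 1 + Math.log(2 / 1)

// Hypothetical penalty for tokens that match only through prefix expansion.
const EXPANSION_WEIGHT = 0.5

function cosine(a, b) {
  let dot = 0, magA = 0, magB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    magA += a[i] * a[i]
    magB += b[i] * b[i]
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB))
}

// Hypothetical search: autoWildcard defaults to true, like the current behavior.
function toySearch(term, options) {
  const autoWildcard = !(options && options.autoWildcard === false)

  // One query dimension per corpus token: exact match, expanded match, or nothing.
  const queryVector = corpusTokens.map(function (token) {
    if (token === term) return IDF
    if (autoWildcard && token.startsWith(term)) return IDF * EXPANSION_WEIGHT
    return 0
  })

  return Object.keys(docs)
    .map(function (ref) {
      const docVector = corpusTokens.map(function (token) {
        return token === docs[ref] ? IDF : 0
      })
      return { ref: ref, score: cosine(queryVector, docVector) }
    })
    // Drop non-matches; an all-zero query vector yields NaN, which also fails this test.
    .filter(function (result) { return result.score > 0 })
}

function show(results) {
  return results.map(function (r) { return r.ref + ': ' + r.score.toFixed(3) }).join(', ')
}

console.log(show(toySearch('photo')))                          // 1: 0.894, 2: 0.447
console.log(show(toySearch('photo', { autoWildcard: false }))) // 1: 1.000
```

With the flag off, only the exact token contributes to the query vector, so the query points in the same direction as the 'photo' document and scores 1; the trade-off is that 'photograph' no longer matches a search for 'photo' at all.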

1 reaction

DannyNemer commented, Oct 20, 2013

I am experiencing a very similar but new issue in v0.4 (the issue above persists). The new issue is demonstrated here: http://jsfiddle.net/7eGpQ

As shown, my index has the documents 'photo' and 'photograph'. When searching with the queries 'p', 'ph', 'pho', 'phot', 'photo', and 'photogr', I receive inconsistent and unexpected scores (which I describe in my comments in the Fiddle). Finally, when searching for 'photo', not only do I not receive a perfect score, but I also receive a score lower than for all the previous queries.

Thank you very much for your excellent work. Lunr is fantastic.