Issues with scoring
Hi! First of all, v4 seems to give slightly better search ranking than v3.
However, there is currently a crucial issue with the scoring of documents in our application for some search terms. I have tried to recreate this with a synthetic example. For that purpose I’ve collected 5 movies about sheep.
```javascript
const ms = new MiniSearch({
  fields: ['title', 'description'],
  storeFields: ['title']
})

ms.add({
  id: 1,
  title: 'Rams',
  description: 'A feud between two sheep farmers.'
})

ms.add({
  id: 2,
  title: 'Shaun the Sheep',
  description: 'Shaun is a cheeky and mischievous sheep at Mossy Bottom farm who\'s the leader of the flock and always plays slapstick jokes, pranks and causes trouble especially on Farmer X and his grumpy guide dog, Bitzer.'
})

ms.add({
  id: 3,
  title: 'Silence of the Lambs',
  description: 'F.B.I. trainee Clarice Starling (Jodie Foster) works hard to advance her career, while trying to hide or put behind her West Virginia roots, of which if some knew, would automatically classify her as being backward or white trash. After graduation, she aspires to work in the agency\'s Behavioral Science Unit under the leadership of Jack Crawford (Scott Glenn). While she is still a trainee, Crawford asks her to question Dr. Hannibal Lecter (Sir Anthony Hopkins), a psychiatrist imprisoned, thus far, for eight years in maximum security isolation for being a serial killer who cannibalized his victims. Clarice is able to figure out the assignment is to pick Lecter\'s brains to help them solve another serial murder case, that of someone coined by the media as "Buffalo Bill" (Ted Levine), who has so far killed five victims, all located in the eastern U.S., all young women, who are slightly overweight (especially around the hips), all who were drowned in natural bodies of water, and all who were stripped of large swaths of skin. She also figures that Crawford chose her, as a woman, to be able to trigger some emotional response from Lecter. After speaking to Lecter for the first time, she realizes that everything with him will be a psychological game, with her often having to read between the very cryptic lines he provides. She has to decide how much she will play along, as his request in return for talking to him is to expose herself emotionally to him. The case takes a more dire turn when a sixth victim is discovered, this one from who they are able to retrieve a key piece of evidence, if Lecter is being forthright as to its meaning. A potential seventh victim is high profile Catherine Martin (Brooke Smith), the daughter of Senator Ruth Martin (Diane Baker), which places greater scrutiny on the case as they search for a hopefully still alive Catherine. Who may factor into what happens is Dr. Frederick Chilton (Anthony Heald), the warden at the prison, an opportunist who sees the higher profile with Catherine, meaning a higher profile for himself if he can insert himself successfully into the proceedings.'
})

ms.add({
  id: 4,
  title: 'Lamb',
  description: 'Haunted by the indelible mark of loss and silent grief, sad-eyed María and her taciturn husband, Ingvar, seek solace in back-breaking work and the demanding schedule at their sheep farm in the remote, harsh, wind-swept landscapes of mountainous Iceland. Then, with their relationship hanging on by a thread, something unexplainable happens, and just like that, happiness blesses the couple\'s grim household once more. Now, as a painful ending gives birth to a new beginning, Ingvar\'s troubled brother, Pétur, arrives at the farmhouse, threatening María and Ingvar\'s delicate, newfound bliss. But, nature\'s gifts demand sacrifice. How far are ecstatic María and Ingvar willing to go in the name of love?'
})

ms.add({
  id: 5,
  title: 'Ringing Bell',
  description: 'A baby lamb named Chirin is living an idyllic life on a farm with many other sheep. Chirin is very adventurous and tends to get lost, so he wears a bell around his neck so that his mother can always find him. His mother warns Chirin that he must never venture beyond the fence surrounding the farm, because a huge black wolf lives in the mountains and loves to eat sheep. Chirin is too young and naive to take the advice to heart, until one night the wolf enters the barn and is prepared to kill Chirin, but at the last moment the lamb\'s mother throws herself in the way and is killed instead. The wolf leaves, and Chirin is horrified to see his mother\'s body. Unable to understand why his mother was killed, he becomes very angry and swears that he will go into the mountains and kill the wolf.'
})

ms.search('sheep', { boost: { title: 2 } })
```
The following are the results:
```javascript
[
  {
    id: 1,
    terms: [ 'sheep' ],
    score: 4.360862545683414,
    match: { sheep: [Array] },
    title: 'Rams'
  },
  {
    id: 2,
    terms: [ 'sheep' ],
    score: 3.163825722967836,
    match: { sheep: [Array] },
    title: 'Shaun the Sheep'
  },
  {
    id: 5,
    terms: [ 'sheep' ],
    score: 0.3964420496075831,
    match: { sheep: [Array] },
    title: 'Ringing Bell'
  },
  {
    id: 4,
    terms: [ 'sheep' ],
    score: 0.26090630615199917,
    match: { sheep: [Array] },
    title: 'Lamb'
  }
]
```
The issue is the following. I expect, without any doubt, that ‘Shaun the Sheep’ should be the top result. Why?

- Because it is the only movie with ‘sheep’ in both the `title` field and the `description` field.
- The subjective score of ‘sheep’ within a 3-word title is higher than ‘sheep’ in a 6-word description.
- The subjective score of ‘sheep’ in 1 title out of 5 movies is much better than in 4 descriptions out of 5 movies.
- I have even boosted the title by a factor of 2. In our actual application, I don’t really want to boost one field too much, because it can lead to other scoring problems.
So what goes wrong?
Fields with a high variance in length obscure fields with a low variance in length
The issue is that many other movies have very long descriptions, but ‘Rams’ only has a 6-word description. The relative scoring for field length is `fieldLength / averageFieldLength`. This heavily disadvantages the description of ‘Shaun the Sheep’, which is only of “average” length. This essentially means that if there is a high variance in a field’s length, the documents with a short field get a very large boost, regardless of matches in other fields!
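To make the effect concrete, here is a toy sketch of that normalisation (this is not MiniSearch’s actual code, and the description word counts are rough estimates for the five movies above):

```javascript
// Rough description word counts for the five movies above (estimates).
const lengths = [6, 38, 300, 120, 140]
const avg = lengths.reduce((a, b) => a + b, 0) / lengths.length

// If the per-field score is divided by fieldLength / averageFieldLength,
// the effective boost for a field of a given length is avg / length.
const lengthBoost = (length) => avg / length

console.log(lengthBoost(6))   // 'Rams': roughly a 20x boost for a 6-word description
console.log(lengthBoost(avg)) // an average-length field gets no boost at all
```

A factor of roughly 20 for the short description easily swamps a title boost of 2, which is why ‘Rams’ wins.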
A match in two distinct fields in the same document has no bonus
I would expect that ‘Shaun the Sheep’ is a great match for the query ‘sheep’ because it is the only document that has a match in both fields. I think it would be good to give a boost in those cases, similarly to how a document that matches two words in an OR query receives a boost.
So what are the options?
I think we could take a cue from Lucene, which uses `1 / sqrt(numFieldTerms)` as the length normalisation factor.
https://www.compose.com/articles/how-scoring-works-in-elasticsearch/
https://theaidigest.in/how-does-elasticsearch-scoring-work/
Just as a quick test, if I take `1 / sqrt(fieldLength)`, I get the following results:
```javascript
[
  {
    id: 2,
    terms: [ 'sheep' ],
    score: 1.8946174879859907,
    match: { sheep: [Array] },
    title: 'Shaun the Sheep'
  },
  {
    id: 1,
    terms: [ 'sheep' ],
    score: 0.08434033477788275,
    match: { sheep: [Array] },
    title: 'Rams'
  },
  {
    id: 5,
    terms: [ 'sheep' ],
    score: 0.03596283958463321,
    match: { sheep: [Array] },
    title: 'Ringing Bell'
  },
  {
    id: 4,
    terms: [ 'sheep' ],
    score: 0.020629628616731104,
    match: { sheep: [Array] },
    title: 'Lamb'
  }
]
```
I get the same results even if I drop the title boosting factor. That’s actually exactly what I personally expect: the shorter fields should count more if they match unless I disadvantage them explicitly.
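For comparison, here is a quick sketch of how steeply each normalisation favours short fields (assumed functional shapes only, not the library’s internals):

```javascript
// Current MiniSearch-style normalisation: score ∝ 1 / (fieldLength / averageFieldLength).
const linearBoost = (len, avg) => avg / len
// Lucene-style normalisation: score ∝ 1 / sqrt(fieldLength).
const sqrtBoost = (len) => 1 / Math.sqrt(len)

// Advantage of a 6-word field over a 38-word field under each scheme:
console.log(linearBoost(6, 120) / linearBoost(38, 120)) // ≈ 6.3
console.log(sqrtBoost(6) / sqrtBoost(38))               // ≈ 2.5
```

The square root still prefers shorter fields, but gently enough that matches in the other fields can still tip the balance.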
Problem solved?! Well, not really. What if I search for a highly specific sheep?
```javascript
ms.search('chirin the sheep')
```
```javascript
[
  {
    id: 2,
    terms: [ 'the', 'sheep' ],
    score: 4.537584326120562,
    match: { the: [Array], sheep: [Array] },
    title: 'Shaun the Sheep'
  },
  {
    id: 5,
    terms: [ 'chirin', 'the', 'sheep' ],
    score: 2.2902873329363285,
    match: { chirin: [Array], the: [Array], sheep: [Array] },
    title: 'Ringing Bell'
  },
  {
    id: 3,
    terms: [ 'the' ],
    score: 1.09077315757252,
    match: { the: [Array] },
    title: 'Silence of the Lambs'
  },
  {
    id: 4,
    terms: [ 'the', 'sheep' ],
    score: 0.2166111004756766,
    match: { the: [Array], sheep: [Array] },
    title: 'Lamb'
  },
  {
    id: 1,
    terms: [ 'sheep' ],
    score: 0.08434033477788275,
    match: { sheep: [Array] },
    title: 'Rams'
  }
]
```
I definitely wasn’t looking for Shaun! ‘Ringing Bell’ should be the top result here, because it is the only match for ‘chirin’. So what can we do? Taking cues from Lucene again: it scores query terms with a *coordination* mechanism, which effectively means that the more query terms a document matches, the better its score. It uses `matching terms / total terms` as a weight factor for each document. This could also replace the 1.5 boost for OR queries. Hacking that into MiniSearch, I get this:
```javascript
[
  {
    id: 2,
    terms: [ 'the', 'sheep' ],
    score: 1.0445507364815925,
    match: { the: [Array], sheep: [Array] },
    title: 'Shaun the Sheep'
  },
  {
    id: 5,
    terms: [ 'chirin', 'the', 'sheep' ],
    score: 1.0298930944999127,
    match: { chirin: [Array], the: [Array], sheep: [Array] },
    title: 'Ringing Bell'
  },
  {
    id: 3,
    terms: [ 'the' ],
    score: 0.21087593054514742,
    match: { the: [Array] },
    title: 'Silence of the Lambs'
  },
  {
    id: 4,
    terms: [ 'the', 'sheep' ],
    score: 0.09627160021141183,
    match: { the: [Array], sheep: [Array] },
    title: 'Lamb'
  },
  {
    id: 1,
    terms: [ 'sheep' ],
    score: 0.028113444925960917,
    match: { sheep: [Array] },
    title: 'Rams'
  }
]
```
Almost there (1.04 vs 1.03), but not quite yet…
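For reference, the coordination weight used above could look like this (a hypothetical sketch of Lucene’s `coord(q, d)` idea, not MiniSearch code):

```javascript
// Coordination factor: the fraction of query terms matched by a document.
const coord = (matchingTerms, totalQueryTerms) => matchingTerms / totalQueryTerms

// For the query 'chirin the sheep' (3 terms):
console.log(coord(3, 3)) // 'Ringing Bell' matches all three terms → 1
console.log(coord(2, 3)) // 'Shaun the Sheep' matches only two terms → 0.666…
```

This narrows the gap between the two movies, but by itself it cannot overcome the very strong per-field scores of ‘Shaun the Sheep’.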
Lucene also uses the inverse document frequency of each term in the query as a factor for determining how unique a term is. I have not tested this (it touches more code in MiniSearch), but my guess is this would raise the score of ‘Ringing Bell’ to the top position because of the uniqueness of the term ‘chirin’.
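One common IDF formulation is the BM25-style variant used by recent Lucene versions; this is shown here purely as an assumption about what such a factor could look like, not as what MiniSearch would adopt:

```javascript
// Inverse document frequency: rare terms get a high weight, common terms a low one.
const idf = (docCount, docsWithTerm) =>
  Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5))

// In the 5-movie corpus: 'chirin' appears in 1 document, 'the' in all 5.
console.log(idf(5, 1)) // ≈ 1.39, a strong weight for the rare term
console.log(idf(5, 5)) // ≈ 0.09, nearly zero for the ubiquitous term
```

With a weight like this, the match for ‘chirin’ would count far more than the matches for ‘the’, which is exactly what the query ‘chirin the sheep’ needs.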
So, my question to you is this: would you be open to revising the scoring mechanism to be closer to what Lucene uses? I believe it could solve some practical issues with the current document scoring.
If you do, maybe we should collect some test sets which are realistic enough, but also small enough to be able to judge the scoring from the outside.
Looking forward to any thoughts you may have on this!
Issue Analytics
- Created 2 years ago
- Comments: 15 (15 by maintainers)
Top GitHub Comments
I released v5.0.0-beta1.

@rolftimmermans I created #142 to collect feedback on the v5.0.0 beta. So far, the applications that my teams maintain show very noticeable improvements too, and no issue was reported. I did adjust boosting in one case to be a bit less aggressive. I will close this issue in favor of the newly created one.