Configure maxEditDistance and maxSequentialEdits for fuzzy searches

The current values for the maxEditDistance (4) and maxSequentialEdits (1) arguments of the FuzzyMatchQueryPart constructor, as used in QueryParser.CreateWordQueryPart(), yield too many results for my taste that are nothing like the search term, especially for short search terms.

E.g. if I search an English text for ?term with maxEditDistance = 4 I get matches like very, here, were or seems which occur quite often - creating a lot of noise in the search result.
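To make the noise concrete, here is a minimal sketch that computes the plain Levenshtein distance between term and each of the words mentioned above. It is an assumption that this textbook measure approximates how the library counts edits; the library's own accounting (and its sequential-edit limit) may differ, and EditDistanceDemo is a hypothetical helper, not part of the library.

```csharp
using System;

public static class EditDistanceDemo
{
    // Textbook Levenshtein distance: insert, delete and substitute each cost 1.
    // Assumption: used only as an illustration, not the library's own algorithm.
    public static int Levenshtein(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (var j = 0; j <= b.Length; j++) previous[j] = j;

        for (var i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1),
                    previous[j - 1] + cost);
            }

            // The row just computed becomes the "previous" row for the next character.
            (previous, current) = (current, previous);
        }

        return previous[b.Length];
    }

    public static void Main()
    {
        foreach (var word in new[] { "very", "here", "were", "seems" })
        {
            Console.WriteLine($"{word}: {Levenshtein("term", word)}");
        }
    }
}
```

Under this plain measure, very, here and were are each 2 edits away from term and seems is 3 away, so all of them fall within a limit of 4 even though they share only two letters with the search term.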

It would be nice to have a way of configuring those values:

  1. on a FullTextIndex level via the FullTextIndexBuilder using something like .WithQueryParser(o => o.FuzzyMaxEditDistance(2).FuzzyMaxSequentialEdits(0))

  2. on a Query level
     2.1 by either intercepting the query parsing using some hook, e.g. .WithQueryParser(o => o.FuzzyMaxEditDistance(someContext => 2).FuzzyMaxSequentialEdits(someContext => 0))
     2.2 and/or by supplying the values with the query (similar to the nearness syntax), e.g.
         2.2.1 ?2,0term (both integers comma-separated after the ? with the first int being maxEditDistance and the second one maxSequentialEdits) or
         2.2.2 2?0term or 2?term or ?0term (maxEditDistance before the ? and maxSequentialEdits after)

  3. dynamically based on the length of the search term to account for shorter words, e.g.

.WithQueryParser(o => o
    // a maximum of a fourth of the letters may differ
    .FuzzyMaxEditDistance(termLength => termLength / 4)
    // with a maximum of a tenth of the letters edited in sequence
    .FuzzyMaxSequentialEdits(termLength => termLength / 10))
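As a rough illustration of what those two length-based rules would produce, the sketch below evaluates them for a few sample words. The delegates are hypothetical stand-ins for the lambdas in the suggestion above; FuzzyMaxEditDistance and FuzzyMaxSequentialEdits are proposed names from this issue, so nothing here calls the library itself.

```csharp
using System;

public static class LengthBasedLimitsDemo
{
    // Hypothetical stand-ins for the proposed lambdas.
    // C# integer division truncates, so 3 / 4 is 0 and 13 / 10 is 1.
    public static readonly Func<int, int> MaxEditDistance = termLength => termLength / 4;
    public static readonly Func<int, int> MaxSequentialEdits = termLength => termLength / 10;

    public static void Main()
    {
        foreach (var term in new[] { "and", "term", "sequence", "configuration" })
        {
            Console.WriteLine(
                $"{term} (length {term.Length}): " +
                $"maxEditDistance={MaxEditDistance(term.Length)}, " +
                $"maxSequentialEdits={MaxSequentialEdits(term.Length)}");
        }
    }
}
```

One consequence of the truncating division is worth noting: a term shorter than 4 letters gets a maximum edit distance of 0, which would reduce a fuzzy match to an exact one, and any term shorter than 10 letters gets 0 sequential edits.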

Do you have any plans for implementing something like this - one way or the other? Please consider my syntax suggestions above as just that; I’m not opinionated on them. I only wanted to get across how I’d like to use the API to get fewer results while expanding on the existing API where I would expect it.

Thank you for your work and compliments on the API design and documentation! This library is exceptionally easy to use for its complexity.

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

h0lg commented, Oct 8, 2022

In version 3.4

  • the query syntax now works as I would have expected it without reading the manual and
  • the default calculation for max sequential edits works better for short fuzzy search terms.

Also, the doco links work again 😃

Thanks a bunch ❤️

mikegoatly commented, Oct 8, 2022

Thanks for the feedback! This code commit and docs commit should address all of these points.

  • The query syntax will support the comma being omitted, i.e. ?2?term will be allowed.
  • The default calculation for max sequential edits will change from termLength / 4 to termLength < 4 ? 1 : termLength / 4 - yes, this was an unintended consequence of the change! 😃
  • All the docs have been updated and links fixed
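
To see what the change to the default calculation means for short terms, this small sketch evaluates both formulas quoted above for term lengths 1 to 8. The class and method names are hypothetical and exist only for this comparison; the two expressions are copied from the comment.

```csharp
using System;

public static class DefaultSequentialEditsDemo
{
    // Formula quoted above as the previous default.
    public static int Before(int termLength) => termLength / 4;

    // Formula quoted above as the replacement default.
    public static int After(int termLength) => termLength < 4 ? 1 : termLength / 4;

    public static void Main()
    {
        for (var termLength = 1; termLength <= 8; termLength++)
        {
            Console.WriteLine(
                $"length {termLength}: before={Before(termLength)}, after={After(termLength)}");
        }
    }
}
```

The two formulas only differ for terms of 1 to 3 characters, where integer division previously gave 0 sequential edits and the new expression gives 1; from length 4 upward they return the same value.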