Option to only return results that match all tokens

For the purposes of this example you can assume I have set the fuzziness threshold to 0.

Given I have a list of items like this:

var items = [
  'large red shirt',
  'large green shirt',
  'large blue shirt',
  'medium red shirt',
  'medium green shirt',
  'medium blue shirt',
  'small red shirt',
  'small green shirt',
  'small blue shirt',
  'large red trousers',
  'large green trousers',
  'large blue trousers',
  'medium red trousers',
  'medium green trousers',
  'medium blue trousers',
  'small red trousers',
  'small green trousers',
  'small blue trousers',
  'large red socks',
  'large green socks',
  'large blue socks',
  'medium red socks',
  'medium green socks',
  'medium blue socks',
  'small red socks',
  'small green socks',
  'small blue socks'
];

I would like a search for “large shirt” to return the 3 results that match both words in the input:

[
  'large red shirt',
  'large green shirt',
  'large blue shirt'
]

A default search returns 0 results.

Enabling the tokenize option to search for individual words does return these 3 results, but it also returns the other 12 items that match only “large” or only “shirt”. I am only interested in items that match both search tokens.

Could a matchAllTokens option be added to Fuse to achieve this?
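
To make the requested behaviour concrete, here is a minimal sketch in plain JavaScript. It does not use Fuse and is not how Fuse implements anything: `filterMatchingAllTokens` is a hypothetical helper that does exact, case-insensitive substring matching (the equivalent of a fuzziness threshold of 0) and keeps only the items that contain every token of the query.

```javascript
// Hypothetical helper, not part of Fuse.js.
// Keeps only the items that contain every whitespace-separated token
// of the query (exact, case-insensitive substring match, no fuzziness).
function filterMatchingAllTokens(items, query) {
  var tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  return items.filter(function (item) {
    var text = item.toLowerCase();
    return tokens.every(function (token) {
      return text.indexOf(token) !== -1;
    });
  });
}

// A small subset of the list above, for illustration only.
var sample = [
  'large red shirt',
  'medium red shirt',
  'large red socks'
];

var result = filterMatchingAllTokens(sample, 'large shirt');
console.log(result); // [ 'large red shirt' ]
```

With the tokenize behaviour described above, all three sample items would come back, because each one contains at least one of the two tokens.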

https://jsbin.com/hifegi/17/edit?js,console

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 3
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

6 reactions
keeganstreet commented, Aug 3, 2016

Hey @krisk,

Thanks for taking the time to reply (and for building the library of course!)

I do appreciate that Fuse is all about string approximation, and I am using it for this feature. In my actual implementation I have not set the threshold to 0. I have just used that as an example in this ticket because I thought taking fuzziness out of the equation would make this issue clearer.

Maybe a more real-world example would better explain what I mean.

Say I have this list of companies, and I want to filter the list down based on a user entered search term:

AustralianSuper - Corporate Division
Aon Master Trust - Corporate Super
Promina Corporate Superannuation Fund
Workforce Superannuation Corporate
IGT (Australia) Pty Ltd Superannuation Fund

A search input of “Australia” will return 2 results.

And a search input of “corporate” will return 4 results.

A search input of “Australia corporate”, which in the user’s mind is a more specific search term, will return all 5 results. It seems counter-intuitive for a more specific search term (“Australia corporate”) to return more results than a less specific search term (“Australia”).

I understand that this is useful behaviour in some use-cases, because we may want Fuse to uncover more results even if some of the tokens don’t match. But in this use case, we want to reduce the number of results as more tokens are provided as input.

Here’s a demo: https://jsbin.com/zoposik/4/edit?js,console
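
The counts above can be reproduced without Fuse. The following is only a sketch under the assumption of exact, case-insensitive substring matching; `filterByTokens` and its `requireAll` flag are hypothetical names, not Fuse options. It contrasts "match any token" with "match all tokens" on the same company list.

```javascript
var companies = [
  'AustralianSuper - Corporate Division',
  'Aon Master Trust - Corporate Super',
  'Promina Corporate Superannuation Fund',
  'Workforce Superannuation Corporate',
  'IGT (Australia) Pty Ltd Superannuation Fund'
];

// Hypothetical helper, not part of Fuse.js.
// requireAll = false: keep names containing at least one token (OR).
// requireAll = true:  keep names containing every token (AND).
function filterByTokens(list, query, requireAll) {
  var tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  return list.filter(function (name) {
    var text = name.toLowerCase();
    var contains = function (token) {
      return text.indexOf(token) !== -1;
    };
    return requireAll ? tokens.every(contains) : tokens.some(contains);
  });
}

console.log(filterByTokens(companies, 'Australia', false).length);           // 2
console.log(filterByTokens(companies, 'corporate', false).length);           // 4
console.log(filterByTokens(companies, 'Australia corporate', false).length); // 5
console.log(filterByTokens(companies, 'Australia corporate', true).length);  // 1
```

With AND-matching, adding a token can only keep or shrink the result set, which is the behaviour a user narrowing down a list would expect.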

Also, it’s not really important, but you may have been mistaken about the threshold parameter not having an effect when tokenize is true. On https://jsbin.com/pehixamoba/edit?js,console, this returns 15 results:

var fuse = new Fuse(items, {
  threshold: 0,
  tokenize: true
});
var result = fuse.search('large shirt');

And this returns 27:

var fuse = new Fuse(items, {
  threshold: 1,
  tokenize: true
});
var result = fuse.search('large shirt');

3 reactions
krisk commented, Aug 5, 2016

@keeganstreet, your example illustrates the problem quite nicely. All the other matched results might be superfluous.

Very well, I’m sold. I’ll add some logic + option to Fuse.js which would address the issue. I will post updates on this thread.

And about this:

“Also it’s not really important, but you may have been mistaken about the threshold parameter not having an effect when tokenize is true.”

You’re absolutely right. Mea culpa. I had made this change a while ago, and had forgotten about it 😅
