
Match all tokens should look in different keys as well.


From what I understand, using tokenize with matchAllTokens currently forces all tokens to match on the same key. Would it be possible to require all tokens to match across any of the keys of a single record?

e.g.

     {
        title: "Old Man's War",
        author: {
          firstName: "John",
          lastName: "Scalzi"
        }
     }

Searching for "Old War" will match this record, but "Old War John" will not.

This would be useful in searches where you want two or more attributes of a record to match, e.g.:

{
    jobtitle: 'Lawyer',
    country: 'France'
},
{
    jobtitle: 'Developer',
    country: 'France'
}

There should be a way for a search for "lawyer france" to return only the first record.
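A minimal sketch of the rule being requested, assuming a plain case-insensitive substring check in place of Fuse's fuzzy matching, so it illustrates which records qualify and says nothing about scoring. The helpers collectStrings, splitQuery, matchesPerKey and matchesAcrossKeys are hypothetical and not part of Fuse.js; matchesPerKey only approximates the current behaviour described above (every token must hit the same key), while matchesAcrossKeys lets each token hit any key, including nested ones like author.firstName.

```typescript
// Hypothetical helpers that illustrate the matching rule only. This is NOT
// Fuse.js: a case-insensitive substring check stands in for fuzzy matching.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Collect every string value in a record, descending into nested objects and
// arrays, so that author.firstName and author.lastName are searched too.
function collectStrings(value: Json, out: string[] = []): string[] {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) collectStrings(value[key], out);
  }
  return out;
}

// Split a query into lower-cased, whitespace-separated tokens.
const splitQuery = (query: string): string[] =>
  query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);

// Approximates the behaviour described in the issue: a record matches only
// if one single field contains every token.
function matchesPerKey(record: Json, query: string): boolean {
  const tokens = splitQuery(query);
  return collectStrings(record).some((field) =>
    tokens.every((t) => field.toLowerCase().includes(t))
  );
}

// The requested behaviour: every token must match at least one field, but
// different tokens may match different fields.
function matchesAcrossKeys(record: Json, query: string): boolean {
  const fields = collectStrings(record).map((f) => f.toLowerCase());
  return splitQuery(query).every((t) => fields.some((f) => f.includes(t)));
}

const book: Json = {
  title: "Old Man's War",
  author: { firstName: "John", lastName: "Scalzi" },
};

const jobs: { jobtitle: string; country: string }[] = [
  { jobtitle: "Lawyer", country: "France" },
  { jobtitle: "Developer", country: "France" },
];

console.log(matchesPerKey(book, "Old War")); // true
console.log(matchesPerKey(book, "Old War John")); // false
console.log(matchesAcrossKeys(book, "Old War John")); // true
console.log(
  jobs.filter((j) => matchesAcrossKeys(j, "lawyer france")).map((j) => j.jobtitle)
); // [ 'Lawyer' ]
```

A real implementation would also have to decide how to rank records whose tokens land in different keys, which this sketch ignores.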

Issue Analytics

State:
Created 5 years ago
Reactions: 20
Comments: 14 (1 by maintainers)

Top GitHub Comments

4 reactions

titenis commented, Feb 22, 2022

Is this lib not maintained anymore? I can't believe that 4 years have passed and this is still not implemented; this issue isn't even open, lol. I think the proposed solutions, where you have to assemble a keyword field yourself, really defeat the purpose of using this lib at all.
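For context, the "assemble a keyword field yourself" workaround amounts to something like the sketch below: concatenate the searchable values into one extra field and search only that field, so that a same-key, all-tokens rule sees every token in a single key. The withKeywords and searchKeywords helpers and the keywords field name are hypothetical; with Fuse.js you would list such a field in keys instead of filtering by hand. A substring check again stands in for fuzzy matching.

```typescript
// Hypothetical sketch of the "build your own keyword field" workaround.
// A substring check stands in for Fuse's fuzzy matching.
interface Job {
  jobtitle: string;
  country: string;
}

type Indexed<T> = T & { keywords: string };

// Join the chosen fields into one lower-cased string per record.
function withKeywords<T>(records: T[], fields: (keyof T)[]): Indexed<T>[] {
  return records.map((record) => ({
    ...record,
    keywords: fields.map((f) => String(record[f])).join(" ").toLowerCase(),
  }));
}

// Every token must appear somewhere in the single combined field.
function searchKeywords<T>(records: Indexed<T>[], query: string): Indexed<T>[] {
  const tokens = query.toLowerCase().split(/\s+/).filter((t) => t !== "");
  return records.filter((r) => tokens.every((t) => r.keywords.includes(t)));
}

const jobs = withKeywords<Job>(
  [
    { jobtitle: "Lawyer", country: "France" },
    { jobtitle: "Developer", country: "France" },
  ],
  ["jobtitle", "country"]
);

console.log(searchKeywords(jobs, "lawyer france").map((j) => j.jobtitle)); // [ 'Lawyer' ]
```

The cost the comment complains about is visible here: every record has to carry a derived field that must be rebuilt whenever the underlying data changes.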

3 reactions

0xdevalias commented, May 28, 2021

For anyone who ends up here from a Google search or otherwise: we ended up taking a different approach in https://github.com/sparkletown/sparkle/pull/1460 (thanks to @yarikoptic’s awesome work debugging, exploring, and refining this).

We basically split our search query using a regex (tokeniseStringWithQuotesBySpaces) to tokenise each individual word, while keeping any text wrapped in double quotes together as a single token:

https://github.com/sparkletown/sparkle/blob/c0e7e40fe7a18db916eae9c48fc4e966f099642e/src/utils/text.ts#L1-L12

/**
 * Split the provided string by spaces (ignoring spaces within "quoted text") into an array of tokens.
 *
 * @param string
 *
 * @see https://stackoverflow.com/a/16261693/1265472
 *
 * @debt Depending on the outcome of https://github.com/github/codeql/issues/5964 we may end up needing to change
 *   this regex for performance reasons.
 */
export const tokeniseStringWithQuotesBySpaces = (string: string): string[] =>
  string.match(/("[^"]*?"|[^"\s]+)+(?=\s*|\s*$)/g) ?? [];

(Note: please check https://github.com/github/codeql/issues/5964, as the regex may have a ReDoS vulnerability, though it might also just be a false positive from the CodeQL scanner.)
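A quick way to see what tokeniseStringWithQuotesBySpaces produces. The function is copied from the snippet above so the block runs on its own; the sample queries are made up. Note that a quoted phrase keeps its surrounding double quotes in the resulting token, and a quote with no closing partner is simply skipped.

```typescript
// Copied from the sparkle snippet above so this block is standalone.
const tokeniseStringWithQuotesBySpaces = (string: string): string[] =>
  string.match(/("[^"]*?"|[^"\s]+)+(?=\s*|\s*$)/g) ?? [];

// Quoted phrases stay together, quotes included.
console.log(tokeniseStringWithQuotesBySpaces('lawyer "new york" france'));
// [ 'lawyer', '"new york"', 'france' ]

// An unmatched quote is dropped rather than starting a token.
console.log(tokeniseStringWithQuotesBySpaces('foo "bar'));
// [ 'foo', 'bar' ]

// No matches: String.prototype.match returns null, so ?? [] yields an empty array.
console.log(tokeniseStringWithQuotesBySpaces(""));
// []
```

Because the quotes survive tokenisation, each quoted token is passed to Fuse with its quote characters still attached.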

With our standard Fuse config:

https://github.com/sparkletown/sparkle/blob/c0e7e40fe7a18db916eae9c48fc4e966f099642e/src/hooks/posters.ts#L72-L87

      new Fuse(filteredPosterVenues, {
        keys: [
          "name",
          "poster.title",
          "poster.authorName",
          "poster.categories",
        ],
        threshold: 0.2, // 0.1 seems to be (nearly) exact; the default 0.6 brings in hits that are too distant, if only loosely related
        ignoreLocation: true, // default false; true searches regardless of where in the string the words occur
        findAllMatches: true,
      }),

But then, for the search itself, we use our tokeniseStringWithQuotesBySpaces tokeniser plus a customised Fuse query (using $and to join each of our tokens, then $or across the different fields):

https://github.com/sparkletown/sparkle/blob/c0e7e40fe7a18db916eae9c48fc4e966f099642e/src/hooks/posters.ts#L90-L115

const tokenisedSearchQuery = tokeniseStringWithQuotesBySpaces(
  normalizedSearchQuery
);

if (tokenisedSearchQuery.length === 0) return filteredPosterVenues;

return fuseVenues
  .search({
    $and: tokenisedSearchQuery.map((searchToken: string) => {
      const orFields: Fuse.Expression[] = [
        { name: searchToken },
        { "poster.title": searchToken },
        { "poster.authorName": searchToken },
        { "poster.categories": searchToken },
      ];

      return {
        $or: orFields,
      };
    }),
  })
  .map((fuseResult) => fuseResult.item);
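The hard-coded orFields list above repeats the keys passed to the Fuse constructor. A hedged sketch of generating the same $and-of-$or query from one shared keys array, so the two cannot drift apart: buildAllTokensAnyKeyQuery and the local Expression type are hypothetical stand-ins (the code above uses Fuse.Expression from fuse.js), and the block only builds the query object without calling Fuse.

```typescript
// Hypothetical local stand-in for Fuse.Expression, just enough for this sketch.
type Expression =
  | { [key: string]: string }
  | { $and: Expression[] }
  | { $or: Expression[] };

// Every token must match ($and) at least one of the keys ($or).
function buildAllTokensAnyKeyQuery(tokens: string[], keys: string[]): Expression {
  return {
    $and: tokens.map((token) => ({
      $or: keys.map((key) => ({ [key]: token })),
    })),
  };
}

// The same keys array could be passed to the Fuse constructor as well.
const keys = ["name", "poster.title", "poster.authorName", "poster.categories"];
const query = buildAllTokensAnyKeyQuery(["lawyer", "france"], keys);
console.log(JSON.stringify(query, null, 2));
```

The original code returns early when there are no tokens; that guard is still needed here, since an empty token list would produce an empty $and.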

From my testing today, this seems to work pretty effectively for our needs.
