Match all tokens should look in different keys as well.


From what I understand, using tokenize with match all tokens currently forces all tokens to match on the same key. Would it be possible to require all tokens to match across any keys of a single record?

e.g.

     {
        title: "Old Man's War",
        author: {
          firstName: "John",
          lastName: "Scalzi"
        }
     }

Searching for Old War will match this record, but Old War John will not.

This would be useful in searches where you want two or more attributes of a record to match, e.g.

{
    jobtitle: 'Lawyer',
    country: 'France'
},
{
    jobtitle: 'Developer',
    country: 'France'
}

There should be a way for a search for “lawyer france” to return only the first record.
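The requested behaviour can be sketched without Fuse at all. The snippet below is only an illustration of the semantics being asked for: every token must match at least one key, but different tokens may match different keys. The `JobRecord` type and the `matchesAllTokensAcrossKeys` helper are hypothetical names, and plain case-insensitive substring matching stands in for fuzzy matching.

```typescript
// Hypothetical record shape and helper, for illustration only.
type JobRecord = { jobtitle: string; country: string };

const records: JobRecord[] = [
  { jobtitle: "Lawyer", country: "France" },
  { jobtitle: "Developer", country: "France" },
];

// True when every token is found in at least one of the listed keys.
// Substring matching is a stand-in for fuzzy matching here.
const matchesAllTokensAcrossKeys = (
  record: JobRecord,
  keys: (keyof JobRecord)[],
  tokens: string[]
): boolean =>
  tokens.every((token) =>
    keys.some((key) => record[key].toLowerCase().includes(token.toLowerCase()))
  );

const searchTokens = "lawyer france".split(/\s+/).filter(Boolean);

const hits = records.filter((record) =>
  matchesAllTokensAcrossKeys(record, ["jobtitle", "country"], searchTokens)
);
// hits contains only the Lawyer record: "lawyer" matches jobtitle,
// "france" matches country, and the Developer record fails on "lawyer".
```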

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 20
  • Comments: 14 (1 by maintainers)

Top GitHub Comments

4 reactions
titenis commented, Feb 22, 2022

Is this lib not maintained anymore? I can't believe that 4 years have passed and this is still not implemented; this issue is not even open lol. I think the proposed solutions, where you have to assemble the keyword field yourself, really defeat the purpose of consuming this lib at all.

3 reactions
0xdevalias commented, May 28, 2021

For anyone who ends up here from a Google search or otherwise, we ended up taking a different approach in https://github.com/sparkletown/sparkle/pull/1460 (thanks to @yarikoptic’s awesome work debugging, exploring, and refining this).

We basically split our search query using a regex (tokeniseStringWithQuotesBySpaces) that tokenises each individual word but keeps words between " and " as a single token:

/**
 * Split the provided string by spaces (ignoring spaces within "quoted text") into an array of tokens.
 *
 * @param string
 *
 * @see https://stackoverflow.com/a/16261693/1265472
 *
 * @debt Depending on the outcome of https://github.com/github/codeql/issues/5964 we may end up needing to change
 *   this regex for performance reasons.
 */
export const tokeniseStringWithQuotesBySpaces = (string: string): string[] =>
  string.match(/("[^"]*?"|[^"\s]+)+(?=\s*|\s*$)/g) ?? [];

(Note: Please check https://github.com/github/codeql/issues/5964 as the regex may have a ReDoS vulnerability, but it also might just be a false positive in the CodeQL scanner)
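To make the tokeniser's behaviour concrete, here is a small sketch of what it returns for a few inputs. The function body repeats the regex from the comment above so the snippet runs on its own; the sample strings are made up for illustration.

```typescript
// Same regex as in the comment above, repeated so this snippet is self-contained.
const tokeniseStringWithQuotesBySpaces = (input: string): string[] =>
  input.match(/("[^"]*?"|[^"\s]+)+(?=\s*|\s*$)/g) ?? [];

// Unquoted words become one token each.
const plain = tokeniseStringWithQuotesBySpaces("old war john");
// → ["old", "war", "john"]

// A quoted phrase stays together as a single token.
const quoted = tokeniseStringWithQuotesBySpaces('old "john scalzi"');
// → ["old", "\"john scalzi\""]

// No words at all: match() returns null, so the fallback [] is used.
const empty = tokeniseStringWithQuotesBySpaces("   ");
// → []
```

Note that the quote characters remain part of the quoted token. Whether to strip them before searching is a choice left to the caller; the comment above does not say either way.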

With our standard Fuse config:

      new Fuse(filteredPosterVenues, {
        keys: [
          "name",
          "poster.title",
          "poster.authorName",
          "poster.categories",
        ],
        threshold: 0.2, // 0.1 seems to be near-exact; the default 0.6 returns hits that are too loosely related
        ignoreLocation: true, // default is false; true ignores where in the string the words occur
        findAllMatches: true,
      }),

We then use our tokeniseStringWithQuotesBySpaces tokeniser plus a customised Fuse query (using $and to join the tokens, and $or across the different fields) for the search:

const tokenisedSearchQuery = tokeniseStringWithQuotesBySpaces(
  normalizedSearchQuery
);

if (tokenisedSearchQuery.length === 0) return filteredPosterVenues;

return fuseVenues
  .search({
    $and: tokenisedSearchQuery.map((searchToken: string) => {
      const orFields: Fuse.Expression[] = [
        { name: searchToken },
        { "poster.title": searchToken },
        { "poster.authorName": searchToken },
        { "poster.categories": searchToken },
      ];

      return {
        $or: orFields,
      };
    }),
  })
  .map((fuseResult) => fuseResult.item);

This seems to work pretty effectively for our needs from my testing of it all today.
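The same $and/$or shape can be generated for any list of keys rather than written out by hand. The following is a minimal sketch, not part of the original comment: `buildAllTokensQuery` is a hypothetical helper, and the local types are stand-ins for `Fuse.Expression` so the snippet runs without importing fuse.js. It is applied here to the book example from the issue.

```typescript
// Local stand-ins for the query shape used above; real code would use
// the Fuse.Expression type from fuse.js instead.
type FieldClause = Record<string, string>;
type OrClause = { $or: FieldClause[] };
type AllTokensQuery = { $and: OrClause[] };

// Hypothetical helper: each token may match any one of the keys ($or),
// and every token has to match somewhere ($and).
const buildAllTokensQuery = (
  tokens: string[],
  keys: string[]
): AllTokensQuery => ({
  $and: tokens.map((token) => ({
    $or: keys.map((key) => ({ [key]: token })),
  })),
});

const query = buildAllTokensQuery(
  ["Old", "War", "John"],
  ["title", "author.firstName", "author.lastName"]
);
// query.$and has one entry per token; each entry lists all three keys.
// It would then be passed to search() as in the snippet above:
// fuse.search(query)
```

With no tokens the helper returns an empty $and list, so an early return for an empty query, like the one in the snippet above, is still worth keeping.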
