Using invertedIndex for autocomplete

Not so much an issue with lunr, which is great! More a quick try to get ideas going…

In a shell, I use the jq utility to pull my terms from the lunr index in advance:

   jq '[.index.invertedIndex[][0]|scan("^\\w{3,}")]|unique' index.json > iindex.json

I can then feed that file to the jQuery UI autocomplete widget (http://api.jqueryui.com/autocomplete/) like below:

   function normalize(str) {
      var map = { "ä": "a", "ö": "o", "ü": "u", "ß": "ss" };
      return str.replace(/[^A-Za-z0-9]/g,
         function(a) { return map[a]||a; }
      );
   }
   $.getJSON('iindex.json', function (tags) {
      $('#query').autocomplete({
         minLength: 3,
         source: function(inp, out) {
            var t = normalize(inp.term);
            var r = $.ui.autocomplete.filter(tags, t);
            out(r);
         }
      });
   });

It's not perfect, but it works acceptably so far. What would be truly nice is to create the autocomplete index on the client and have the term to match processed by the indexer instead of the crude normalizer above.
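As a rough illustration of building the term list on the client, the jq filter above could be approximated in JavaScript. This is only a sketch: `extractTerms` is a hypothetical helper name, and it assumes the inverted index is an array of `[term, postings]` pairs, which is the layout the jq expression relies on. The Unicode-aware character class is an assumption meant to approximate how jq treats `\w`.

```javascript
// Hypothetical helper: collect autocomplete terms from an inverted index.
// Assumes `invertedIndex` is an array of [term, postings] pairs.
function extractTerms(invertedIndex) {
  const terms = new Set();
  for (const [term] of invertedIndex) {
    // Keep the leading run of 3 or more word characters, like scan("^\\w{3,}").
    const match = /^[\p{L}\p{N}_]{3,}/u.exec(term);
    if (match) {
      terms.add(match[0]);
    }
  }
  // Deduplicated and sorted, similar in spirit to jq's `unique`.
  return Array.from(terms).sort();
}
```

The resulting array could then be passed as `tags` to the autocomplete `source` callback in place of the pre-generated iindex.json.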

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

4 reactions
chrisbartley commented, Jan 16, 2019

First, a HUGE thanks to both @hungerburg and @olivernn for this. I combined both suggestions and it’s working great. For anyone wanting to do the same, this is what worked for me…

I’m indexing like this:

// Store unstemmed term in the metadata.  See:
// https://github.com/olivernn/lunr.js/issues/287#issuecomment-322573117
// https://lunrjs.com/guides/customising.html#token-meta-data
const storeUnstemmed = function(builder) {

   // Define a pipeline function that keeps the unstemmed word
   const pipelineFunction = function(token) {
      token.metadata['unstemmed'] = token.toString();
      return token;
   };

   // Register the pipeline function so the index can be serialised
   lunr.Pipeline.registerFunction(pipelineFunction, 'storeUnstemmed');

   // Add the pipeline function to the indexing pipeline, before the stemmer
   builder.pipeline.before(lunr.stemmer, pipelineFunction);

   // Whitelist the unstemmed metadata key
   builder.metadataWhitelist.push('unstemmed');
};

const index = lunr(function() {
   this.use(storeUnstemmed);
   ...
});

And modified the autocomplete function suggested by @hungerburg to use the unstemmed words like this:

autoComplete(searchTerm) {
   const results = this._index.query(function(q) {
      // exact matches should have the highest boost
      q.term(searchTerm, { boost : 100 })
      // wildcard matches should be boosted slightly
      q.term(searchTerm, {
         boost : 10,
         usePipeline : true,
         wildcard : lunr.Query.wildcard.LEADING | lunr.Query.wildcard.TRAILING
      })
      // finally, try a fuzzy search, without any boost
      q.term(searchTerm, { boost : 1, usePipeline : false, editDistance : 1 })
   });
   if (!results.length) {
      // return an empty array so the return type matches the non-empty case
      return [];
   }
   return results.map(function(v, i, a) { // extract unstemmed terms
      const unstemmedTerms = {};
      Object.keys(v.matchData.metadata).forEach(function(term) {
         Object.keys(v.matchData.metadata[term]).forEach(function(field) {
            v.matchData.metadata[term][field].unstemmed.forEach(function(word) {
               unstemmedTerms[word] = true;
            });
         });
      });
      return Object.keys(unstemmedTerms);
   }).reduce(function(a, b) { // flatten
      return a.concat(b);
   }).filter(function(v, i, a) { // uniq
      return a.indexOf(v) === i;
   });
}
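
The extraction step above walks `matchData.metadata` by term and then by field. A compact sketch of the same idea using a `Set` is shown below; `collectUnstemmed` is a hypothetical name, and the result shape is assumed to be the one the code above reads (term, then field, then an `unstemmed` array).

```javascript
// Hypothetical helper: gather the unique unstemmed words from lunr results.
// Assumes each result has matchData.metadata[term][field].unstemmed (an array).
function collectUnstemmed(results) {
  const words = new Set();
  for (const result of results) {
    const metadata = result.matchData.metadata;
    for (const term of Object.keys(metadata)) {
      for (const field of Object.keys(metadata[term])) {
        // Tolerate fields that carry no unstemmed metadata.
        for (const word of metadata[term][field].unstemmed || []) {
          words.add(word);
        }
      }
    }
  }
  // A Set keeps insertion order, so words appear in first-seen order.
  return Array.from(words);
}
```

Because it starts from an empty `Set`, this version returns an empty array when there are no results and needs no separate guard.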

Thanks!

3 reactions
olivernn commented, Aug 7, 2017

Sorry for the late reply.

You could definitely wrap that normalise function up into a lunr plugin. There is a similar project, lunr-unicode-normalizer, but I don’t think it has been updated for lunr 2.
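
A minimal sketch of what such a plugin might look like, assuming lunr 2.x and the global `lunr` object in scope when the plugin runs. The names `foldChars`, `charFoldingPlugin` and the label "charFolding" are hypothetical; the character map is the one from the question.

```javascript
// Character map taken from the normalize() function in the question.
const CHAR_MAP = { "ä": "a", "ö": "o", "ü": "u", "ß": "ss" };

// Pure helper: replace mapped characters, leave everything else alone.
function foldChars(str) {
  return str.replace(/[^A-Za-z0-9]/g, function (ch) {
    return CHAR_MAP[ch] || ch;
  });
}

// Hypothetical plugin; expects the global `lunr` to exist when it is called.
function charFoldingPlugin(builder) {
  const fold = function (token) {
    // token.update replaces the token's string with the callback's result.
    return token.update(foldChars);
  };
  // Registering the function allows a serialised index to be loaded again.
  lunr.Pipeline.registerFunction(fold, "charFolding");
  // Run before the stemmer, both when indexing and when searching.
  builder.pipeline.before(lunr.stemmer, fold);
  builder.searchPipeline.before(lunr.stemmer, fold);
}
```

It would be enabled with `this.use(charFoldingPlugin)` inside the `lunr(function () { })` configuration callback. Query clauses that set `usePipeline: false` skip the search pipeline, so for those the term would have to be passed through `foldChars` by hand.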

As for autocomplete, I need to get round to actually putting a demo of this together, but this is what I’ve been suggesting to people.

idx.query(function (q) {
  // exact matches should have the highest boost
  q.term(searchTerm, { boost: 100 })

  // prefix matches should be boosted slightly
  q.term(searchTerm, { boost: 10, usePipeline: false, wildcard: lunr.Query.wildcard.TRAILING })

  // finally, try a fuzzy search, without any boost
  q.term(searchTerm, { boost: 1, usePipeline: false, editDistance: 1 })
})

I disable the pipeline to prevent stemming from getting in the way. You would have to experiment to see whether this makes sense for your use case, especially if you wanted to add the unicode normalising plugin.

Additionally, when using the query method, lunr won't do any tokenisation for you. You can either handle this yourself, use lunr.tokenizer directly, or borrow its regex to split the input into individual tokens.
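
One way to handle multi-word input along those lines is sketched below. It assumes lunr 2.x, where `lunr.tokenizer.separator` is a regex matching whitespace and hyphens; the fallback constant mirrors that as an assumption. `splitTerms` and `queryAllTerms` are hypothetical helper names, and `queryAllTerms` expects the global `lunr` to be in scope.

```javascript
// Assumption: mirrors lunr.tokenizer.separator (whitespace and hyphens).
const FALLBACK_SEPARATOR = /[\s\-]+/;

// Hypothetical helper: split raw input into lowercase, non-empty terms.
function splitTerms(input, separator = FALLBACK_SEPARATOR) {
  return input
    .toLowerCase()
    .split(separator)
    .filter(function (term) { return term.length > 0; });
}

// Hypothetical helper: apply the three-tier clauses to every term.
function queryAllTerms(idx, input) {
  const terms = splitTerms(input, lunr.tokenizer.separator);
  return idx.query(function (q) {
    terms.forEach(function (term) {
      // exact match, highest boost
      q.term(term, { boost: 100 });
      // prefix match, boosted slightly
      q.term(term, { boost: 10, usePipeline: false, wildcard: lunr.Query.wildcard.TRAILING });
      // fuzzy match, no extra boost
      q.term(term, { boost: 1, usePipeline: false, editDistance: 1 });
    });
  });
}
```

For an autocomplete box it may make more sense to apply the wildcard and fuzzy clauses only to the last term, since that is the one still being typed; that choice is left open here.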
