searching in a field for an exact phrase containing spaces

I have tags on all my web pages and have built the lunr index with a tags field. Some tags are multiword phrases. If I search for tags:hand drawn maps, I get completely wrong results. However, if I search for tags:hand*, I get the correct result. I have tried tags:+hand +drawn +maps and tags:"hand drawn maps", but without success. Any suggestions?

Update: I should add that I build my index and run my query like so:

    idx = lunr(function () {
        this.field('title', { boost: 10 });
        this.field('tags');
        // the boost option has to be passed inside the field() call
        this.field('body', { boost: 20 });
        this.field('created');
        this.ref('file');

        // add every page to the index
        pages.forEach(function (doc) {
            this.add(doc);
        }, this);
    });

    searchResult = idx.search(q).map(function (result) {
        return {
            ref: result.ref,
            disp: result.ref.replace(/-/g, ' ')
        };
    });

Issue Analytics

Created 5 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

8 reactions

olivernn commented, Aug 6, 2018

@icidasset yeah, sorry I didn’t explain that well, and the relevant document is a bit hidden.

Lunr uses lunr.Token objects internally; these are mostly just wrappers around a string. lunr.tokenizer is used to create lists of tokens from the fields of the documents being indexed.

If a field is an array, Lunr assumes that the items in the array are already ‘tokens’. For everything else, it assumes it has to split the field into tokens itself. It does this splitting on whitespace (among other characters).

So, when you pass an array such as ['foo bar baz'], Lunr assumes you have already done the splitting and the token is "foo bar baz". When you just pass "foo bar baz", it does the splitting itself and you get "foo", "bar", "baz".
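To make the array-versus-string distinction concrete, here is a minimal sketch that models the behaviour described above. It is not Lunr's actual tokenizer: modelTokenize is a hypothetical name, it splits on whitespace only, and it returns plain strings rather than lunr.Token objects.

```javascript
// Simplified model of the behaviour described above, NOT Lunr's real
// tokenizer. modelTokenize is a hypothetical helper for illustration only.
function modelTokenize(fieldValue) {
  if (Array.isArray(fieldValue)) {
    // Array items are treated as ready-made tokens, spaces included.
    return fieldValue.map(function (item) {
      return String(item).toLowerCase();
    });
  }
  // Any other value is split into tokens on runs of whitespace.
  return String(fieldValue).toLowerCase().split(/\s+/).filter(Boolean);
}

console.log(modelTokenize(['hand drawn maps'])); // one token
console.log(modelTokenize('hand drawn maps'));   // three tokens
```

Under this model, a document whose tags field is an array such as ['hand drawn maps'] would be indexed with a single multiword token, which is what an exact-phrase lookup needs.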

6 reactions

olivernn commented, Aug 1, 2018

When performing a search using lunr.Index#search, the query string you use is parsed into a lunr.Query object. By default, the parser assumes that whitespace indicates a term boundary; that is, a search for “foo bar” is interpreted as a search for the terms foo and bar. What you want is a search for the single term foo bar.

You can bypass the query parsing entirely by using the lunr.Index#query method, which allows you to specify the term exactly how you want. Alternatively, you can continue using lunr.Index#search but escape the spaces in the query string. Finally, you could use wildcards to match the spaces, though a wildcard will match any character, so this is probably not a great solution.

I have put together a fiddle showing the three approaches. I think using lunr.Index#query is probably the best choice.
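The escaping and lunr.Index#query approaches mentioned above could look roughly like the sketch below. The helper names escapeSpaces and exactTermQuery are hypothetical; only idx.search, idx.query and q.term (with its fields option) are Lunr's. It also assumes the tags were indexed as whole multiword tokens, for example by passing them as an array, and that indexed tokens are lower case.

```javascript
// Hypothetical helpers; they only prepare arguments for Lunr and do not
// need an index to run.

// For idx.search: escape each space with a backslash so the query parser
// does not treat it as a term boundary.
function escapeSpaces(phrase) {
  return phrase.replace(/ /g, '\\ ');
}

// For idx.query: build a callback that adds the whole phrase as one term,
// restricted to a single field. The phrase is lower-cased here on the
// assumption that the indexed tokens are lower case.
function exactTermQuery(phrase, field) {
  return function (q) {
    q.term(phrase.toLowerCase(), { fields: [field] });
  };
}

var searchString = 'tags:' + escapeSpaces('hand drawn maps');
console.log(searchString); // tags:hand\ drawn\ maps
```

Given an existing index idx, these would be used as idx.search(searchString) and idx.query(exactTermQuery('hand drawn maps', 'tags')). Neither call can match a multiword tag unless that tag exists in the index as a single token.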