Serialize big indexes (get rid of TokenStore?)
Hello there,
I'm building a browser-based application (without a web server). I have multiple documents to index (about 700 in my case), and their combined size is about 5 MB.
For now, my application rebuilds the index every time the page is displayed, but building the full index takes about 40 seconds, which is far too long.
I want to store the index in the browser, but I can't even serialize it. Here is what I get when I try to JSON.stringify() my index:
```
Uncaught RangeError: Maximum call stack size exceeded
```
After a short investigation, the problem seems to come from JSON.stringify(), which can't handle such a big object. In fact, a heap snapshot shows that my TokenStore alone is about 70 MB!
I'm not even going to try to store an object that big in a browser, but maybe I'm approaching this the wrong way. Is there a way to store the index without the TokenStore and rebuild that part somehow?
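For reference, here is roughly the pattern I'm trying to achieve. This is only a sketch: `buildIndex` and `restoreIndex` are placeholders for my own code, not lunr APIs, and a 70 MB blob would not fit in localStorage anyway.

```js
// Rough sketch of the caching pattern I have in mind (not working code):
// build the index once, serialize it, and reuse it on the next page load.
function loadOrBuildIndex(documents) {
  var cached = window.localStorage.getItem('search-index');
  if (cached) {
    // Hypothetical helper: rebuild a usable index (including the TokenStore?)
    // from plain JSON. This is exactly the part I don't know how to do.
    return restoreIndex(JSON.parse(cached));
  }

  var idx = buildIndex(documents); // my existing lunr setup, ~40 s
  // This is where JSON.stringify() currently throws
  // "Maximum call stack size exceeded".
  window.localStorage.setItem('search-index', JSON.stringify(idx));
  return idx;
}
```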
Thank you!
Issue Analytics
- Created 9 years ago
- Comments: 11 (6 by maintainers)
Top GitHub Comments
Three years later there is a major new version of Lunr, 2.x, which has a much simpler serialised structure. Upgrading to this latest version should hopefully alleviate the issues with serialising the large, deeply nested data structures that Lunr 0.x and 1.x use.
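For illustration, a minimal sketch of round-tripping a 2.x index (the `documents` array and field names are just examples):

```js
// Build the index once from the document set.
var idx = lunr(function () {
  this.ref('id');
  this.field('title');
  this.field('body');

  documents.forEach(function (doc) {
    this.add(doc);
  }, this);
});

// In 2.x the index serialises to a flat JSON structure...
var serialised = JSON.stringify(idx);

// ...which can later be rehydrated without re-indexing anything.
var reloaded = lunr.Index.load(JSON.parse(serialised));
reloaded.search('foo');
```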
You can play with a live demo at https://tabletree.io - I ended up rolling my own simplified search engine, optimized for speed and index wire size. The index is 1 MB gzipped and searches take <20 ms, so you get nice live updates as you type.
Using a custom archive format with client-side decompression might get you a 10-20% smaller wire size than gzip, but you end up spending more time decompressing the data, so I figured I'd just make the source data more gzip-friendly. Building the search index on the client side is a non-starter because the corpus is about a hundred megabytes; the compressed index has to be built off-line, and only that gets transferred to the client.
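As a sketch of that off-line approach using lunr itself (my own index format is custom, so treat this only as an illustration; `corpus.json`, the file names, and the field names are assumptions):

```js
// build-index.js: run off-line with Node, never in the browser.
var fs = require('fs');
var lunr = require('lunr');

var documents = JSON.parse(fs.readFileSync('corpus.json', 'utf8'));

var idx = lunr(function () {
  this.ref('id');
  this.field('text');
  documents.forEach(function (doc) { this.add(doc); }, this);
});

// Serve this file gzipped; the client only downloads the finished index.
fs.writeFileSync('search-index.json', JSON.stringify(idx));
```

The client then fetches the pre-built index and rehydrates it instead of indexing anything itself:

```js
// In the browser: fetch the pre-built index and load it.
fetch('/search-index.json')
  .then(function (res) { return res.json(); })
  .then(function (data) {
    var idx = lunr.Index.load(data);
    console.log(idx.search('example'));
  });
```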