Serialize big indexes (get rid of TokenStore?)
Hello there,
I'm building a browser-based application (without a web server). I have multiple documents to index (about 700 in my case), and their combined size is about 5 MB.
For now, my application rebuilds the index every time the page is displayed, but building the full index takes about 40 seconds, which is far too long.
I want to store the index in the browser, but I can't even serialize it. Here is what I get when I try to JSON.stringify() my index:
```
Uncaught RangeError: Maximum call stack size exceeded
```
After a short investigation, the problem seems to come from JSON.stringify(), which can't handle such a big object. In fact, a heap snapshot shows that my TokenStore alone is about 70 MB!
I'm not even going to try to store an object that big in a browser, but maybe I'm approaching this the wrong way. Is there a way to store the index without the TokenStore and rebuild that part somehow?
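For reference, here is roughly the pattern I'm trying to achieve. This is only a sketch: `buildIndex` and `restoreIndex` are placeholders for my own code, not lunr APIs, and a 70 MB blob would not fit in localStorage anyway.

```js
// Rough sketch of the caching pattern I have in mind (not working code):
// build the index once, serialize it, and reuse it on the next page load.
function loadOrBuildIndex(documents) {
  var cached = window.localStorage.getItem('search-index');
  if (cached) {
    // Hypothetical helper: rebuild a usable index (including the TokenStore?)
    // from plain JSON. This is exactly the part I don't know how to do.
    return restoreIndex(JSON.parse(cached));
  }

  var idx = buildIndex(documents); // my existing lunr setup, ~40 s
  // This is where JSON.stringify() currently throws
  // "Maximum call stack size exceeded".
  window.localStorage.setItem('search-index', JSON.stringify(idx));
  return idx;
}
```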
Thank you!
Issue Analytics
- Created 9 years ago
- Comments: 11 (6 by maintainers)
Top GitHub Comments
Three years later there is a major new version of Lunr, 2.x, which has a much simpler serialised structure. Upgrading to this latest version should hopefully alleviate the issues with serialising the large, deeply nested data structures that Lunr 0.x and 1.x use.
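For illustration, a minimal sketch of round-tripping a 2.x index (the `documents` array and field names are just examples):

```js
// Build the index once from the document set.
var idx = lunr(function () {
  this.ref('id');
  this.field('title');
  this.field('body');

  documents.forEach(function (doc) {
    this.add(doc);
  }, this);
});

// In 2.x the index serialises to a flat JSON structure...
var serialised = JSON.stringify(idx);

// ...which can later be rehydrated without re-indexing anything.
var reloaded = lunr.Index.load(JSON.parse(serialised));
reloaded.search('foo');
```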
You can play with a live demo at https://tabletree.io - I ended up rolling my own simplified search engine, optimized for speed and index wire size. The index is 1 MB gzipped and searches take <20 ms, so you get nice live updates as you type.
Using a custom archive format with client-side decompression might get you a 10-20% smaller wire size than gzip, but you end up spending more time decompressing the data, so I figured I'd just make the source data more gzip-friendly. Building the search index on the client side is a non-starter because the corpus is about a hundred megabytes; the compressed index has to be built off-line, and only that gets transferred to the client.
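As a sketch of that off-line approach using lunr itself (my own index format is custom, so treat this only as an illustration; `corpus.json`, the file names, and the field names are assumptions):

```js
// build-index.js: run off-line with Node, never in the browser.
var fs = require('fs');
var lunr = require('lunr');

var documents = JSON.parse(fs.readFileSync('corpus.json', 'utf8'));

var idx = lunr(function () {
  this.ref('id');
  this.field('text');
  documents.forEach(function (doc) { this.add(doc); }, this);
});

// Serve this file gzipped; the client only downloads the finished index.
fs.writeFileSync('search-index.json', JSON.stringify(idx));
```

The client then fetches the pre-built index and rehydrates it instead of indexing anything itself:

```js
// In the browser: fetch the pre-built index and load it.
fetch('/search-index.json')
  .then(function (res) { return res.json(); })
  .then(function (data) {
    var idx = lunr.Index.load(data);
    console.log(idx.search('example'));
  });
```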