How to get all documents?

Is there a way to get all documents returned as results?

For example:

miniSearch.search("")

returns an empty array, but I’m looking for the opposite: a way to get all documents.

One use case I have: in some cases I only want to filter by a numeric range, something like this:

// get all documents with val property >= minVal
miniSearch.search('', {
    filter: (result) => {
        return result.val >= minVal
    }
})

but that currently returns nothing, since the empty query produces no results to pass to the filter.

I know this is not the best use of this library, as mentioned here: https://github.com/lucaong/minisearch/issues/119#issuecomment-1027726138. However, it’s just one of several scenarios I’m using it for, and it would be great to be able to leverage it for this as well.

& awesome library btw 🙌 🙏
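One way to get this behavior without changing MiniSearch is a thin helper that bypasses the index when the query is empty. The sketch below is an illustration under stated assumptions, not a MiniSearch API: `searchOrAll` is a hypothetical name, the application is assumed to keep its own `documents` array (MiniSearch does not store documents), each document is assumed to have an `id` field (MiniSearch's default ID field), and `index` can be any object with a `search(query)` method whose results carry an `id`, such as a MiniSearch instance. Both branches return documents, so the filter always receives a document rather than a SearchResult.

```javascript
// Hypothetical helper (not part of MiniSearch): always returns original documents.
// Assumes `index.search(query)` returns objects with an `id` (as MiniSearch results do)
// and that every document in `documents` has a matching `id` field.
function searchOrAll(index, documents, query, filter = () => true) {
  if (query.trim() === '') {
    // Empty query: skip the index entirely and filter the original collection,
    // keeping the original order.
    return documents.filter(filter)
  }

  // Non-empty query: run the full-text search, then map each hit back to its document.
  // The lookup is rebuilt on every call (O(n)); a real app might build it once.
  const byId = new Map(documents.map((doc) => [doc.id, doc]))
  return index
    .search(query)
    .map((result) => byId.get(result.id))
    .filter((doc) => doc !== undefined && filter(doc))
}
```

With a MiniSearch instance, `searchOrAll(miniSearch, documents, '', (doc) => doc.val >= minVal)` would return every document whose `val` is at least `minVal`, in the original order, while non-empty queries keep MiniSearch's relevance order.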

Top GitHub Comments

lucaong commented, May 2, 2022

“Implement the option with a callback instead of a boolean flag.”

It’s unfortunately more complicated than that. For example, how should documents be sorted? It would seem reasonable to return them in the original order, but what if one defines a boostDocument function? Then it makes more sense to compute the boost for each document and re-sort them. But since the original list is static, a smart developer would prefer to pre-sort the list only once, and skip the search-time boosting calculation when returning all documents.
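To make the sorting trade-off concrete, here is a minimal sketch of the “pre-sort once” approach. It assumes a hypothetical application-defined `boostOf(document)` function returning a static number; this is not MiniSearch's `boostDocument` option, which has a different signature, and `createAllDocumentsLister` is likewise a made-up name.

```javascript
// Hypothetical sketch: list "all documents" ordered by a static boost that is
// computed once, instead of recomputing boosts on every call.
// `boostOf` is an assumed application-defined function: (document) => number.
function createAllDocumentsLister(documents, boostOf) {
  // Compute each boost once and sort a copy, highest boost first.
  // Array.prototype.sort is stable (ES2019+), so ties keep their original order.
  const sorted = documents
    .map((doc) => ({ doc, boost: boostOf(doc) }))
    .sort((a, b) => b.boost - a.boost)
    .map(({ doc }) => doc)

  // The returned function only filters the pre-sorted list; no per-call scoring.
  return (filter = () => true) => sorted.filter(filter)
}
```

Because the boost is computed once when the lister is created, each later call only pays for the filter, and the original `documents` array is left untouched.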

Similarly, since MiniSearch returns an array of SearchResult objects, not documents, returning all results would require first mapping each document into a search result. But depending on the use case, developers might then map results back to documents (like I did in my example before). In that case, it’s a lot more efficient to avoid mapping to SearchResult[] in the first place, especially since it maps the whole collection, potentially tens of thousands of documents.

Moreover, at the moment MiniSearch does not keep a reference to the original collection of documents, so it cannot return it. This is by choice: it is possible to make some documents searchable without storing the document itself in memory.
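Since MiniSearch does not hold on to the original documents, an application that wants an “all documents” listing has to keep them itself. A minimal sketch, assuming a hypothetical `SearchableCollection` wrapper around any index with `addAll(documents)` and `remove(document)` methods (a MiniSearch instance has both) and documents with an `id` field. Note the trade-off: it keeps every document in memory, which is exactly what MiniSearch lets you avoid.

```javascript
// Hypothetical wrapper (not a MiniSearch API): stores original documents next to
// the index, because the index itself does not keep them.
// Assumes `index` has addAll(documents) and remove(document), as MiniSearch does,
// and that every document has an `id` field.
class SearchableCollection {
  constructor(index) {
    this.index = index
    this.documents = new Map() // id -> original document, in insertion order
  }

  addAll(documents) {
    this.index.addAll(documents)
    for (const doc of documents) this.documents.set(doc.id, doc)
  }

  remove(document) {
    this.index.remove(document)
    this.documents.delete(document.id)
  }

  // Every stored document, in insertion order, optionally filtered.
  all(filter = () => true) {
    return [...this.documents.values()].filter(filter)
  }
}
```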

Of course, it is theoretically possible to implement options for each of these choices, but that would make the API surface huge and hard to learn. Instead, these details are better defined in code. Code is better than configuration in this case because configuration has to be learned separately for each and every library, while code is general purpose. For a configuration option to be ergonomic, it has to save the developer a non-trivial amount of code or cognitive load. If it raises more open questions than it answers, it is not worth it, because learning all of its implications takes more effort than taking control of the issue with code.

“Hmm, I mean that’s why people install an npm package in the first place. I don’t want to write code.”

I would say that one does not want to write code at the wrong level of abstraction. What I mean is: even when using a library, one still has to write code. The point is that one normally prefers to avoid writing code that pertains to the internal details of the problem the library solves, and to focus instead on code pertaining to the higher-level goals of the application.

Therefore, a library has to choose its own boundaries and goals. MiniSearch, as its design document outlines, “enables developers to build [turn-key opinionated solutions] on top of its core API, but does not provide them out of the box.” MiniSearch takes care, for example, of the implementation details of the inverted index and of document scoring, but it leaves developers responsible for writing the code that defines their specific full-text search problem.

It would be absolutely appropriate to build a library on top of MiniSearch that makes some of these decisions and provides a higher level of abstraction. That would save developers from writing some code, but also restrict their options. For developers with those specific needs, such a library would make things easier. MiniSearch itself, though, also has to serve developers with different needs. In other words, your request is completely legitimate; it just lies outside of MiniSearch’s self-assigned boundaries of abstraction.

“The discussion in the other package is quite big, indicating that a lot of people want that feature.”

I understand and respect the fact that many people have this need. As a matter of fact, some of my own apps have the same need. But apart from using MiniSearch in my production applications, I do not profit from it: my motivation for maintaining it stems from the satisfaction of building what I consider a well-crafted piece of software. I am happy if more people use it, because it means that it is solving more problems than it was originally conceived for, but I would not sacrifice the solidity of its design for popularity. By open-sourcing my library, I get to keep the satisfaction of crafting software the way I consider best, without having to compromise it to chase more users. Users, in turn, get the freedom to use my library and to create applications or other libraries on top of it.

In sum, I do agree with you that yours is a common need. My opinion, though, is that such a need is better served by writing a thin layer of code, like the example I provided, than by adding more configuration options. But it is perfectly reasonable to disagree with that, and such a thin layer can be packaged in a library for convenience.

samuelstroschein commented, May 2, 2022

@lucaong

Thank you for the in-depth reply and explanation. I overlooked the stated goal of “[…] enables developers to build [turn-key opinionated solutions] on top of its core API, but does not provide them out of the box” and was looking for (expecting) a drop-in replacement for fuse.js.

On a side note, I have a question regarding your i18n workflow at megaloop. Can you send me a DM on Twitter or via email?