Faceted Search: Why are all facets evaluated for every call?
I'm playing around with the experimental faceted search feature (see #413 and #421). It looks great and is pretty close to what I'm looking for. I want to use it to run aggregations on my stored entities, much like SQL's GROUP BY. While it does what I need, it seems rather inefficient in how it prepares the results.
Consider this entity class (constructors and getters/setters omitted):
    public class Foo {
        @JsonApiId
        private int id;
        @Facet
        private int accountID;
        @Facet
        private int companyID;
        @Facet
        private String currency;
    }
I want to be able to group by either one of the three attributes, so basically getting a count of all Foo instances grouped by either account, company, or currency.
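To make the intended aggregation concrete, here is a minimal in-memory sketch of "count Foo instances grouped by one attribute" using plain Java streams. It does not use Crnk at all: the `FacetCountSketch` class, the `countBy` helper, and the sample data are assumptions made up for illustration, and the `Foo` stand-in drops the Crnk annotations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FacetCountSketch {

    // Simplified stand-in for the Foo entity (Crnk annotations omitted).
    static final class Foo {
        final int id;
        final int accountID;
        final int companyID;
        final String currency;

        Foo(int id, int accountID, int companyID, String currency) {
            this.id = id;
            this.accountID = accountID;
            this.companyID = companyID;
            this.currency = currency;
        }

        int getAccountID() { return accountID; }
        int getCompanyID() { return companyID; }
        String getCurrency() { return currency; }
    }

    // Hypothetical helper: counts instances per distinct key,
    // comparable to SELECT key, COUNT(*) ... GROUP BY key.
    static <K extends Comparable<K>> Map<K, Long> countBy(List<Foo> foos, Function<Foo, K> key) {
        return foos.stream()
                .collect(Collectors.groupingBy(key, TreeMap::new, Collectors.counting()));
    }

    // Made-up sample data, only for this sketch.
    static List<Foo> sample() {
        return Arrays.asList(
                new Foo(1, 10, 100, "EUR"),
                new Foo(2, 10, 101, "GBP"),
                new Foo(3, 11, 100, "EUR"));
    }

    public static void main(String[] args) {
        List<Foo> foos = sample();
        System.out.println(countBy(foos, Foo::getCurrency));  // {EUR=2, GBP=1}
        System.out.println(countBy(foos, Foo::getAccountID)); // {10=2, 11=1}
    }
}
```

Each call to `countBy` corresponds to one grouping, so asking for the currency grouping should only need one such aggregation.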
Crnk provides a handy resource for requesting a specific grouping: http://localhost:8080/api/facet/foo_currency
This returns the results in the form I expect:
{
  "data": {
    "id": "foo_currency",
    "type": "facet",
    "values": {
      "EUR": {
        "label": "EUR",
        "value": "EUR",
        "filterSpec": {
          "path": "currency",
          "operator": "EQ",
          "value": "EUR",
          "expression": null
        },
        "count": 12
      },
      "GBP": {
        "label": "GBP",
        "value": "GBP",
        "filterSpec": {
          "path": "currency",
          "operator": "EQ",
          "value": "GBP",
          "expression": null
        },
        "count": 15
      }
    },
    "name": "currency",
    "groups": { },
    "resourceType": "foo",
    "labels": [
      "EUR",
      "GBP"
    ],
    "links": {
      "self": "http://localhost:8083/facet/foo_currency"
    }
  }
}
When stepping through the code to identify how I can bind this to my backend data store, I noticed that Crnk calls the FacetProvider.findValues(FacetInformation facetInformation, QuerySpec querySpec) method multiple times, once for each defined facet. So in my case, it calls it once for accountID, once for companyID, and then a third time for currency. All grouping results from the three queries are put into a result list, and then in a final step, only the foo_currency item is kept, while the other two results are dropped.
This seems highly inefficient to me. I'm requesting one piece of information (group the Foo instances by currency) and Crnk runs three separate grouping queries (accountID, companyID, currency) for this. If we have a backend data store with a huge amount of data behind this, the current behavior will result in a couple of pretty expensive backend queries, with most of the results neither required nor used.
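The difference between the two evaluation orders can be sketched in plain Java. This is only a model of the behavior I observed while stepping through, not Crnk's actual code: `ValueSource`, `CountingSource`, `evaluateAllThenFilter` and `filterThenEvaluate` are hypothetical names, and the provider returns placeholder counts instead of querying a data store.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetEvaluationSketch {

    // Hypothetical stand-in for a provider; NOT Crnk's FacetProvider interface.
    interface ValueSource {
        Map<String, Long> findValues(String facetName);
    }

    // Counts how many (potentially expensive) grouping queries are run.
    static final class CountingSource implements ValueSource {
        int calls = 0;

        @Override
        public Map<String, Long> findValues(String facetName) {
            calls++;
            Map<String, Long> counts = new LinkedHashMap<>();
            counts.put(facetName + "-value", 1L); // placeholder result
            return counts;
        }
    }

    // The three facets declared on Foo.
    static final List<String> FACETS = Arrays.asList("accountID", "companyID", "currency");

    // Observed order: evaluate every facet, then keep only the requested one.
    static Map<String, Long> evaluateAllThenFilter(ValueSource source, String requested) {
        Map<String, Map<String, Long>> all = new LinkedHashMap<>();
        for (String facet : FACETS) {
            all.put(facet, source.findValues(facet));
        }
        return all.getOrDefault(requested, Collections.emptyMap());
    }

    // Alternative order: pick the requested facet first, then evaluate only that one.
    static Map<String, Long> filterThenEvaluate(ValueSource source, String requested) {
        for (String facet : FACETS) {
            if (facet.equals(requested)) {
                return source.findValues(facet);
            }
        }
        return Collections.emptyMap();
    }

    public static void main(String[] args) {
        CountingSource eager = new CountingSource();
        evaluateAllThenFilter(eager, "currency");
        System.out.println("evaluate-all calls: " + eager.calls);  // 3

        CountingSource lazy = new CountingSource();
        filterThenEvaluate(lazy, "currency");
        System.out.println("filter-first calls: " + lazy.calls);   // 1
    }
}
```

Both variants return the same result for the requested facet; only the number of backend queries differs, and that gap grows with every additional facet declared on the resource.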
Within my own FacetProvider's findValues method, I did not see any way to tell where the call was originating from, so that I could run only the one query that was requested (currency) and return empty results for the others that are not needed.
Am I missing something, or am I doing this wrong? Please let me know if you need more information about my use case.
(I’m really grateful for the facet support so far, and I understand that it’s experimental so far - thanks for building this functionality! As requested in the documentation, I’m trying to provide feedback here, please don’t take this as criticism…)
Issue Analytics
- Created 4 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
I can take care of it in the coming days.
Yeah, it works like this. But addressing it directly by ID should be supported/fixed as well.