Support no forward index for column
Currently a text column can be created without any forward index, which is useful when using the column only for filtering. In this situation, the raw (original) text data is not needed, only the text index (see https://github.com/apache/incubator-pinot/pull/6284/).
There are other situations for non-text columns where this same functionality is useful to reduce the size of the column. In our particular use case, we’re generating unique terms for a (large) string field, which we save as a multi-value STRING column. We need an inverted index for fast filtering, but we do not need the forward index, which (leaving aside the inverted index, which is built at load time) accounts for more than 80% of the total segment size.
@kishoreg suggested “having a empty forward Index reader impl” as a way of implementing this.
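A minimal sketch of what an "empty forward index reader" could look like, to make the idea concrete. The `ForwardIndex` interface, `EmptyForwardIndex`, and `InvertedIndex` below are hypothetical, simplified types invented for illustration; they are not Pinot's actual reader classes or signatures. The assumption is that filtering is served entirely by the inverted index, and that any attempt to read raw values fails loudly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NoForwardIndexSketch {

    /** Hypothetical forward index contract: docId -> stored values. Not a Pinot API. */
    interface ForwardIndex {
        List<String> getValues(int docId);
    }

    /** Holds no data at all; every read is rejected with a clear error. */
    static final class EmptyForwardIndex implements ForwardIndex {
        private final String column;

        EmptyForwardIndex(String column) {
            this.column = column;
        }

        @Override
        public List<String> getValues(int docId) {
            throw new UnsupportedOperationException(
                "Forward index is disabled for column: " + column);
        }
    }

    /** Minimal inverted index: term -> sorted doc IDs. Enough to answer filters. */
    static final class InvertedIndex {
        private final Map<String, int[]> postings = new TreeMap<>();

        void add(String term, int... docIds) {
            int[] sorted = docIds.clone();
            Arrays.sort(sorted);
            postings.put(term, sorted);
        }

        /** Returns the matching doc IDs, or an empty array for an unknown term. */
        int[] getDocIds(String term) {
            return postings.getOrDefault(term, new int[0]);
        }
    }
}
```

In this sketch a filter such as "terms = 'pinot'" only ever calls `getDocIds`, so the forward data never has to exist on disk. A query that selects or groups by the column would hit `getValues` and fail, which is the trade-off being proposed.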
We could possibly handle the configuration of this via a new noFwdIndexColumns table config field, similar to the noDictionaryColumns config setting.
There would be situations where specifying no forward index for a column would trigger a table config error, for example doing this for a metrics column (or so I assume).
I’m also not sure whether it would be valid to have a column that has no index/dictionary/forward index; does this mean ignore the field in the input data?
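The kind of table config validation described above could be sketched as follows. Everything here is an assumption made for illustration: the `Column` type, the `validate` method, and both rules (reject metric columns, and reject columns that would be left with no inverted index to serve filters) are hypothetical and are not taken from Pinot's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class NoFwdIndexConfigCheck {

    enum FieldType { DIMENSION, METRIC }

    /** Hypothetical column description; not a Pinot class. */
    static final class Column {
        final String name;
        final FieldType type;
        final boolean hasInvertedIndex;

        Column(String name, FieldType type, boolean hasInvertedIndex) {
            this.name = name;
            this.type = type;
            this.hasInvertedIndex = hasInvertedIndex;
        }
    }

    /**
     * Returns one error message per invalid entry in noFwdIndexColumns,
     * in schema order. An empty list means the config passes these checks.
     */
    static List<String> validate(List<Column> columns, Set<String> noFwdIndexColumns) {
        List<String> errors = new ArrayList<>();
        for (Column c : columns) {
            if (!noFwdIndexColumns.contains(c.name)) {
                continue; // the forward index stays enabled, nothing to check
            }
            if (c.type == FieldType.METRIC) {
                // Assumed rule: metrics are aggregated from raw values.
                errors.add(c.name + ": metric columns need a forward index");
            } else if (!c.hasInvertedIndex) {
                // Assumed rule: without any index the column cannot be queried at all.
                errors.add(c.name + ": no forward index and no inverted index");
            }
        }
        return errors;
    }
}
```

Under these assumed rules, a column with neither a forward index nor an inverted index is treated as a config error rather than being silently dropped from the input data; the issue leaves that question open.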
Issue Analytics
- Created 3 years ago
- Comments: 13 (13 by maintainers)
Top GitHub Comments
@somandal - I think you may want to update user docs and open follow up issues for the pending work and link here.
Here’s a document which discusses the reload problem and how to solve it for forwardIndexDisabled columns. Please take a look and leave your feedback. cc @Jackie-Jiang @siddharthteotia @vvivekiyer
Just a note that a few details still need to be figured out and I will update the document as and when we figure them out.