Support variable length Offline Dictionary Indexes for bytes, strings and maps to save on storage
What? Currently, the dictionary index for offline segments stores bytes and string values at a fixed size: it takes the size of the largest element and pads every smaller element with “0” up to that size. See org.apache.pinot.core.io.util.FixedByteValueReaderWriter. The idea is to avoid the padding and support storing byte arrays/strings/maps of different lengths without slowing down lookups much.
Why? Fixed-size storage is good for fast lookups but very inefficient in terms of space. For example, if we have a String column where the biggest value is 100 bytes but the average value is only 10 bytes, about 90% of the dictionary is padding. The same applies to byte[], maps, etc.
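To make the arithmetic above concrete, here is a rough sketch in Java. The class and method names are hypothetical, and the variable-length estimate assumes one 4-byte int offset per value plus one trailing end offset, which is only one possible layout and not something the issue specifies.

```java
// Hypothetical helper for a back-of-the-envelope storage comparison.
final class PaddingEstimate {
    // Fraction of a fixed-width dictionary that is padding.
    static double paddingFraction(int maxLen, double avgLen) {
        return (maxLen - avgLen) / maxLen;
    }

    // Fixed-width storage: every value occupies maxLen bytes.
    static long fixedWidthBytes(int numValues, int maxLen) {
        return (long) numValues * maxLen;
    }

    // Variable-length storage (assumed layout): the raw value bytes
    // plus (numValues + 1) int offsets of 4 bytes each.
    static long varLengthBytes(int numValues, double avgLen) {
        return Math.round(numValues * avgLen) + 4L * (numValues + 1);
    }
}
```

Under these assumptions, 1,000,000 values with a 100-byte maximum and a 10-byte average take 100,000,000 bytes at fixed width versus 14,000,004 bytes with an offset table.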
How?
Currently, FixedByteValueReaderWriter only writes the sorted values directly into the buffer, starting at offset “0” and at fixed lengths. So, for an int dictionary, the first value is at byte offset “0”, the second one at “4”, etc. No additional metadata is needed in the buffer.
The idea is to maintain the offset of each element at the beginning of the buffer so that the element sizes needn’t be fixed. When looking up an element from the buffer, we first get its offset and then read the actual element. This means we do two reads from the buffer (first the int offset, then the element itself), but the offset read should be fast enough that it doesn’t slow down the overall operation much.
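A minimal sketch of such a layout, in Java, might look like the following. The class name and the exact layout are assumptions for illustration, not the format that ended up in Pinot: the buffer starts with an offset table of (n + 1) ints, followed by the concatenated values. Here the length of an element is derived by also reading the next offset, which is one option among several (storing an explicit length would be another).

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a variable-length dictionary buffer.
// Assumed layout: [(n + 1) int offsets][value bytes, back to back]
final class VarLengthDictionarySketch {

    // Writes the already-sorted values into a new buffer.
    static ByteBuffer write(byte[][] sortedValues) {
        int n = sortedValues.length;
        int offsetTableBytes = (n + 1) * Integer.BYTES;
        int dataBytes = 0;
        for (byte[] v : sortedValues) {
            dataBytes += v.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(offsetTableBytes + dataBytes);

        // Offset table: entry i holds the start position of value i.
        int offset = offsetTableBytes;
        for (int i = 0; i < n; i++) {
            buf.putInt(i * Integer.BYTES, offset);
            offset += sortedValues[i].length;
        }
        // Trailing entry marks the end of the last value.
        buf.putInt(n * Integer.BYTES, offset);

        // Value bytes follow the offset table without any padding.
        buf.position(offsetTableBytes);
        for (byte[] v : sortedValues) {
            buf.put(v);
        }
        buf.rewind();
        return buf;
    }

    // Looks up the value at the given dictionary index:
    // read its start and end offsets, then copy the bytes in between.
    static byte[] read(ByteBuffer buf, int index) {
        int start = buf.getInt(index * Integer.BYTES);
        int end = buf.getInt((index + 1) * Integer.BYTES);
        byte[] out = new byte[end - start];
        ByteBuffer view = buf.duplicate(); // leave the caller's position untouched
        view.position(start);
        view.get(out);
        return out;
    }
}
```

Because the values stay sorted and are addressable by index in constant time, a binary search over the dictionary works the same way as with the fixed-width layout; each probe just costs the extra offset reads.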
A few things to note:
- If all values of a byte[], string or map column have the same length, this approach only adds storage overhead and one additional lookup, so it might not be preferable. Hence, we can have a flag/property at the column level to decide whether to use the VarLengthByteValueReaderWriter or not.
- Backward compatibility shouldn’t be broken, which means we need to introduce some kind of header into the buffer to be able to distinguish the on-disk storage format.
- We need to run benchmarks to measure the lookup overhead added by this approach.
- If possible, we should also benchmark the storage savings of the new approach so that we can make data-driven decisions.
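The backward-compatibility point could be sketched roughly as follows. Everything here is hypothetical: the class name, the magic value and the header fields are made up for illustration, and the issue does not say what the header should contain. The reader checks for a marker at the start of the buffer and falls back to the existing fixed-width reader when the marker is absent.

```java
import java.nio.ByteBuffer;

// Hypothetical header check for telling the two on-disk formats apart.
final class DictionaryFormatSketch {
    // Made-up marker value; a real format would pick its own.
    static final int MAGIC = 0x5644494B;
    // Assumed header: magic marker followed by a format version.
    static final int HEADER_BYTES = 2 * Integer.BYTES;

    // Prepends the header to a variable-length payload.
    static ByteBuffer withHeader(ByteBuffer payload, int version) {
        ByteBuffer src = payload.duplicate();
        ByteBuffer out = ByteBuffer.allocate(HEADER_BYTES + src.remaining());
        out.putInt(MAGIC).putInt(version).put(src);
        out.flip();
        return out;
    }

    // True if the buffer starts with the marker; otherwise the caller
    // would treat it as a legacy fixed-width dictionary.
    static boolean isVarLength(ByteBuffer buf) {
        return buf.remaining() >= HEADER_BYTES
            && buf.getInt(buf.position()) == MAGIC;
    }
}
```

One caveat with this kind of check: a legacy fixed-width buffer could, in principle, begin with the same bytes as the marker, so a longer marker or an additional signal in the segment metadata would make the detection more robust.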
Thanks @kishoreg for pointing out this problem and brainstorming.
P.S: This was originally tracked in https://github.com/winedepot/pinot/issues/24
Issue Analytics
- Created 4 years ago
- Comments:12 (12 by maintainers)
Top GitHub Comments
@mcvsubbu yes that’s a possibility. Is that something you can work on? Might be easier for you to pull it out since you know the context. Once you pull it out @buchireddy can update this PR.
Merged this feature as part of https://github.com/apache/incubator-pinot/pull/4321 and hence closing the issue.