Implement partially mapped bitmaps: hybrid Java heap + off heap
This is an issue raised by @kishoreg and related to how Pinot uses Roaring bitmaps.
Currently, we support ImmutableRoaringBitmap, MutableRoaringBitmap and RoaringBitmap. The ImmutableRoaringBitmap class stores the containers off-heap (possibly on disk). The other classes (MutableRoaringBitmap and RoaringBitmap) store the data on Java’s heap.
However, there are instances where we would like just one container to be stored in memory and to be mutable, whereas the rest of the containers should be off-heap and immutable.
I think that we are lucky in that this can be implemented in a sane manner, without having to go crazy and risk lots of bugs. However, I have not yet come up with a design that I like, so I want to open this up for discussion. Another issue is that I am not 100% clear on what would best serve Pinot’s needs.
A MutableRoaringBitmap is implemented as a derived class of ImmutableRoaringBitmap.
The key implementation concept is the attribute PointableRoaringArray highLowContainer in the class ImmutableRoaringBitmap. We currently have two types of PointableRoaringArray. The base ImmutableRoaringBitmap class uses an ImmutableRoaringArray. Unsurprisingly, the MutableRoaringBitmap uses a MutableRoaringArray.
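The relationship described above can be sketched roughly as follows. This is a simplified, self-contained model and not the library's actual code: every name ending in Sketch is hypothetical, and a flat list of sorted ints stands in for the real key/container layout. It only illustrates how a single highLowContainer field lets the mutable subclass inherit all read methods from the immutable base.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.util.TreeSet;

// Hypothetical stand-in for PointableRoaringArray: the read-only operations
// that both storage flavours share.
interface PointableArraySketch {
    int size();
    boolean contains(int value);
}

// Stand-in for ImmutableRoaringArray: reads sorted ints straight from a
// ByteBuffer, which could be heap-backed, direct or memory-mapped.
final class ImmutableArraySketch implements PointableArraySketch {
    private final IntBuffer values;

    ImmutableArraySketch(ByteBuffer sortedInts) {
        this.values = sortedInts.asIntBuffer();
    }

    public int size() { return values.limit(); }

    public boolean contains(int value) {
        int lo = 0, hi = values.limit() - 1;
        while (lo <= hi) { // binary search using absolute reads
            int mid = (lo + hi) >>> 1;
            int v = values.get(mid);
            if (v < value) lo = mid + 1;
            else if (v > value) hi = mid - 1;
            else return true;
        }
        return false;
    }
}

// Stand-in for MutableRoaringArray: everything lives on the Java heap.
final class MutableArraySketch implements PointableArraySketch {
    private final TreeSet<Integer> values = new TreeSet<>();
    void add(int value) { values.add(value); }
    public int size() { return values.size(); }
    public boolean contains(int value) { return values.contains(value); }
}

// The base class only ever reads through the shared field.
class ImmutableBitmapSketch {
    protected PointableArraySketch highLowContainer;

    protected ImmutableBitmapSketch() { }

    ImmutableBitmapSketch(ByteBuffer sortedInts) {
        highLowContainer = new ImmutableArraySketch(sortedInts);
    }

    public boolean contains(int x) { return highLowContainer.contains(x); }
    public int getCardinality() { return highLowContainer.size(); }
}

// The subclass swaps in heap storage and adds the mutating methods.
class MutableBitmapSketch extends ImmutableBitmapSketch {
    MutableBitmapSketch() { highLowContainer = new MutableArraySketch(); }
    public void add(int x) { ((MutableArraySketch) highLowContainer).add(x); }
}
```

In this model a third kind of backing array, or a subclass that holds an immutable array plus something extra, would slot in without touching the read path.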
If one looks at the MutableRoaringBitmap class itself, it is rather thin. Maybe 1500 lines, but a lot of it may not be needed in a hybrid model, except maybe for an add method. That is, we inherit many useful methods from ImmutableRoaringBitmap.
I think that one possibility would be to implement something like a HybridRoaringBitmap that would be an instance of the base ImmutableRoaringBitmap. It would have an ImmutableRoaringArray but also a MappeableContainer.
It might behave much like a MutableRoaringBitmap, except that any attempt at modifying a mapped container would generate an exception. I am not sure how one would make sure that there is only ever one mutable container.
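To make the idea concrete, here is a minimal sketch of what such a hybrid might look like. It is not a proposed implementation: HybridBitmapSketch and all of its members are hypothetical, read-only ShortBuffer views stand in for mapped containers, and a BitSet stands in for the single heap container. The sketch sidesteps the open question of enforcing a single mutable container by assuming its key is fixed at construction time, which may or may not suit Pinot.

```java
import java.nio.ShortBuffer;
import java.util.Arrays;
import java.util.BitSet;

final class HybridBitmapSketch {
    private final int[] frozenKeys;         // sorted high-16-bit keys of the frozen containers
    private final ShortBuffer[] frozenLows; // per key: low 16 bits of each value, read-only
    private final int mutableKey;           // assumption: the one writable key is fixed up front
    private final BitSet mutableLows = new BitSet(1 << 16);

    HybridBitmapSketch(int[] frozenKeys, ShortBuffer[] frozenLows, int mutableKey) {
        if (Arrays.binarySearch(frozenKeys, mutableKey) >= 0) {
            throw new IllegalArgumentException("mutable key is already frozen");
        }
        this.frozenKeys = frozenKeys.clone();
        this.frozenLows = new ShortBuffer[frozenLows.length];
        for (int i = 0; i < frozenLows.length; i++) {
            this.frozenLows[i] = frozenLows[i].asReadOnlyBuffer();
        }
        this.mutableKey = mutableKey;
    }

    // Writes are accepted only when they land in the single heap container.
    void add(int x) {
        int key = x >>> 16;
        if (key != mutableKey) {
            throw new UnsupportedOperationException(
                "value falls outside the mutable container (key " + key + ")");
        }
        mutableLows.set(x & 0xFFFF);
    }

    // Reads consult the heap container first, then the frozen ones.
    boolean contains(int x) {
        int key = x >>> 16;
        int low = x & 0xFFFF;
        if (key == mutableKey) {
            return mutableLows.get(low);
        }
        int i = Arrays.binarySearch(frozenKeys, key);
        if (i < 0) {
            return false;
        }
        ShortBuffer container = frozenLows[i];
        // Linear scan for brevity; a real container would use a faster lookup.
        for (int j = 0; j < container.limit(); j++) {
            if ((container.get(j) & 0xFFFF) == low) {
                return true;
            }
        }
        return false;
    }
}
```

Fixing the key up front is the simplest way to guarantee exactly one mutable container, at the cost of rejecting writes to keys that do not exist anywhere yet.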
I will stop here and open up the discussion.
See also https://github.com/RoaringBitmap/RoaringBitmap/issues/193 which might even offer a viable alternative.
@blasd
Part of the code redundancy was motivated by performance. A MutableRoaringBitmap is much slower than a RoaringBitmap… There is a pretty severe penalty for going through a ByteBuffer… And you certainly do not want to use ByteBuffer objects if you can avoid them. ByteBuffers in Java are not nice. I could elaborate if you want. Even just holding an array in a ByteBuffer is not nice… as a ByteBuffer tends to use more overhead memory than we’d like, all things considered.
Then there is branching (in its many forms): we try to avoid megamorphic calls if we can.
So while I would not be surprised if you can get the same functionality and the same performance with half the code… one should certainly not discard performance concerns when considering code improvements.
@kishoreg The general idea is that we can already, easily, distinguish between the mapped and unmapped containers. Of course, mixing off-heap and on-heap containers means that, whenever we try to modify a container, we need to check whether it is allowed. My proposal is to throw an exception when we do try to modify an off-heap container. Then it is easy to add a method that goes through the unmapped containers and does something with them. There is a lot of code, but it is all copied and pasted from MutableRoaringBitmap. (I find that tolerating copy-and-paste is sometimes a good engineering compromise.)
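The "go through the unmapped containers" part could look something like the sketch below. All names here (ContainerSketch, MixedArraySketch, forEachUnmapped) are made up for illustration, and a plain boolean flag stands in for however the library actually tells mapped and unmapped containers apart.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical container that knows whether it is backed by mapped memory.
final class ContainerSketch {
    final int key;
    final boolean mapped;
    private final BitSet bits;

    ContainerSketch(int key, boolean mapped, BitSet initial) {
        this.key = key;
        this.mapped = mapped;
        this.bits = (BitSet) initial.clone();
    }

    // The permission check happens on every modification attempt.
    void add(int low) {
        if (mapped) {
            throw new UnsupportedOperationException("container " + key + " is mapped");
        }
        bits.set(low);
    }

    int cardinality() { return bits.cardinality(); }
}

// Hypothetical array mixing mapped and heap containers.
final class MixedArraySketch {
    private final List<ContainerSketch> containers = new ArrayList<>();

    void append(ContainerSketch c) { containers.add(c); }

    // Visits only the heap containers, skipping anything mapped.
    void forEachUnmapped(Consumer<ContainerSketch> action) {
        for (ContainerSketch c : containers) {
            if (!c.mapped) {
                action.accept(c);
            }
        }
    }
}
```

A caller could use forEachUnmapped to, say, serialize or compact just the heap containers while leaving the mapped ones untouched.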