Implement partially mapped bitmaps: hybrid Java heap + off heap


This is an issue raised by @kishoreg and related to how Pinot uses Roaring bitmaps.

Currently, we support ImmutableRoaringBitmap, MutableRoaringBitmap and RoaringBitmap. The ImmutableRoaringBitmap class stores the containers off-heap (possibly on disk). The other classes (MutableRoaringBitmap and RoaringBitmap) store the data on Java’s heap.

However, there are instances where we would like just one container to be stored in memory and to be mutable, whereas the rest of the containers should be off-heap and immutable.

I think that we are lucky in that this can be implemented in a sane manner, without having to go crazy and risk lots of bugs. However, I have not yet come up with a design that I like, so I want to open this up for discussion. Another issue is that I am not 100% clear on what would best serve Pinot’s needs.

A MutableRoaringBitmap is implemented as a derived class of ImmutableRoaringBitmap. The key implementation concept is the attribute PointableRoaringArray highLowContainer in the class ImmutableRoaringBitmap. We currently have two types of PointableRoaringArray. The base ImmutableRoaringBitmap class uses an ImmutableRoaringArray. Unsurprisingly, the MutableRoaringBitmap uses a MutableRoaringArray.
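
To make the relationship described above concrete, here is a minimal toy model, not the library's code. Every name ending in `Sketch` is hypothetical, and the arrays hold only 16-bit keys rather than real containers. It only illustrates the pattern of an immutable base class holding an interface-typed `highLowContainer` field, with a mutable subclass swapping in a heap-backed implementation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for PointableRoaringArray: a read-only view of sorted keys.
interface PointableArraySketch {
    int size();
    boolean containsKey(char key);
}

// Reads sorted 16-bit keys from a (possibly memory-mapped) ByteBuffer; never writes.
final class ImmutableArraySketch implements PointableArraySketch {
    private final ByteBuffer buffer; // 2 bytes per key, ascending, big-endian

    ImmutableArraySketch(ByteBuffer buffer) {
        this.buffer = buffer.asReadOnlyBuffer();
    }

    public int size() {
        return buffer.limit() / 2;
    }

    public boolean containsKey(char key) {
        int lo = 0, hi = size() - 1;
        while (lo <= hi) { // binary search through the buffer
            int mid = (lo + hi) >>> 1;
            char k = buffer.getChar(2 * mid);
            if (k < key) lo = mid + 1;
            else if (k > key) hi = mid - 1;
            else return true;
        }
        return false;
    }
}

// Keeps sorted keys in a plain heap array that can grow.
final class MutableArraySketch implements PointableArraySketch {
    private char[] keys = new char[4];
    private int size = 0;

    public int size() {
        return size;
    }

    public boolean containsKey(char key) {
        return Arrays.binarySearch(keys, 0, size, key) >= 0;
    }

    void insertKey(char key) {
        int pos = Arrays.binarySearch(keys, 0, size, key);
        if (pos >= 0) return; // already present
        int at = -pos - 1;
        if (size == keys.length) keys = Arrays.copyOf(keys, 2 * size);
        System.arraycopy(keys, at, keys, at + 1, size - at);
        keys[at] = key;
        size++;
    }
}

// Base class: all read-only queries go through the interface-typed field.
class ImmutableBitmapSketch {
    protected PointableArraySketch highLowContainer;

    protected ImmutableBitmapSketch() {}

    ImmutableBitmapSketch(ByteBuffer serialized) {
        highLowContainer = new ImmutableArraySketch(serialized);
    }

    boolean containsKey(char key) {
        return highLowContainer.containsKey(key);
    }

    int keyCount() {
        return highLowContainer.size();
    }
}

// Derived class: inherits the queries and only adds mutation.
class MutableBitmapSketch extends ImmutableBitmapSketch {
    MutableBitmapSketch() {
        highLowContainer = new MutableArraySketch();
    }

    void addKey(char key) {
        ((MutableArraySketch) highLowContainer).insertKey(key);
    }
}
```

In this toy model the subclass contributes a single mutating method and reuses every query from the base class, which is the property the hybrid idea would rely on.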

If one looks at the MutableRoaringBitmap class itself, it is rather thin. Maybe 1500 lines, but a lot of it may not be needed in a hybrid model, except maybe for an add method. That is, we inherit many useful methods from ImmutableRoaringBitmap.

I think that one possibility would be to implement something like a HybridRoaringBitmap that would be an instance of the base ImmutableRoaringBitmap. It would have an ImmutableRoaringArray but also a MappeableContainer.

It might behave much like a MutableRoaringBitmap, except that any attempt at modifying a mapped container would generate an exception. I am not sure how you make sure that there is only ever just one mutable container.

I will stop here and open up the discussion.

See also https://github.com/RoaringBitmap/RoaringBitmap/issues/193 which might even offer a viable alternative.

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 26 (26 by maintainers)

Top GitHub Comments

1 reaction
lemire commented, Dec 14, 2017

@blasd

Part of the code redundancy was motivated by performance. A MutableRoaringBitmap is much slower than a RoaringBitmap… There is a pretty severe penalty for going through a ByteBuffer… And you certainly do not want to use ByteBuffer objects if you can avoid them. ByteBuffers in Java are not nice. I could elaborate if you want. Even just holding an array in a ByteBuffer is not nice… as a ByteBuffer tends to use more overhead memory than we’d like, all things considered.

Then there is branching (in its many forms): we try to avoid megamorphic calls if we can.

So while I would not be surprised if you can get the same functionality and the same performance with half the code… one should certainly not discard performance concerns when considering code improvements.

0 reactions
lemire commented, Jan 9, 2018

@kishoreg The general idea is that we can already, easily, distinguish between the mapped and unmapped containers. Of course, mixing off-heap and heaped containers means that we need to check whenever we try to modify a container whether it is allowed. My proposal is to throw an exception when we do try to modify an off-heap container. Then it is easy to add a method that goes through the unmapped containers and does something with them. There is a lot of code, but it is all copied and pasted from MutableRoaringBitmap. (I find that tolerating copy-and-paste is sometimes a good engineering compromise.)
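
The proposal in this comment could look roughly like the following sketch. It is an illustration under stated assumptions, not code from the RoaringBitmap repository: all class and method names (`ContainerSketch`, `HybridBitmapSketch`, `attachMapped`, `forEachUnmapped`, and so on) are hypothetical, each container is a plain 65,536-bit bitmap, and a read-only direct `ByteBuffer` stands in for a memory-mapped container.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical container abstraction; isMapped() tells the two kinds apart.
interface ContainerSketch {
    boolean contains(char low);
    boolean isMapped();
    void add(char low);
}

// On-heap container: freely mutable.
final class HeapContainerSketch implements ContainerSketch {
    private final BitSet bits = new BitSet();

    public boolean contains(char low) { return bits.get(low); }
    public boolean isMapped() { return false; }
    public void add(char low) { bits.set(low); }
}

// Off-heap container: an 8192-byte bitmap behind a read-only buffer.
final class MappedContainerSketch implements ContainerSketch {
    private final ByteBuffer words;

    private MappedContainerSketch(ByteBuffer words) { this.words = words; }

    // Builds a direct buffer with the given low 16-bit values set (demo helper).
    static MappedContainerSketch of(int... lows) {
        ByteBuffer b = ByteBuffer.allocateDirect(8192);
        for (int low : lows) {
            int i = low >>> 3;
            b.put(i, (byte) (b.get(i) | (1 << (low & 7))));
        }
        return new MappedContainerSketch(b.asReadOnlyBuffer());
    }

    public boolean contains(char low) {
        return ((words.get(low >>> 3) >> (low & 7)) & 1) != 0;
    }

    public boolean isMapped() { return true; }

    // Any attempt to modify a mapped container is rejected.
    public void add(char low) {
        throw new UnsupportedOperationException("mapped container is read-only");
    }
}

final class HybridBitmapSketch {
    // Containers indexed by the high 16 bits of the value.
    private final TreeMap<Character, ContainerSketch> containers = new TreeMap<>();

    void attachMapped(char key, MappedContainerSketch c) {
        containers.put(key, c);
    }

    void add(int x) {
        char key = (char) (x >>> 16);
        char low = (char) x; // low 16 bits
        ContainerSketch c = containers.computeIfAbsent(key, k -> new HeapContainerSketch());
        c.add(low); // throws if the container for this key is mapped
    }

    boolean contains(int x) {
        ContainerSketch c = containers.get((char) (x >>> 16));
        return c != null && c.contains((char) x);
    }

    // Visits only the on-heap (unmapped) containers.
    void forEachUnmapped(BiConsumer<Character, ContainerSketch> action) {
        containers.forEach((key, c) -> {
            if (!c.isMapped()) action.accept(key, c);
        });
    }
}
```

Note that this sketch does not limit the number of on-heap containers; it creates one whenever a value falls under a key with no container, so the open question of guaranteeing a single mutable container is left unanswered here.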

