Method `advanceIfNeeded` returns a count of skipped items
Is your feature request related to a problem? Please describe.
My use case: I have a very large array of objects, each carrying an ID property, and I need to reduce it to a smaller array containing only the elements whose IDs match a requested set. I could run a binary search with a comparator on the ID property directly over the original array, but each comparison would require a “hop” to the memory location where the object resides in order to read its ID, which is rather slow.
That’s why I created a RoaringBitmap from the IDs extracted from the records in the large array, and I search for IDs in it using `org.roaringbitmap.RoaringBatchIterator`. The problem is that I need to “materialize” all records of the RoaringBitmap via `nextBatch` in order to count all examined elements. I’d like to use the `advanceIfNeeded` approach instead, to skip entire segments when the ID currently being looked up is not within them. However, `advanceIfNeeded` doesn’t return the number of elements that have been skipped, which I need to compute the correct index into the original “large array of fat objects”.
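The index mapping this relies on can be illustrated without RoaringBitmap. The sketch below assumes the records are sorted by ID and that IDs are unique, so the position of an ID within the sorted ID set equals the index of its record in the large array. `FatRecord`, `extractIds`, and `indexOf` are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;

final class IdIndexLookup {
    // Hypothetical "fat" record; only the id matters for the lookup.
    static final class FatRecord {
        final int id;
        final String payload;

        FatRecord(int id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Copies the ids into a dense primitive array once, so that later
    // searches never have to dereference the record objects.
    static int[] extractIds(FatRecord[] sortedById) {
        int[] ids = new int[sortedById.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = sortedById[i].id;
        }
        return ids;
    }

    // Returns the index of the record with the given id, or -1 if absent.
    // Assumes ids is sorted ascending and parallel to the record array.
    static int indexOf(int[] ids, int id) {
        int pos = Arrays.binarySearch(ids, id);
        return pos >= 0 ? pos : -1;
    }
}
```

Because the ID array is parallel to the record array, any way of learning "how many IDs precede this one" yields the record index directly, which is why the skipped-element count matters.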
Describe the solution you’d like
Either adding a new method or changing the return type of `advanceIfNeeded` to `int` (the latter would probably break backward binary compatibility of RoaringBitmap). The returned value would represent the number of elements that have been skipped during the “fast forward” of the iterator.
Does it make sense to you?
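To make the request concrete, here is a minimal sketch of the requested semantics. It is not RoaringBitmap code: `SkipCountingCursor` is a hypothetical stand-in that walks a plain sorted `int[]` of distinct IDs, and its `advanceIfNeeded` returns the number of skipped elements, which is the behavior being asked for. It also assumes the requested IDs arrive in ascending order.

```java
import java.util.Arrays;

// Hypothetical stand-in for a bitmap iterator over sorted, distinct ids.
final class SkipCountingCursor {
    private final int[] sortedIds;
    private int position = 0; // index of the next element to be read

    SkipCountingCursor(int[] sortedIds) {
        this.sortedIds = sortedIds;
    }

    boolean hasNext() {
        return position < sortedIds.length;
    }

    int peekNext() {
        return sortedIds[position];
    }

    // Skips every remaining element smaller than minVal and
    // reports how many elements were skipped.
    int advanceIfNeeded(int minVal) {
        int found = Arrays.binarySearch(sortedIds, position, sortedIds.length, minVal);
        int target = found >= 0 ? found : -(found + 1); // insertion point when absent
        int skipped = target - position;
        position = target;
        return skipped;
    }

    // For each requested id present in supersetIds, returns its index in the
    // superset, i.e. the index of the matching record in the parallel array.
    static int[] matchingIndexes(int[] supersetIds, int[] requestedIds) {
        SkipCountingCursor cursor = new SkipCountingCursor(supersetIds);
        int[] result = new int[requestedIds.length];
        int count = 0;
        int index = 0; // running total of skipped elements = absolute index
        for (int id : requestedIds) {
            index += cursor.advanceIfNeeded(id);
            if (!cursor.hasNext()) {
                break; // superset exhausted, no further matches possible
            }
            if (cursor.peekNext() == id) {
                result[count++] = index;
            }
        }
        return Arrays.copyOf(result, count);
    }
}
```

With such a return value the caller keeps a running sum of skipped elements and never has to materialize the elements it jumps over.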
Code snippet reflecting the situation
```java
@Nonnull
default PriceRecordContract[] getPriceRecords(@Nonnull Bitmap priceIds) {
    final RoaringBatchIterator lookupIterator = RoaringBitmapBackedBitmap.getRoaringBitmap(priceIds).getBatchIterator();
    final RoaringBatchIterator superSetIterator = RoaringBitmapBackedBitmap.getRoaringBitmap(getIndexedPriceIds()).getBatchIterator();
    final PriceRecordContract[] priceRecords = getPriceRecords();
    final CompositeObjectArray<PriceRecordContract> filteredPriceRecords = new CompositeObjectArray<>(PriceRecordContract.class);

    int lastPriceId = 0;
    int superSetBatchPeek = 0;         // number of super set elements in all previously read batches
    int lastSuperSetBatchRecord = 0;   // last (largest) id in the current super set batch
    int relativeSuperSetIndex = -1;    // index of the last examined element within the current batch
    int[] lookupBatch = new int[512];
    int[] superSetBatch = new int[512];
    int superSetIteratorPeek = 0;      // number of valid elements in the current super set batch
    while (lookupIterator.hasNext()) {
        final int lookupIteratorPeek = lookupIterator.nextBatch(lookupBatch);
        for (int i = 0; i < lookupIteratorPeek; i++) {
            int lookedUpId = lookupBatch[i];
            // fast-forward while the last record of the super set batch is less than the looked up id
            while (lookedUpId > lastSuperSetBatchRecord && superSetIterator.hasNext()) {
                superSetBatchPeek += superSetIteratorPeek;
                superSetIteratorPeek = superSetIterator.nextBatch(superSetBatch);
                relativeSuperSetIndex = -1;
                if (superSetIteratorPeek > 0) {
                    lastSuperSetBatchRecord = superSetBatch[superSetIteratorPeek - 1];
                } else {
                    lastSuperSetBatchRecord = 0;
                }
            }
            // look for the index of the price detail (the searched block shrinks with each match)
            final int fromIndex = relativeSuperSetIndex + 1;
            // the upper bound is capped by the number of valid entries in the current batch
            // (the original capped it by superSetBatchPeek, which is an absolute offset)
            final int toIndex = Math.min(fromIndex + lookedUpId - lastPriceId, superSetIteratorPeek);
            // look for the price in the currently read super set batch
            relativeSuperSetIndex = Arrays.binarySearch(superSetBatch, fromIndex, toIndex, lookedUpId);
            if (relativeSuperSetIndex < 0) {
                // not found: continue from the element just before the insertion point
                relativeSuperSetIndex = -1 * (relativeSuperSetIndex) - 2;
            } else {
                final PriceRecordContract priceRecord = priceRecords[superSetBatchPeek + relativeSuperSetIndex];
                filteredPriceRecords.add(priceRecord);
                lastPriceId = lookedUpId;
            }
        }
    }
    // return found prices as array
    return filteredPriceRecords.toArray();
}
```
Issue Analytics
- Created a year ago
- Comments: 9 (6 by maintainers)
Yes. That part is easy. But you don’t typically skip over entire containers. So think about a bitset: it is just an array of bits, such as `010001101110000001110000111000001`. In general, the function you request requires counting the set bits in a range of indexes. We can do that fast, but it definitely is not free.
Ah. Interesting.
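The cost mentioned in the reply, counting the set bits in a range of indexes, can be sketched as follows. This is a generic illustration over a bitset stored as 64-bit words, not the RoaringBitmap implementation; `BitRangeCount` and `countRange` are hypothetical names. The work is proportional to the number of words the range spans, which is why the operation is fast but not free.

```java
final class BitRangeCount {
    // Counts the set bits whose index lies in [from, to) in a bitset
    // stored as 64-bit words (bit i lives in words[i / 64]).
    static int countRange(long[] words, int from, int to) {
        if (from >= to) {
            return 0;
        }
        int firstWord = from >>> 6;
        int lastWord = (to - 1) >>> 6;
        // Java uses only the low 6 bits of a long shift distance,
        // so these masks work for any from/to without extra modulo.
        long firstMask = -1L << from;  // keeps bits at or above (from % 64)
        long lastMask = -1L >>> -to;   // keeps bits below (to % 64), all bits if to % 64 == 0
        if (firstWord == lastWord) {
            return Long.bitCount(words[firstWord] & firstMask & lastMask);
        }
        int count = Long.bitCount(words[firstWord] & firstMask);
        for (int w = firstWord + 1; w < lastWord; w++) {
            count += Long.bitCount(words[w]); // whole words in the middle
        }
        count += Long.bitCount(words[lastWord] & lastMask);
        return count;
    }
}
```

An iterator that reported skipped elements would have to run a count like this over every bitset-style segment it jumps across, instead of simply moving a pointer.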