Fix ORC Bloom Filter

ORC Bloom filter support is broken in the latest Presto release. Either we should fix this in the corresponding previous release branches too, or we should state in the release notes that queries on ORC tables that have Bloom filters will not take advantage of them. The support has been broken since Presto release 0.214. After the StreamId changes in the readBloomFilterIndexes method of the StripeReader class, the Bloom filter no longer skips ORC row groups that cannot satisfy the predicate, because of a coding bug: the line below always returns null.

StripeReader.java:

List<HiveBloomFilter> bloomFilters = bloomFilterIndexes.get(entry.getKey());

@kevinwilfong @dain Please have a look; this has an impact on Presto ORC performance.

Top GitHub Comments

kevinwilfong commented, Jun 4, 2019

Echoing Maria’s comment on the PR and Wenlei’s comment here, could you add a test that demonstrates the problem? It’s not immediately obvious to me why that line would always return null. I’m also concerned that this fix wouldn’t work correctly for flat maps.

wenleix commented, Jun 4, 2019

@dilipkasana

"After changes of StreamId in readBloomFilterIndexes method of StripeReader class the Bloom filter does not skip unsatisfied Row Group of ORC due to coding bug as the below line return always null."

I am curious why that is. StreamId just contains column, sequence (always 0 for ORC), and streamKind (which should be the same for the same column), right?

There might be something wrong with StreamId that keeps it from working in a HashMap, although I didn’t see anything obviously wrong with its hashCode and equals methods.

Ignoring sequence would cause Bloom filters to not work correctly for DWRF flat maps.
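
To make the HashMap question in this thread concrete, here is a minimal sketch of how a lookup with a composite key can miss even when hashCode and equals are implemented correctly. It is not Presto's code: the `Key` class is a hypothetical stand-in for StreamId, the stream kind strings are illustrative, and the thread does not confirm which field (if any) differs in the real bug. It only shows that when the key used for `get` differs in any one field from the key used for `put`, the result is null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class StreamIdLookupSketch {
    // Hypothetical stand-in for StreamId: equality covers all three fields.
    static final class Key {
        final int column;
        final int sequence;
        final String streamKind;

        Key(int column, int sequence, String streamKind) {
            this.column = column;
            this.sequence = sequence;
            this.streamKind = streamKind;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return column == other.column
                    && sequence == other.sequence
                    && streamKind.equals(other.streamKind);
        }

        @Override
        public int hashCode() {
            return Objects.hash(column, sequence, streamKind);
        }
    }

    // Builds a map with a single entry for column 1 (assumed sample data).
    static Map<Key, String> sampleIndexes() {
        Map<Key, String> indexes = new HashMap<>();
        indexes.put(new Key(1, 0, "BLOOM_FILTER"), "bloom filter for column 1");
        return indexes;
    }

    public static void main(String[] args) {
        Map<Key, String> indexes = sampleIndexes();
        // All three fields match the stored key: the entry is found.
        System.out.println(indexes.get(new Key(1, 0, "BLOOM_FILTER")));
        // Same column, different stream kind: the lookup misses.
        System.out.println(indexes.get(new Key(1, 0, "ROW_INDEX")));
        // Same column and kind, different sequence: the lookup also misses.
        System.out.println(indexes.get(new Key(1, 1, "BLOOM_FILTER")));
    }
}
```

A regression test of the kind requested above could follow the same pattern: build the index map, look it up with the key the reader actually constructs, and assert that the result is not null. Dropping `sequence` from the key would make the third lookup succeed, which is also why ignoring it could conflate entries that differ only by sequence.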