Question about implementation of double hashing

Hi. I’m just looking into the hashing implementation here and I’m baffled by several aspects of the implementation, hoping you can tell me if there’s anything I’m missing.

For example, in getIndices, you’re rehashing the input on every iteration of the loop: https://github.com/Callidon/bloom-filters/blob/5c81fa4054465f446e3bb1606ddeceffdb907d81/src/utils.ts#L206-L209

Surely this defeats the point of double hashing, which is to simulate k independent hash functions given only two real ones? Not using double hashing at all would need one hash per index; your implementation does two hashes per index.
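
For comparison, here is a minimal sketch of the textbook form of double hashing, where the input is hashed exactly twice and every index is derived arithmetically as (h1 + i * h2) mod size. The names `fnv1a` and `doubleHashIndices` are hypothetical, and FNV-1a is only a stand-in hash chosen to keep the example self-contained; none of this is taken from the bloom-filters source.

```typescript
// FNV-1a (32-bit), used purely as a stand-in hash for this sketch.
// Assumption: this is NOT the hash function the library actually uses.
function fnv1a(input: string, seed: number): number {
  let hash = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical helper: derive hashCount indices from two hash evaluations.
// Assumes size >= 2.
function doubleHashIndices(input: string, size: number, hashCount: number): number[] {
  // The two "real" hashes, computed once, outside the loop.
  const h1 = fnv1a(input, 0) % size;
  // Force the stride into [1, size - 1] so the sequence never stays on one cell.
  const h2 = (fnv1a(input, 1) % (size - 1)) + 1;

  const indices: number[] = [];
  for (let i = 0; i < hashCount; i++) {
    // No rehashing here: each index is plain arithmetic on h1 and h2.
    indices.push((h1 + i * h2) % size);
  }
  return indices;
}

// Usage: 5 indices into a filter of 101 cells, from only two hash calls.
console.log(doubleHashIndices("hello", 101, 5));
```

In this sketch, a prime `size` guarantees that the first `size` indices are all distinct, because a nonzero stride then cannot share a factor with `size`.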

It’s true that the hashes you’re calculating on each loop aren’t quite the same, because you’re adding size % i (the number of cells modulo the loop iteration) to the seed each time. But why? That seems like a really strange thing to add to the seed. It doesn’t guarantee that the seed is different on each loop (e.g. if the number of cells is even, it’ll be 0 for the first 2 iterations). But again, that’s not something you should want or need in double hashing anyway.
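
To make the seed-offset observation concrete, the sketch below simply lists `size % i` for the first few iterations. It assumes the loop counter starts at 1, which the "first 2 iterations" remark implies but which is not verified against the linked code; `seedOffsets` is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper: the value added to the seed on iterations 1..hashCount,
// assuming the loop counter i starts at 1.
function seedOffsets(size: number, hashCount: number): number[] {
  const offsets: number[] = [];
  for (let i = 1; i <= hashCount; i++) {
    offsets.push(size % i);
  }
  return offsets;
}

// With an even number of cells, iterations 1 and 2 both add 0 to the seed,
// and the offset is 0 again whenever i divides size.
console.log(seedOffsets(10, 6)); // [0, 0, 1, 2, 0, 4]
```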

Am I missing something?

Top GitHub Comments

1 reaction

SimonWoolf commented, Dec 3, 2021

> Do you accept a merge of your work here? (see last commit, authorship added)

Sure, go for it

0 reactions

folkvir commented, Dec 2, 2021

Ahah 👍 we worked on exactly the same thing but in different ways, and I’m actually surprised by yours. I agree, this will almost never happen because the number of hash functions is not very high in practice. I mean, I have never seen anyone set a hashCount of 1000. But just in case, it will work. Do you accept a merge of your work here? (see last commit, authorship added)