Question about implementation of double hashing
Hi. I'm just looking into the hashing implementation here and I'm baffled by several aspects of it, so I'm hoping you can tell me if there's anything I'm missing.
For example, in getIndices, you're rehashing the input on every iteration of the loop: https://github.com/Callidon/bloom-filters/blob/5c81fa4054465f446e3bb1606ddeceffdb907d81/src/utils.ts#L206-L209
Surely this defeats the point of double hashing, which is to simulate k independent hash functions given only two real ones? Not using double hashing at all would need one hash per index; your implementation does two hashes per index.
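To make the expectation concrete, here is a minimal sketch of double hashing as it is usually described: the input is hashed twice up front, and each of the k indices is then derived with arithmetic only. The `fnv1a` helper and the `doubleHashIndices` name are hypothetical stand-ins for illustration, not the library's actual functions, and clamping the second hash to a non-zero stride is an assumption of this sketch.

```typescript
// Hypothetical seeded 32-bit FNV-1a hash; a stand-in for whatever
// hash function the library actually uses.
function fnv1a(input: string, seed: number): number {
  let hash = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Double hashing: two real hashes computed once, then every index is
// derived as (h1 + i * h2) mod size without hashing the input again.
function doubleHashIndices(input: string, size: number, hashCount: number): number[] {
  const h1 = fnv1a(input, 0) % size;
  // Assumption of this sketch: keep h2 in [1, size - 1] so the stride is never 0.
  const h2 = size > 1 ? 1 + (fnv1a(input, 1) % (size - 1)) : 0;
  const indices: number[] = [];
  for (let i = 0; i < hashCount; i++) {
    indices.push((h1 + i * h2) % size);
  }
  return indices;
}
```

With this shape the cost is two hashes per element regardless of `hashCount`, which is the saving double hashing is meant to provide.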
It's true that the hashes you're calculating on each loop aren't quite the same, because you're adding size % i (the number of cells modulo the loop iteration) to the seed each time. But why? That seems like a really strange thing to add to the seed. It doesn't guarantee that the seed is different on each loop (e.g. if the number of cells is even, it'll be 0 for the first 2 iterations). But again, that's not something you should want or need in double hashing anyway.
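A quick way to see the repeated seeds is to print the offsets directly. This is only an illustrative sketch: `seedOffsets` is a hypothetical helper, and it assumes the loop counter starts at 1, as the description above implies.

```typescript
// Hypothetical helper: lists the value of size % i that would be added
// to the seed on each iteration, assuming i runs from 1 to iterations.
function seedOffsets(size: number, iterations: number): number[] {
  const offsets: number[] = [];
  for (let i = 1; i <= iterations; i++) {
    offsets.push(size % i);
  }
  return offsets;
}

console.log(seedOffsets(100, 5)); // [0, 0, 1, 0, 0]
```

For a filter with 100 cells, four of the first five iterations get the same offset of 0, so those iterations would hash with an identical seed.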
Am I missing something?
Issue Analytics
- Created 2 years ago
- Reactions:1
- Comments:7
Top GitHub Comments
Sure, go for it
Ahah, we worked on exactly the same thing but in different ways, and I'm actually surprised by yours. I agree, this will almost never happen because the number of hash functions is not very high in practice. I mean, I have never seen anyone set a hashCount of 1000. But just in case, it will work. Would you accept a merge of your work here? (see last commit, authorship added)