Unable to create multiple lsh indices each one in its own keyspace
First of all, thank you for the great work, @ekzhu! Here is a reproducible test. My expectation is that each LSH index gets its own Cassandra keyspace; instead, the tables of all LSH indexes are created in the same keyspace.
from datasketch import MinHash, MinHashLSH

# cassandra_seeds (the list of Cassandra contact points) is defined elsewhere in the test setup.

def create_lsh_index(index):
    print("(Create index: {}) - ".format(index), end='')
    threshold = 0.75
    num_perm = 128
    doc1 = MinHash(num_perm)
    lsh = MinHashLSH(
        threshold=threshold, num_perm=num_perm, storage_config={
            'type': 'cassandra',
            'basename': index.encode('ascii'),
            'cassandra': {
                'seeds': cassandra_seeds,
                # Each index is expected to live in a keyspace named after it.
                'keyspace': index,
                'replication': {
                    'class': 'SimpleStrategy',
                    'replication_factor': '3',
                },
                'drop_keyspace': True,
                'drop_tables': True,
            }
        }
    )
    lsh.insert("a", doc1)
    lsh.insert("b", doc1)
    counts = lsh.get_counts()
    # second instance
    assert len(counts) == 11

def test_cassandra_multi_index():
    create_lsh_index('idx1')
    create_lsh_index('idx2')
    create_lsh_index('idx3')
The resulting state in Cassandra is shown below. Only the idx1 keyspace exists, and it holds the tables of all three indexes:
cqlsh> DESCRIBE keyspaces;
system_schema system_traces **idx1** system_distributed_everywhere
system_auth same_index system system_distributed
cqlsh> use idx1;
cqlsh:idx1> desc tables;
lsh_idx3_keys lsh_idx2_bucket_0003 lsh_idx3_bucket_0006
lsh_idx2_bucket_000a lsh_idx2_bucket_0002 lsh_idx3_bucket_0007
lsh_idx1_keys lsh_idx2_bucket_0009 lsh_idx1_bucket_0008
lsh_idx2_keys lsh_idx2_bucket_0008 lsh_idx1_bucket_0009
lsh_idx1_bucket_000a lsh_idx3_bucket_0008 lsh_idx1_bucket_0006
lsh_idx3_bucket_000a lsh_idx3_bucket_0009 lsh_idx1_bucket_0007
lsh_idx2_bucket_0005 lsh_idx3_bucket_0000 lsh_idx1_bucket_0004
lsh_idx2_bucket_0004 lsh_idx3_bucket_0001 lsh_idx1_bucket_0005
lsh_idx2_bucket_0007 lsh_idx3_bucket_0002 lsh_idx1_bucket_0002
lsh_idx2_bucket_0006 lsh_idx3_bucket_0003 lsh_idx1_bucket_0003
lsh_idx2_bucket_0001 lsh_idx3_bucket_0004 lsh_idx1_bucket_0000
lsh_idx2_bucket_0000 lsh_idx3_bucket_0005 lsh_idx1_bucket_0001
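One way to confirm the symptom programmatically is to group the table names in a keyspace by the index they belong to. The sketch below assumes the naming pattern visible in the listing above (`lsh_<basename>_keys` and `lsh_<basename>_bucket_<hex>`); the helper `tables_by_index` is hypothetical and not part of datasketch.

```python
import re
from collections import defaultdict

# Pattern inferred from the table listing above; an assumption, not a documented contract.
TABLE_NAME = re.compile(r"^lsh_(?P<index>.+)_(?:keys|bucket_[0-9a-f]+)$")

def tables_by_index(table_names):
    """Group LSH table names by the index basename embedded in each name."""
    grouped = defaultdict(list)
    for name in table_names:
        match = TABLE_NAME.match(name)
        if match:
            grouped[match.group("index")].append(name)
    return dict(grouped)

# A few names taken from the `desc tables` output of keyspace idx1.
tables = [
    "lsh_idx1_keys",
    "lsh_idx2_bucket_000a",
    "lsh_idx3_bucket_0006",
    "lsh_idx1_bucket_0008",
]
grouped = tables_by_index(tables)
# More than one basename in a single keyspace reproduces the problem reported here.
print(sorted(grouped))
```

If every index had its own keyspace, the grouping for each keyspace would contain exactly one basename.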
I look forward to hearing what we are doing wrong, since we don't have any practical experience with datasketch yet.
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
Hi @ekzhu, I added a pull request that fixes this issue by creating or switching to a different keyspace when needed: https://github.com/ekzhu/datasketch/pull/172
Merged and released.
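The idea described in the comment above, creating or switching to a keyspace when needed, can be sketched as follows. This is a minimal illustration and not the code from the pull request. The helpers `create_keyspace_cql` and `use_keyspace` are hypothetical, and `session` is assumed to behave like a cassandra-driver `Session`, with `execute` and `set_keyspace` methods.

```python
def create_keyspace_cql(keyspace, replication):
    """Build a CREATE KEYSPACE statement from a replication dict.

    Assumes `keyspace` is a trusted identifier; it is not escaped here.
    """
    options = ", ".join(
        "'{}': '{}'".format(key, value) for key, value in replication.items()
    )
    return "CREATE KEYSPACE IF NOT EXISTS {} WITH replication = {{{}}}".format(
        keyspace, options
    )

def use_keyspace(session, keyspace, replication):
    """Create the keyspace if it is missing, then switch the session to it."""
    session.execute(create_keyspace_cql(keyspace, replication))
    session.set_keyspace(keyspace)

# Same replication settings as in the test from the issue.
replication = {"class": "SimpleStrategy", "replication_factor": "3"}
statement = create_keyspace_cql("idx2", replication)
print(statement)
```

Calling `use_keyspace` once per index, before that index's tables are created, would place the tables of idx1, idx2 and idx3 in separate keyspaces.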