redis storage question?
from datasketch import MinHash, MinHashLSH
set1 = set(['minhash', 'is', 'a', 'probabilistic', 'data', 'structure', 'for', 'estimating', 'the', 'similarity', 'between', 'datasets'])
lsh = MinHashLSH(threshold=0.9, num_perm=128, storage_config={'type': 'redis', 'redis': {'host': 'localhost', 'port': 6379, 'db': 1}, 'name': 1})
m1 = MinHash(num_perm=128)
for d in set1:
    m1.update(d.encode('utf8'))
lsh.insert("m1", m1)
lsh1 = MinHashLSH(threshold=0.9, num_perm=128, storage_config={'type': 'redis', 'redis': {'host': 'localhost', 'port': 6379, 'db': 1}, 'name': 1})
lsh.keys.size() is 1, but lsh1.keys.size() is 0.
Why?
Issue Analytics
- State:
- Created 6 years ago
- Comments:5 (1 by maintainers)
Top GitHub Comments
@qxde01 as ekzhu pointed out, you should use the pickle module to handle persistence. It ships with Python's standard library, so there is nothing to install.
In your MinHash calculation script, import pickle and instantiate your MinHashLSH with Redis support as you do now. Then load your data and, when finished, serialize your lsh object and dump it to a file.
In your querying script, import pickle, but do not instantiate MinHashLSH the usual way. Instead, use pickle to load the previously serialized object from that file into your lsh variable (or whatever name you choose). You can then use this instance normally, as it will share the same, complete Redis storage.
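The two-script workflow described above can be sketched with a pair of small helpers. This is a minimal sketch, not datasketch code: `save_index` and `load_index` are hypothetical names, the file path is up to you, and it assumes (as the comment above states) that the lsh object can be pickled.

```python
import pickle


def save_index(index, path):
    """Serialize an index object (for example a MinHashLSH) to a file."""
    with open(path, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Load a previously pickled index object from a file."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

In the indexing script you would call something like `save_index(lsh, "lsh.pkl")` after the last insert, and in the querying script `lsh = load_index("lsh.pkl")` in place of the `MinHashLSH(...)` constructor. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.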
This happens because the basename/prefix used for the stored keys is generated randomly at run time, so every time you run your script (or create a new MinHashLSH) you get a new set of keys.
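A toy model may make the effect easier to see. The sketch below is not datasketch's implementation: `ToyIndex` is a hypothetical class, and a plain dict stands in for one Redis database. It only illustrates how two indexes with different random prefixes cannot see each other's keys, while an index that reuses the prefix can.

```python
import random
import string


def random_name(length):
    """Return a random lowercase name (stand-in for a random basename)."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


class ToyIndex:
    """Toy index that namespaces its keys in a shared store by a prefix."""

    def __init__(self, store, prefix=None):
        self.store = store  # shared dict standing in for one Redis database
        self.prefix = prefix if prefix is not None else random_name(11)

    def insert(self, key):
        self.store[f"{self.prefix}:{key}"] = True

    def keys(self):
        head = f"{self.prefix}:"
        return sorted(k[len(head):] for k in self.store if k.startswith(head))


store = {}
a = ToyIndex(store)            # random prefix, like lsh in the question
a.insert("m1")
b = ToyIndex(store)            # another random prefix, like lsh1
c = ToyIndex(store, a.prefix)  # explicitly reuses the first prefix
```

Here `a` and `c` both list the key `m1`, while `b` lists nothing even though all three share the same store, which mirrors the behaviour reported in the question.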
You need to set your own isolation prefix to reuse the same Redis storage across multiple runs. Here is a small fix that worked for me.
In datasketch/lsh.py, replace line 114:
basename = _random_name(11)
with this code:
if 'redis_prefix' in storage_config:
    basename = storage_config['redis_prefix']
else:
    basename = _random_name(11)
EDIT: mind the indentation when pasting this in.
Then, in your own code, simply add the new property to the storage_config dictionary, like this:
lsh = MinHashLSH(
    threshold=0.79,
    num_perm=128,
    storage_config={
        'type': 'redis',
        'redis_prefix': 'mylsh',
        'redis': {'host': 'localhost', 'port': 6379}
    })
That way you can choose whichever isolation prefix you want. It also lets you store several MinHash LSH contexts in the same Redis database.
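The prefix lookup from the patch can also be written as a standalone function, which is easier to test than an edit inside the library. This is a sketch under assumptions: `resolve_basename` is a hypothetical name, `redis_prefix` is the commenter's own key rather than one stock datasketch is known to read, and returning bytes is an assumption about what the storage layer expects that you should check against your installed version.

```python
import random
import string


def resolve_basename(storage_config, length=11):
    """Return the configured prefix as bytes, or a random one if none is set."""
    prefix = storage_config.get("redis_prefix")  # key name taken from the patch
    if prefix is None:
        # No prefix configured: fall back to a random lowercase name.
        letters = (random.choice(string.ascii_lowercase) for _ in range(length))
        return "".join(letters).encode("utf8")
    # Accept either str or bytes so callers need not care which one they pass.
    return prefix.encode("utf8") if isinstance(prefix, str) else prefix
```

With a fixed prefix such as `'mylsh'`, every run resolves to the same basename, so a new process addresses the same keys as the previous one. Without it, each call yields a fresh random name.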