redis storage question?

from datasketch import MinHash, MinHashLSH

set1 = set(['minhash', 'is', 'a', 'probabilistic', 'data', 'structure', 'for', 'estimating', 'the', 'similarity', 'between', 'datasets'])

m1 = MinHash(num_perm=128)
for d in set1:
    m1.update(d.encode('utf8'))

lsh = MinHashLSH(threshold=0.9, num_perm=128, storage_config={ 'type': 'redis', 'redis': {'host': 'localhost', 'port': 6379,'db': 1},'name':1})
lsh.insert("m1", m1)

lsh1 = MinHashLSH(threshold=0.9, num_perm=128, storage_config={ 'type': 'redis', 'redis': {'host': 'localhost', 'port': 6379,'db': 1},'name':1})

lsh.keys.size() is 1, but lsh1.keys.size() is 0. Why?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
olijouve commented, Nov 21, 2017

@qxde01 As ekzhu pointed out, you should use the pickle module to handle persistence.

In your MinHash calculation script, import pickle and instantiate your MinHashLSH with Redis support as you do now. Then load your data and, when finished, serialize the LSH object and dump it to a file.

In your querying script, import pickle, but do not instantiate MinHashLSH in the usual way. Instead, use pickle to load the previously serialized object from the file into your lsh variable (or whatever name you use). You can then use this instance normally, since it shares the same, complete Redis storage.
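The two-script workflow described above could be sketched roughly as follows. The helper names `save_index` and `load_index` and the file name `lsh.pickle` are made up for illustration, and the sketch assumes, as the comment suggests, that a MinHashLSH object backed by Redis can be pickled and reconnects when loaded.

```python
import pickle
from pathlib import Path


def save_index(index, path):
    """Serialize an index object (for example a MinHashLSH) to a file."""
    with Path(path).open("wb") as fh:
        pickle.dump(index, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Load a previously pickled index object back from a file."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Indexing script (assumed usage; needs datasketch and a running Redis):
#   lsh = MinHashLSH(threshold=0.9, num_perm=128, storage_config={...})
#   lsh.insert("m1", m1)
#   save_index(lsh, "lsh.pickle")
#
# Querying script:
#   lsh = load_index("lsh.pickle")
#   result = lsh.query(m1)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.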

1 reaction
olijouve commented, Nov 2, 2017

This happens because the basename (prefix) used for the stored keys is generated randomly at run time, so every time you run your script (or create a new instance, as with lsh1 above) you get a new set of keys.

You need to set your own isolation prefix to make the Redis storage usable across multiple runs. Here is a small fix that worked for me.
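To make the effect concrete, here is a toy model of the behaviour, not datasketch's actual code. A plain dict stands in for the Redis database, and `ToyIndex`, `random_name`, and the `prefix:key` naming scheme are hypothetical names invented for this sketch.

```python
import random
import string


def random_name(length):
    """Return a random lowercase name (stand-in for the library's helper)."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


class ToyIndex:
    """Hypothetical stand-in for an index that keeps its keys in a shared store."""

    def __init__(self, store, basename=None):
        self.store = store  # plain dict playing the role of the Redis database
        # Without an explicit basename, every instance picks a fresh random one.
        self.basename = basename if basename is not None else random_name(11)

    def insert(self, key):
        self.store[f"{self.basename}:{key}"] = True

    def size(self):
        prefix = f"{self.basename}:"
        return sum(1 for k in self.store if k.startswith(prefix))


store = {}
first = ToyIndex(store)
first.insert("m1")
second = ToyIndex(store)  # new random basename, so it looks at different keys

print(first.size())   # 1
print(second.size())  # 0, barring a random-name collision
```

The data written by `first` is still in the store; `second` simply has no way of knowing which prefix to look under.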

In datasketch/lsh.py, replace line 114:

basename = _random_name(11)

with this code:

if 'redis_prefix' in storage_config:
    basename = storage_config['redis_prefix']
else:
    basename = _random_name(11)

Then, in your own code, simply add the new property to the storage_config dictionary, like this:

lsh = MinHashLSH(
    threshold=0.79,
    num_perm=128,
    storage_config={
        'type': 'redis',
        'redis_prefix': 'mylsh',
        'redis': {'host': 'localhost', 'port': 6379}
    })

That way you can choose whichever isolation prefix you want. It also lets you store different MinHashLSH contexts in the same Redis database.
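As a rough, standalone illustration of the prefix idea, the sketch below resolves a prefix from a config dictionary and shows two contexts coexisting in one store. `resolve_basename` is a hypothetical helper that mirrors the patch described above, the dict stands in for a single Redis database, and the `prefix:key` naming is an assumption made for the example.

```python
import random
import string


def resolve_basename(storage_config, length=11):
    """Use the configured 'redis_prefix' if present, else a random name."""
    if "redis_prefix" in storage_config:
        return storage_config["redis_prefix"]
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


config_a = {"type": "redis", "redis_prefix": "mylsh",
            "redis": {"host": "localhost", "port": 6379}}
config_b = {"type": "redis", "redis_prefix": "otherlsh",
            "redis": {"host": "localhost", "port": 6379}}

store = {}  # plain dict standing in for one Redis database
store[resolve_basename(config_a) + ":m1"] = True
store[resolve_basename(config_b) + ":m1"] = True

print(sorted(store))  # ['mylsh:m1', 'otherlsh:m1']
```

Because each context writes under its own prefix, the same key name (`m1`) can be used in both without clashing.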
