Hscan in Redis with 1,000,000 records is taking longer than usual
I am trying to scan a key space of about a million keys using a pattern scan. I am using Jedis 2.8.1 and spring-data-redis 1.7.2. Scanning the entire key set a thousand times takes about 5 minutes. The same thing takes about 10 s when done on MySQL. I think I might be doing something wrong, since Redis is supposed to be faster than SQL-based databases.
Here are the steps for execution:
- Set up the spring configuration as mentioned below.
- This is the method I am using for scanning the keys (HSCAN):
    @Override
    public List<Object> findByPattern(String pattern) {
        List<Object> listOfObjects = new ArrayList<>();
        Cursor<Entry<Object, Object>> cursorMap = redisTemplate.boundHashOps(SNAPSHOT_MODULE_KEY)
                .scan(ScanOptions.scanOptions().match(pattern).count(100).build());
        if (cursorMap != null) {
            while (cursorMap.hasNext()) {
                listOfObjects.add(cursorMap.next().getKey());
            }
            return listOfObjects;
        } else {
            return null;
        }
    }
- Here is the code with which I call the scan method:
    List<Object> listOfSnapshotModulesKeys = snapshotRepo.findByPattern(snapshotPattern);
NOTE: The key for each row is a combination of four separate strings. The pattern that I use for searching (the variable snapshotPattern) is a combination of three strings. For example, KEY: ‘Str1:Str2:Str3:Str4’ PATTERN: ‘*:Str5:Str6:Str7’
Also, I am not using connection pooling or pipelining. I have turned off pooling since I was facing an error otherwise. This error has been mentioned in issue: https://github.com/xetorthio/jedis/issues/918#issuecomment-229922190
Redis / Jedis Configuration
- Jedis version: 2.8.1
- Redis version: 3.0.501
- Java version: 1.7.0_79
- Spring-data-redis version: 1.7.2.RELEASE
@marcosnils, @deepakpol, @mp911de: What can I do to make the search faster?
@chinmaym7430 There you go
Bash code to generate a hash with a million entries (takes just under a minute to complete, thanks to bash streaming capabilities):
Bash code to scan the entire hash for a 3-letter suffix pattern (you used a suffix pattern in your description):
COUNT=100: 19 seconds, 3729 iterations
COUNT=1000: 3 seconds, 375 iterations
COUNT=10000: 1 second, 38 iterations
COUNT=100000: 1 second, 4 iterations
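To make these figures easier to compare, here is a small arithmetic sketch in Java. It uses only the numbers reported above plus the stated hash size of a million entries; the class and method names are made up for illustration, and since the timings were reported in whole seconds, the per-call values are rough.

```java
// Back-of-the-envelope arithmetic on the reported HSCAN runs.
// Assumption: the hash holds 1,000,000 entries, as stated in the test description.
final class ScanFigures {
    static final long TOTAL_ENTRIES = 1_000_000L;

    /** Average number of hash entries covered by one HSCAN call. */
    static double entriesPerIteration(long iterations) {
        return (double) TOTAL_ENTRIES / iterations;
    }

    /** Average wall-clock time per HSCAN call, in milliseconds. */
    static double millisPerIteration(long seconds, long iterations) {
        return seconds * 1000.0 / iterations;
    }

    public static void main(String[] args) {
        // Each row: COUNT, reported seconds, reported iterations.
        long[][] runs = {
            {100, 19, 3729},
            {1000, 3, 375},
            {10000, 1, 38},
            {100000, 1, 4}
        };
        for (long[] run : runs) {
            System.out.printf("COUNT=%d: ~%.0f entries/call, ~%.2f ms/call%n",
                    run[0],
                    entriesPerIteration(run[2]),
                    millisPerIteration(run[1], run[2]));
        }
    }
}
```

The number of calls drops roughly in proportion to COUNT, which suggests that most of the 19 seconds at COUNT=100 is per-call overhead rather than the scan work itself.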
@chinmaym7430 Just to make sure, and without delving into the details: you are aware that SCAN is, by design, not as efficient as an index scan in an SQL RDBMS, right?
Match patterns are applied to the scanned results after they have been emitted. That is, the database first emits everything in its current iteration and only then applies the match pattern on top of it.
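The effect of filtering after scanning can be illustrated with a toy in-memory simulation. This is not how Redis is implemented internally: the list-based "hash", the batch cursor, and the `*`-only glob matcher below are all simplifications assumed for the sketch, and every name in it is hypothetical. The point it shows is that every entry is examined no matter how few match, while COUNT only changes how many calls are needed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "scan a batch, then filter it": NOT the real Redis implementation.
final class ScanThenMatchSketch {

    /** Totals collected over one full simulated scan. */
    static final class Result {
        int calls;                                  // simulated round trips
        int examined;                               // entries visited, matching or not
        final List<String> matched = new ArrayList<String>();
    }

    /** Minimal glob matcher that supports only the '*' wildcard. */
    static boolean globMatch(String pattern, String text) {
        int p = 0, t = 0, star = -1, mark = 0;
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;          // remember the wildcard position
                mark = t;
            } else if (p < pattern.length() && pattern.charAt(p) == text.charAt(t)) {
                p++;
                t++;
            } else if (star >= 0) {
                p = star + 1;        // backtrack: let '*' absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    /** Walks all fields in batches of {@code count}, filtering each batch afterwards. */
    static Result scan(List<String> fields, String pattern, int count) {
        Result result = new Result();
        int cursor = 0;
        while (cursor < fields.size()) {
            int end = Math.min(cursor + count, fields.size());
            for (int i = cursor; i < end; i++) {
                result.examined++;                  // visited regardless of the pattern
                if (globMatch(pattern, fields.get(i))) {
                    result.matched.add(fields.get(i));
                }
            }
            result.calls++;
            cursor = end;
        }
        return result;
    }

    /** Builds hypothetical four-part field names such as "id42:x:y:42". */
    static List<String> sampleFields(int n) {
        List<String> fields = new ArrayList<String>(n);
        for (int i = 0; i < n; i++) {
            fields.add("id" + i + ":x:y:" + (i % 100));
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = sampleFields(10_000);
        for (int count : new int[] {100, 1000}) {
            Result r = scan(fields, "*:x:y:7", count);
            System.out.println("count=" + count + " calls=" + r.calls
                    + " examined=" + r.examined + " matched=" + r.matched.size());
        }
    }
}
```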
Do all of the iterations of your method take the same time to complete? Maybe it's a garbage collection tuning issue?
Perhaps you can write a simple for loop in bash to test directly against a local Redis (via redis-cli; I believe the latency of re-running redis-cli and opening the connection should not greatly affect the results with a COUNT of 10000 and above) and see how much time each iteration takes, regardless of the Java integration. I believe it'll be a good starting point for seeing the differences.
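On the Java side, the same per-iteration check could look roughly like the sketch below. The `IterationTimer` class is hypothetical, and the workload in `main` is only a placeholder: in the real test it would be replaced by the call to `snapshotRepo.findByPattern(snapshotPattern)`. A large spread between the fastest and slowest run would hint at GC pauses or similar effects rather than a uniformly slow scan.

```java
// Minimal timing harness: measures each run separately instead of only the total.
final class IterationTimer {

    /** Runs the task {@code runs} times and returns each duration in milliseconds. */
    static long[] timeRuns(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return millis;
    }

    /** Difference between the slowest and the fastest run. */
    static long spread(long[] values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return max - min;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption for this sketch); swap in the real scan call.
        Runnable placeholder = new Runnable() {
            @Override
            public void run() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 100_000; i++) {
                    sb.append(i % 10);
                }
            }
        };
        long[] durations = timeRuns(placeholder, 10);
        for (int i = 0; i < durations.length; i++) {
            System.out.println("run " + i + ": " + durations[i] + " ms");
        }
        System.out.println("spread: " + spread(durations) + " ms");
    }
}
```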