
Hscan in Redis with 1,000,000 records is taking longer than usual


I am trying to scan a key space of about a million keys (fields of a single hash) using a pattern scan. I am using Jedis 2.8.1 and spring-data-redis 1.7.2. Scanning the entire key set a thousand times takes about 5 minutes, while the same thing takes about 10 seconds in MySQL. I think I might be doing something wrong, since Redis is supposed to be faster than SQL-based databases.

Here are the steps for execution:

Set up the Spring configuration as mentioned below.
This is the method I am using to scan the keys (HSCAN):

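    @Override
    public List<Object> findByPattern(String pattern) {
        List<Object> listOfObjects = new ArrayList<>();
        // MATCH is applied by Redis after each batch is read; COUNT (100 here)
        // is only a hint for how many fields each HSCAN round trip examines.
        ScanOptions options = ScanOptions.scanOptions().match(pattern).count(100).build();
        // try-with-resources closes the cursor when iteration ends or fails.
        try (Cursor<Entry<Object, Object>> cursorMap =
                redisTemplate.boundHashOps(SNAPSHOT_MODULE_KEY).scan(options)) {
            while (cursorMap != null && cursorMap.hasNext()) {
                listOfObjects.add(cursorMap.next().getKey());
            }
        } catch (IOException e) {
            // Cursor extends Closeable, so close() is declared to throw IOException.
            throw new IllegalStateException("Failed to close HSCAN cursor", e);
        }
        // An empty list (rather than null) signals "no matching fields".
        return listOfObjects;
    }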

Here is the code in which I call the scan method:

    List<Object> listOfSnapshotModulesKeys = snapshotRepo.findByPattern(snapshotPattern);

NOTE: The key for each row is a combination of four separate strings. The pattern I use for searching (the variable snapshotPattern) is a combination of three strings. For example, KEY: ‘Str1:Str2:Str3:Str4’, PATTERN: ‘*:Str5:Str6:Str7’.
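MATCH uses glob-style patterns, and the check is made against each field on its own, so a pattern like `*:Str5:Str6:Str7` is tested against every field the scan reads. The sketch below is a minimal, hypothetical matcher (`GlobMatchSketch.matches`) that supports only `*` and `?`. It is not Redis's implementation, which also understands character classes and escapes, but it shows which fields such a pattern accepts.

```java
public class GlobMatchSketch {

    // Minimal glob matcher supporting only '*' (any run of characters, possibly empty)
    // and '?' (exactly one character). Illustrative subset of MATCH syntax only.
    public static boolean matches(String pattern, String text) {
        int p = 0;
        int t = 0;
        int starP = -1; // index of the most recent '*' in the pattern
        int starT = -1; // text index where that '*' started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the '*' and first try letting it match nothing.
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (starP != -1) {
                // Mismatch: backtrack and let the last '*' absorb one more character.
                p = starP + 1;
                t = ++starT;
            } else {
                return false;
            }
        }
        // Trailing '*' characters can match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("*:Str5:Str6:Str7", "Str1:Str5:Str6:Str7")); // true
        System.out.println(matches("*:Str5:Str6:Str7", "Str1:Str2:Str3:Str4")); // false
    }
}
```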

Also, I am not using connection pooling or pipelining. I turned off pooling because I was getting an error otherwise. The error is described in this issue: https://github.com/xetorthio/jedis/issues/918#issuecomment-229922190

Redis / Jedis Configuration

Jedis version: 2.8.1
Redis version: 3.0.501
Java version: 1.7.0_79
Spring-data-redis version: 1.7.2.RELEASE

@marcosnils, @deepakpol, @mp911de: What can I do to make the search faster?

Issue analytics: created 7 years ago; 26 comments (14 by maintainers).

Top GitHub Comments

12 reactions

sheinbergon commented, Jul 6, 2016

@chinmaym7430 There you go

Bash code to generate a hash with a million entries (it takes just under a minute to complete, thanks to bash's streaming capabilities):

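    # 1,000,000 random 32-character alphanumeric strings, each stored as both field and value.
    # LC_ALL=C keeps tr from rejecting random bytes under a UTF-8 locale.
    LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 32 | head -n 1000000 |
        while read UNIQUE ; do echo "HMSET htest '$UNIQUE' '$UNIQUE'" ; done |
        redis-cli -h localhost -p 6379 --pipe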
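The pipeline above feeds inline commands (quoted, space-separated text) to `redis-cli --pipe`. The Redis mass-insertion guide instead describes generating commands in the Redis protocol (RESP), where every argument is length-prefixed, so values containing spaces or quotes need no escaping. A minimal sketch of such an encoder, using a hypothetical `RespEncodeSketch.encode` helper:

```java
import java.nio.charset.StandardCharsets;

public class RespEncodeSketch {

    // Encodes one command as a RESP array of bulk strings:
    // "*<argument count>\r\n", then "$<byte length>\r\n<argument>\r\n" per argument.
    public static String encode(String... args) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(args.length).append("\r\n");
        for (String arg : args) {
            // Bulk string lengths are byte counts, so measure the UTF-8 encoding.
            int len = arg.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(len).append("\r\n").append(arg).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Emit a few HMSET commands suitable for piping into `redis-cli --pipe`.
        // The field names are placeholders, not the random strings used above.
        for (int i = 0; i < 3; i++) {
            String field = "field" + i;
            System.out.print(encode("HMSET", "htest", field, field));
        }
    }
}
```

Piping this program's output into `redis-cli --pipe` should behave like the bash version, apart from the placeholder field names.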

Bash code to scan the entire hash for a three-letter suffix pattern (you used a suffix pattern in your description):

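    # A full HSCAN iteration starts at cursor 0 and ends when Redis returns cursor 0 again.
    # The pattern is quoted so the shell does not try to expand *Rdm against local files.
    COUNT=100 ; ITERATIONS=0 ; CURSOR=0 ; START=`date +%s`
    while : ; do
        SCAN_RESULT=(`redis-cli hscan htest $CURSOR MATCH '*Rdm' COUNT $COUNT`)
        CURSOR=${SCAN_RESULT[0]}
        ITERATIONS=$((ITERATIONS+1))
        [[ $CURSOR -eq 0 ]] && break
    done
    END=`date +%s` ; echo "$((END-START)) seconds, $ITERATIONS iterations"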

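    COUNT=100    : 19 seconds, 3729 iterations
    COUNT=1000   : 3 seconds, 375 iterations
    COUNT=10000  : 1 second, 38 iterations
    COUNT=100000 : 1 second, 4 iterations

(These timings were measured with the loop starting at cursor 5 rather than 0, so each run most likely covered only part of the hash.)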
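Starting the loop at a nonzero cursor such as 5 (rather than 0) only serves to enter a `while` loop. Because Redis advances the HSCAN cursor in reverse-binary order, a nonzero start resumes partway through the hash table instead of at its beginning. The sketch below is a simplified model, assuming a single table of 2^20 buckets that is not being rehashed; it counts buckets, not fields, and the class and method names are hypothetical.

```java
public class ScanCursorSketch {

    // Next cursor for a table of 2^bits buckets, using the reverse-binary increment
    // Redis applies in dictScan: set the bits above the mask, reverse, add one, reverse back.
    public static int nextCursor(int cursor, int bits) {
        int mask = (1 << bits) - 1;
        int v = cursor | ~mask;
        v = Integer.reverse(v);
        v++; // overflows to 0 after the last bucket, which ends the scan
        return Integer.reverse(v);
    }

    // Counts the buckets a scan visits from startCursor until the cursor returns to 0.
    public static long bucketsVisited(int bits, int startCursor) {
        long visited = 0;
        int cursor = startCursor;
        do {
            visited++; // one bucket examined at this cursor position
            cursor = nextCursor(cursor, bits);
        } while (cursor != 0);
        return visited;
    }

    public static void main(String[] args) {
        int bits = 20; // assumed table size: 2^20 = 1,048,576 buckets for ~1M fields
        long total = 1L << bits;
        System.out.println("start at 0: " + bucketsVisited(bits, 0) + " of " + total);
        System.out.println("start at 5: " + bucketsVisited(bits, 5) + " of " + total);
    }
}
```

Under these assumptions, starting at 0 visits all 1,048,576 buckets, while starting at 5 visits 393,216 of them (37.5%). That is roughly in line with the COUNT=1000 run above needing about 375 calls instead of about 1,000.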

3 reactions

sheinbergon commented, Jul 6, 2016

@chinmaym7430 Just to make sure, and without delving into the details: you are aware that SCAN is, by design, not as efficient as an SQL RDBMS index scan, right?

MATCH patterns are applied to elements after they have been retrieved. In each iteration, Redis first collects the elements from the part of the collection it is scanning and only then filters them with the pattern, just before replying. A call can therefore do its full share of work and still return few or no matches.
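To make that order of operations concrete, here is a toy model, not Redis code. The collection is a plain list, the cursor is a list offset (Redis uses a hash-table cursor instead), and `MATCH *Rdm` is approximated by a suffix check. The hypothetical `ScanMatchSketch.fullScan` shows that the number of calls depends only on the collection size and COUNT, not on how many elements match, and that some calls come back empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScanMatchSketch {

    // One simulated SCAN call: read up to `count` elements starting at `cursor`,
    // then keep only those ending with `suffix` (standing in for MATCH *<suffix>).
    public static List<String> scanPage(List<String> all, int cursor, int count, String suffix) {
        int end = Math.min(cursor + count, all.size());
        List<String> page = new ArrayList<String>();
        for (String element : all.subList(cursor, end)) { // retrieval happens first...
            if (element.endsWith(suffix)) {                 // ...filtering happens afterwards
                page.add(element);
            }
        }
        return page;
    }

    // Runs a full simulated scan and returns {calls, emptyPages, totalMatches}.
    public static int[] fullScan(List<String> all, int count, String suffix) {
        int calls = 0;
        int emptyPages = 0;
        int matches = 0;
        for (int cursor = 0; cursor < all.size(); cursor += count) {
            List<String> page = scanPage(all, cursor, count, suffix);
            calls++;
            if (page.isEmpty()) {
                emptyPages++;
            }
            matches += page.size();
        }
        return new int[] {calls, emptyPages, matches};
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("a1Rdm", "b2", "c3", "d4", "e5", "f6Rdm");
        int[] r = fullScan(fields, 2, "Rdm");
        System.out.println("calls=" + r[0] + " emptyPages=" + r[1] + " matches=" + r[2]);
    }
}
```

With six fields and COUNT 2 the model makes three calls, one of which returns nothing, and finds two matches; raising COUNT to 6 gives the same two matches in a single call.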

Do all the iterations of your method take the same time to complete? Maybe it's a garbage-collection tuning issue?

Perhaps you can write a simple for loop in bash to test directly against Redis locally (via redis-cli; I believe the latency of re-running redis-cli and opening a connection should not have a great effect on the results with a COUNT of 10000 and above) and see how long each iteration takes, independently of the Java integration. I believe it'll be a good starting point for spotting differences.
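On the Java side, the same check means timing each HSCAN round trip rather than the whole scan, so an occasional slow call (for example during a GC pause) stands out from a uniformly slow loop. A small sketch with a hypothetical `Step` interface and `timeEachStep` helper; in real use `runOnce()` would issue one HSCAN call and return true once the cursor comes back as 0.

```java
import java.util.ArrayList;
import java.util.List;

public class IterationTimingSketch {

    // One unit of work, e.g. a single HSCAN call; returns true once the scan is complete.
    public interface Step {
        boolean runOnce();
    }

    // Runs `step` until it reports completion and records each call's duration in nanoseconds.
    public static List<Long> timeEachStep(Step step) {
        List<Long> durationsNanos = new ArrayList<Long>();
        boolean done;
        do {
            long start = System.nanoTime();
            done = step.runOnce();
            durationsNanos.add(System.nanoTime() - start);
        } while (!done);
        return durationsNanos;
    }

    public static void main(String[] args) {
        // Stand-in step that finishes after five calls; replace it with a real HSCAN call.
        final int[] calls = {0};
        List<Long> durations = timeEachStep(new Step() {
            @Override
            public boolean runOnce() {
                calls[0]++;
                return calls[0] == 5;
            }
        });
        long max = 0;
        long sum = 0;
        for (long d : durations) {
            max = Math.max(max, d);
            sum += d;
        }
        System.out.println(durations.size() + " calls, mean " + (sum / durations.size())
                + " ns, max " + max + " ns");
    }
}
```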
