RedisCommandTimeoutException on AWS ElastiCache Cluster
I see random timeouts in a Java application using Spring Data Redis 2.2.4 (Lettuce 5.2.1). Redis is the cache layer of a RESTful API server, and requests occasionally time out. On the Redis side I enabled the slow log, but all queries complete in under 10 milliseconds. The AWS ElastiCache cluster has cluster mode enabled and consists of 3 shards with 2 replicas each (9 m5.large nodes in total). On the application side, a Spring scheduled task periodically SCANs cached keys and queries TTL and IDLETIME for some of them, because I implemented a refresh-ahead algorithm that reloads cache values in the background.
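For context, the refresh-ahead decision described above can be sketched as a pure function on the TTL values returned by Redis. This is a minimal sketch; the class name RefreshAheadPolicy and the threshold parameter are hypothetical illustrations, not taken from the original code:

```java
// Hedged sketch of a refresh-ahead decision, independent of Redis itself.
// A key is refreshed in the background once only a small fraction of its
// original TTL remains, so readers never see a cold cache entry.
final class RefreshAheadPolicy {

    private final double refreshThreshold; // e.g. 0.2 = refresh when <20% of TTL remains

    RefreshAheadPolicy(double refreshThreshold) {
        this.refreshThreshold = refreshThreshold;
    }

    /**
     * Decide whether a key should be refreshed.
     * ttlSeconds: remaining TTL reported by the Redis TTL command
     *             (-1 = no expiry, -2 = key does not exist);
     * originalTtlSeconds: the TTL the key was originally written with.
     */
    boolean shouldRefresh(long ttlSeconds, long originalTtlSeconds) {
        if (ttlSeconds < 0) {
            return false; // no expiry or missing key: nothing to refresh ahead of
        }
        return (double) ttlSeconds / originalTtlSeconds < refreshThreshold;
    }
}
```

The point of keeping the decision pure is that the scheduled task only needs one TTL round-trip per key; the IDLETIME check can be layered on the same way.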
I increased the ioThreadPoolSize and computationThreadPoolSize from 3 to 16 threads. Timeouts decreased but are still present.
This is the code of the LettuceClientConfigurationBuilderCustomizer:

@Value("${spring.redis.custom.cluster.enableAdaptiveRefresh:true}")
private boolean enableAdaptiveRefresh;

@Value("${spring.redis.custom.cluster.enableDynamicRefreshSources:true}")
private boolean enableDynamicRefreshSources;

@Value("${spring.redis.custom.cluster.enableSuspendReconnectOnProtocolFailure:false}")
private boolean enableSuspendReconnectOnProtocolFailure;

@Value("${spring.redis.custom.cluster.enableCancelCommandsOnReconnectFailure:true}")
private boolean enableCancelCommandsOnReconnectFailure;

@Value("${spring.redis.custom.cluster.ioThreadPoolSize:16}")
private int ioThreadPoolSize;

@Value("${spring.redis.custom.cluster.computationThreadPoolSize:16}")
private int computationThreadPoolSize;

public LettuceClientConfigurationBuilderCustomizer customizer() {
    // Cluster topology refresh: optionally react to MOVED/ASK redirects and
    // other adaptive triggers instead of refreshing only periodically.
    Builder clusterTopologyRefreshOptionsBuilder = ClusterTopologyRefreshOptions.builder();
    if (enableAdaptiveRefresh) {
        clusterTopologyRefreshOptionsBuilder.enableAllAdaptiveRefreshTriggers();
    }
    clusterTopologyRefreshOptionsBuilder.dynamicRefreshSources(enableDynamicRefreshSources);
    ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = clusterTopologyRefreshOptionsBuilder.build();

    ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
            .suspendReconnectOnProtocolFailure(enableSuspendReconnectOnProtocolFailure)
            .cancelCommandsOnReconnectFailure(enableCancelCommandsOnReconnectFailure)
            .topologyRefreshOptions(clusterTopologyRefreshOptions)
            .build();

    // Shared client resources: I/O and computation thread pools plus a
    // JNDI-based DNS resolver for ElastiCache endpoint changes.
    ClientResources clientResources = DefaultClientResources.builder()
            .ioThreadPoolSize(ioThreadPoolSize)
            .computationThreadPoolSize(computationThreadPoolSize)
            .dnsResolver(new DirContextDnsResolver())
            .build();

    return p -> p.clientOptions(clusterClientOptions)
            .clientResources(clientResources)
            .readFrom(ReadFrom.REPLICA_PREFERRED);
}
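Note that the 1-second limit in the exception below is the command timeout, which is not set anywhere in this customizer. As a hedged first check (assuming the 1s value comes from Spring Boot's spring.redis.timeout property, and with 3s as an arbitrary example), raising it can distinguish short latency spikes from real stalls:

```properties
# Assumption: the 1s command timeout is configured via spring.redis.timeout,
# which Spring Boot applies as Lettuce's command timeout.
# Raising it only masks the stall; it helps confirm whether the spikes are brief.
spring.redis.timeout=3s
```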
A thread dump shows 16 lettuce-epollEventLoop-- threads in RUNNABLE state and 3 lettuce-eventExecutorLoop-- threads in TIMED_WAITING, but I am not sure I captured the dump at the right moment.
Current Behavior
This is an example of stack-trace:
Stack trace
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 second(s)
at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:270)
at org.springframework.data.redis.connection.lettuce.LettuceKeyCommands.convertLettuceAccessException(LettuceKeyCommands.java:809)
at org.springframework.data.redis.connection.lettuce.LettuceKeyCommands.ttl(LettuceKeyCommands.java:541)
at org.springframework.data.redis.connection.DefaultedRedisConnection.ttl(DefaultedRedisConnection.java:209)
at com.application.cache.refresh.ahead.redis.service.RedisKeyRetriever.lambda$scan$1(RedisKeyRetriever.java:68)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool.helpComplete(ForkJoinPool.java:1870)
at java.util.concurrent.ForkJoinPool.externalHelpComplete(ForkJoinPool.java:2467)
at java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:324)
at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at com.application.cache.refresh.ahead.service.RefreshAheadService.reloadAheadCachValuesForStream(RefreshAheadService.java:66)
at com.application.cache.refresh.ahead.service.RefreshAheadService.access$200(RefreshAheadService.java:18)
at com.application.cache.refresh.ahead.service.RefreshAheadService$1.run(RefreshAheadService.java:55)
at net.javacrumbs.shedlock.core.DefaultLockingTaskExecutor.executeWithLock(DefaultLockingTaskExecutor.java:64)
at net.javacrumbs.shedlock.core.DefaultLockingTaskExecutor.executeWithLock(DefaultLockingTaskExecutor.java:43)
at com.application.cache.refresh.ahead.service.RefreshAheadService.reloadAheadValuesOfCache(RefreshAheadService.java:51)
at com.application.cache.refresh.ahead.task.SelectiveCacheRefreshAheadScheduler.lambda$null$0(SelectiveCacheRefreshAheadScheduler.java:40)
at org.springframework.cloud.sleuth.instrument.async.TraceRunnable.run(TraceRunnable.java:67)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 second(s)
at io.lettuce.core.ExceptionFactory.createTimeoutException(ExceptionFactory.java:51)
at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:114)
at io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(ClusterFutureSyncInvocationHandler.java:123)
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
at com.sun.proxy.$Proxy201.ttl(Unknown Source)
at org.springframework.data.redis.connection.lettuce.LettuceKeyCommands.ttl(LettuceKeyCommands.java:539)
... 33 common frames omitted
Environment
- Lettuce version(s): 5.2.1.RELEASE
- Redis version: ElastiCache Cluster Mode Enabled - Redis 5.0.5
Issue Analytics
- Created 3 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.
Threads look fine, meaning that none of the lettuce-epollEventLoop threads is blocked. However, the dump lists over 600 threads, which might have an effect on performance. Note that a one-second pause (taken from "Command timed out after 1 second(s)") might be a simple consequence of a GC run. You might also want to check for GC pauses and align your timeouts with them.
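To follow up on the GC suggestion: cumulative GC time can be read in-process from the standard management beans and correlated with the timestamps of the timeouts. This is a minimal stdlib-only sketch (the class name GcPauseProbe is illustrative); dedicated GC logging (e.g. JVM GC log flags) gives per-pause detail that this summary cannot:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hedged sketch: report cumulative GC time so it can be compared against
// the moments when RedisCommandTimeoutException occurs. Collector names
// and counts vary with the JVM and the GC algorithm in use.
public class GcPauseProbe {

    /** Total accumulated GC time across all collectors, in milliseconds. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Sampling this periodically (e.g. from the same scheduler as the
        // refresh-ahead task) shows whether GC time jumps around each timeout.
        System.out.println("Cumulative GC time: " + totalGcTimeMillis() + " ms");
    }
}
```

If GC time climbs by close to a second around each timeout, aligning the command timeout above the observed pause length (or tuning the heap) is the more direct fix than adding threads.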