Redis replies immediately but Lettuce experiences random timeout failures


Bug Report

We have an application running in production that uses Lettuce to talk to Redis, mostly for DEL operations. Recently we found that command timeouts occurred randomly 1-2 hours after startup. We captured packets while a client command was timing out and found that Redis replied immediately (<1 ms), yet the client still reported a timeout. [packet capture screenshot]

  • packet 431536: DEL request
  • packet 431557: response 1
  • packet 431559: ack for last packet

Current Behavior

command: del, args: [[key**********************************]], spend: 2000ms

Input Code

// initialize client
ClientResources clientResources = DefaultClientResources.builder()
        .eventLoopGroupProvider(eventLoopGroupProvider) // event loop group shared across multiple clients
        .eventExecutorGroup(eventExecutorGroup)         // same shared executor group
        .build();
ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
        .enableAdaptiveRefreshTrigger(
                ClusterTopologyRefreshOptions.RefreshTrigger.MOVED_REDIRECT,
                ClusterTopologyRefreshOptions.RefreshTrigger.ASK_REDIRECT)
        .build();
RedisClusterClient client = RedisClusterClient.create(clientResources, uri);
client.setOptions(ClusterClientOptions.builder()
        .socketOptions(buildSocketOptions())   // TCP_NODELAY, SO_KEEPALIVE
        .timeoutOptions(buildTimeoutOptions()) // command timeout 2s
        .topologyRefreshOptions(refreshOptions)
        .requestQueueSize(2000)
        .build());

// obtain a connection and issue the DEL operation
StatefulRedisClusterConnection<String, String> connection = client.connect();
connection.sync().del(key);
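
The buildSocketOptions() and buildTimeoutOptions() helpers are not included in the issue; going only by the inline comments above (TCP_NODELAY, SO_KEEPALIVE, 2-second command timeout), they presumably look roughly like this sketch:

import io.lettuce.core.SocketOptions;
import io.lettuce.core.TimeoutOptions;
import java.time.Duration;

// Sketch only: reconstructed from the inline comments, not taken from the issue.
private static SocketOptions buildSocketOptions() {
    return SocketOptions.builder()
            .keepAlive(true)   // SO_KEEPALIVE
            .tcpNoDelay(true)  // TCP_NODELAY
            .build();
}

private static TimeoutOptions buildTimeoutOptions() {
    // Apply the same fixed 2-second deadline to every command.
    return TimeoutOptions.builder()
            .fixedTimeout(Duration.ofSeconds(2))
            .build();
}

A fixed timeout like this is what the "spend: 2000ms" log line under Current Behavior reflects: every command is cut off after exactly two seconds.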

Expected behavior/code

Environment

  • Lettuce version(s): [5.1.2.RELEASE, 5.2.2.RELEASE]
  • Netty versions: [4.1.29.Final, 4.1.48.Final]
  • Redis version: [4.0.14]

Possible Solution

Additional context

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:13 (6 by maintainers)

Top GitHub Comments

8 reactions
Shawyeok commented, Apr 19, 2020

Note that custom codecs may lead to such a situation when encoding/decoding fails. Please make sure that this isn’t the case, and that you log encoding exceptions from within your codec.

Problem resolved, you were right from the start. @mp911de

We enabled TRACE logging and found an io.netty.handler.codec.EncoderException in the log. The root cause was a code issue that triggered a ConcurrentModificationException during command encoding, which in turn made the codec's encode step fail. We could not catch this error in the usual way because the code only calls setAsync and never waits for the result. 😦
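
For illustration only (the application code is not shown in the issue), a minimal sketch of how this kind of failure can arise: a custom codec iterates a mutable value while another thread is still modifying it, so ConcurrentModificationException is thrown inside encodeValue on the Netty event loop and surfaces as an EncoderException; because the caller fires setAsync without ever inspecting the returned future, the error stays silent and the command slot simply times out. ListCodec and the method parameters below are hypothetical names, not from the issue.

import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.codec.RedisCodec;
import io.lettuce.core.codec.StringCodec;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical codec that serializes a List<String> value by iterating it.
class ListCodec implements RedisCodec<String, List<String>> {

    private static final StringCodec STRINGS = StringCodec.UTF8;

    @Override public ByteBuffer encodeKey(String key) { return STRINGS.encodeKey(key); }
    @Override public String decodeKey(ByteBuffer bytes) { return STRINGS.decodeKey(bytes); }

    @Override public ByteBuffer encodeValue(List<String> value) {
        // String.join iterates the list; if another thread mutates it at the same
        // time, this throws ConcurrentModificationException during command encoding,
        // which shows up as io.netty.handler.codec.EncoderException in TRACE logs.
        return STRINGS.encodeValue(String.join(",", value));
    }

    @Override public List<String> decodeValue(ByteBuffer bytes) {
        return new ArrayList<>(Arrays.asList(STRINGS.decodeValue(bytes).split(",")));
    }
}

// Fire-and-forget write (sketch): the returned future carries the encode failure,
// but nothing ever inspects it, so the error is silent and the command times out.
static void setAsync(StatefulRedisClusterConnection<String, List<String>> connection,
                     String key, List<String> sharedMutableList) {
    connection.async().set(key, sharedMutableList); // result deliberately ignored
}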

3 reactions
mp911de commented, Apr 3, 2020

Note that custom codecs may lead to such a situation when encoding/decoding fails. Please make sure that this isn’t the case, and that you log encoding exceptions from within your codec.
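
One way to follow this advice (a sketch assuming a delegating wrapper; this is not code from the issue or from Lettuce itself) is to wrap the real codec and log any failure before rethrowing, so encode/decode errors reach the application log even when the command's future is never inspected:

import io.lettuce.core.codec.RedisCodec;
import java.nio.ByteBuffer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical wrapper: logs codec failures before rethrowing them.
class LoggingCodec<K, V> implements RedisCodec<K, V> {

    private static final Logger log = LoggerFactory.getLogger(LoggingCodec.class);
    private final RedisCodec<K, V> delegate;

    LoggingCodec(RedisCodec<K, V> delegate) {
        this.delegate = delegate;
    }

    @Override public ByteBuffer encodeKey(K key) {
        try { return delegate.encodeKey(key); }
        catch (RuntimeException e) { log.error("encodeKey failed", e); throw e; }
    }

    @Override public ByteBuffer encodeValue(V value) {
        try { return delegate.encodeValue(value); }
        catch (RuntimeException e) { log.error("encodeValue failed", e); throw e; }
    }

    @Override public K decodeKey(ByteBuffer bytes) {
        try { return delegate.decodeKey(bytes); }
        catch (RuntimeException e) { log.error("decodeKey failed", e); throw e; }
    }

    @Override public V decodeValue(ByteBuffer bytes) {
        try { return delegate.decodeValue(bytes); }
        catch (RuntimeException e) { log.error("decodeValue failed", e); throw e; }
    }
}

// Usage (sketch): client.connect(new LoggingCodec<>(new ListCodec()));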
