Intermittent "Connection is closed" errors
We are currently working on a Lambda function that connects to a Redis 3.2.10 cluster on AWS Elasticache.
This Lambda function connects to the Redis cluster, runs KEYS on each master node, collects the responses from each node, and returns an array of keys. We then publish an SNS message for each key in this array and close the cluster connection before the Lambda ends.
AWS Lambda freezes and thaws the container in which programs run, so ideally we would create a connection once and re-use it on every invocation. However, we have found that for the Lambda to end, we must explicitly end the client connection to the cluster, as Lambda waits for the Node event loop to empty before the invocation finishes. This is why we create the connection at the start of the function (representing a Lambda invocation), run our queries, and then, when this completes, attempt to gracefully .quit() the Redis.Cluster connection.
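A rough sketch of that handler shape (not our actual code; the cluster endpoint and the SNS step are placeholders) might look like this:

const Redis = require("ioredis");

exports.handler = async () => {
  // Create the connection at the start of the invocation.
  const conn = new Redis.Cluster(["cluster.id.clustercfg.euw1.cache.amazonaws.com"]);
  try {
    // Run KEYS on every master node and flatten the replies into one array.
    const replies = await Promise.all(
      conn.nodes("master").map((node) => node.keys("*:*"))
    );
    const keys = [].concat(...replies);
    // ...publish an SNS message for each key here...
    return keys;
  } finally {
    // Explicitly quit so the event loop can drain and the invocation can end.
    await conn.quit();
  }
};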
I can’t share the actual code that we’re working on, but I’ve been able to extract the logic and create a simple example of the issue we’re facing:
test.js
const Redis = require("ioredis");

// Every 500ms, open a new cluster connection, run KEYS on every master
// node, then gracefully quit the connection.
let interval = setInterval(() => {
  let conn = new Redis.Cluster(["cluster.id.clustercfg.euw1.cache.amazonaws.com"]);
  Promise
    .all(conn.nodes("master").map((node) => {
      return node.keys("*:*");
    }))
    .then((resp) => {
      console.log("Complete KEYS on all nodes", JSON.stringify(resp));
      return conn.quit();
    })
    .then(() => {
      console.log("Gracefully closed connection");
    })
    .catch((e) => {
      console.log("Caught rejection: ", e.message);
    });
}, 500);

// Stop opening new connections after 3 seconds.
setTimeout(() => {
  clearInterval(interval);
}, 3000);
Example output:
ioredis:cluster status: [empty] -> connecting +0ms
ioredis:redis status[cluster.id.clustercfg.euw1.cache.amazonaws.com:6379]: [empty] -> wait +5ms
ioredis:cluster getting slot cache from cluster.id.clustercfg.euw1.cache.amazonaws.com:6379 +1ms
ioredis:redis status[cluster.id.clustercfg.euw1.cache.amazonaws.com:6379]: wait -> connecting +2ms
ioredis:redis queue command[0] -> cluster(slots) +1ms
ioredis:redis queue command[0] -> keys(*:*) +1ms
ioredis:redis status[10.1.0.45:6379]: connecting -> connect +21ms
ioredis:redis write command[0] -> info() +0ms
ioredis:redis status[10.1.0.45:6379]: connect -> ready +5ms
ioredis:connection send 2 commands in offline queue +1ms
ioredis:redis write command[0] -> cluster(slots) +0ms
ioredis:redis write command[0] -> keys(*:*) +0ms
ioredis:redis status[10.1.1.131:6379]: [empty] -> wait +3ms
ioredis:redis status[10.1.2.152:6379]: [empty] -> wait +1ms
ioredis:redis status[10.1.0.45:6379]: [empty] -> wait +0ms
ioredis:cluster status: connecting -> connect +0ms
ioredis:redis queue command[0] -> cluster(info) +1ms
Complete KEYS on all nodes [["132f28d0-8322-43d6-bbbd-200a19c130c0:tf0NuoVBZIXDIryxBRj3lrcayXeHwaoD"]]
ioredis:cluster status: connect -> disconnecting +2ms
ioredis:redis queue command[0] -> quit() +0ms
ioredis:redis status[10.1.1.131:6379]: wait -> connecting +0ms
ioredis:redis status[10.1.2.152:6379]: wait -> connecting +0ms
ioredis:redis status[10.1.0.45:6379]: wait -> connecting +0ms
ioredis:redis status[10.1.1.131:6379]: connecting -> end +2ms
ioredis:redis status[10.1.2.152:6379]: connecting -> end +0ms
ioredis:redis status[10.1.0.45:6379]: connecting -> end +0ms
ioredis:redis status[10.1.0.45:6379]: ready -> close +1ms
ioredis:connection skip reconnecting since the connection is manually closed. +1ms
ioredis:redis status[10.1.0.45:6379]: close -> end +0ms
ioredis:cluster status: disconnecting -> close +2ms
ioredis:cluster status: close -> end +0ms
Caught rejection: Connection is closed.
ioredis:delayqueue send 1 commands in failover queue +100ms
ioredis:cluster status: end -> disconnecting +2ms
// SNIP
Why would we be getting the "Connection is closed" rejection error? This feels like a bug, as I think we are going about this in the correct way, but I’m happy to be proved wrong!
Top GitHub Comments
I’m commenting here to confirm that this issue is still cropping up for us.
I’m not really sure why the bot above adds a “wontfix” label to an issue that hasn’t had any recent activity 🤔
I’ve also been able to reproduce this problem, but only in AWS.
I believe the problem is related to the offline queue. The error originates when the close() method is called from the event_handler. The error eventually bubbles up in the redis class when flushQueue() is executed with a non-empty offline queue. The commandQueue also occasionally causes this problem, but it’s much less frequent.
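Based on that analysis, and purely as an illustrative workaround rather than a confirmed fix, one option is to treat a "Connection is closed" rejection from quit() as non-fatal and fall back to disconnect(), on the assumption that the rejection comes from a node whose offline queue is flushed before it ever finished connecting:

function quitQuietly(conn) {
  // Hypothetical helper (not part of ioredis): try a graceful quit first.
  return conn.quit().catch((e) => {
    if (e.message === "Connection is closed.") {
      // Assume the error came from a node that was still connecting and had
      // quit() sitting in its offline queue; force-close it instead.
      conn.disconnect();
      return;
    }
    throw e;
  });
}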