AWS Elasticache + TLS + Hostname Verification
Tested Using
- Node.js v10.13.0
- ioredis v4.2.3
- AWS Clustered Redis Elasticache (3 nodes, in-transit and at-rest encryption enabled)
Problem
When connecting to a Redis cluster, a list of nodes (host and port combinations) is given.
In the cluster connection logic, the node hostnames are then resolved to IP addresses.
For each node whose hostname resolves successfully, the logic then overwrites the given host for that node with the resolved IP address. Normally this would not be an issue, but when connecting to a Redis cluster via TLS (such as AWS encrypted Elasticache) it can break Node.js’ built-in TLS hostname verification. If the IP address of the node does not appear in the certificate received from the server, the hostname verification step fails, which causes the connection to fail.
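The verification step can be reproduced in isolation with Node's own `tls.checkServerIdentity`. The sketch below is only an illustration: the certificate object is hand-written and hypothetical (in particular, the wildcard subject alternative name is an assumption, not something read from a real Elasticache certificate), and `10.0.0.1` is a stand-in for whatever address the hostname resolves to.

```javascript
const tls = require('tls');

// Hypothetical certificate fields, for illustration only. A real certificate
// object is produced by the TLS handshake; the wildcard SAN is an assumption.
const fakeCert = {
  subject: { CN: '*.xxx.use1.cache.amazonaws.com' },
  subjectaltname: 'DNS:*.xxx.use1.cache.amazonaws.com',
};

// tls.checkServerIdentity returns undefined on success and an Error on failure.
function verify(host) {
  return tls.checkServerIdentity(host, fakeCert);
}

// Verifying against the configured hostname matches the wildcard entry.
const byHostname = verify('clustercfg.xxx.use1.cache.amazonaws.com');

// Verifying against an IP only succeeds if the certificate lists that IP.
const byIp = verify('10.0.0.1'); // stand-in for the resolved address

console.log('hostname check:', byHostname);
console.log('IP check:', byIp && byIp.code);
```

Under these assumptions the hostname check passes, while the IP check yields an `ERR_TLS_CERT_ALTNAME_INVALID` error, the same error code that appears in the debug logs.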
Example
Running the following code:
const Redis = require('ioredis');
const nodes = [{
  host: 'clustercfg.xxx.use1.cache.amazonaws.com',
  port: '6379',
}];
const options = {
  redisOptions: {
    tls: {}
  }
};
const cluster = new Redis.Cluster(nodes, options);
cluster.set('test-key', 'test-value');
cluster.get('test-key', function (err, res) {
  console.log(res);
  if (err) {
    console.error(err);
  }
  cluster.disconnect();
});
when run with DEBUG=ioredis:* node index.js, produces the following logs:
ioredis:cluster status: [empty] -> connecting +0ms
ioredis:cluster resolved hostname clustercfg.xxx.use1.cache.amazonaws.com to IP aaa.bbb.ccc.ddd +47ms
ioredis:cluster:connectionPool Reset with [ { host: 'aaa.bbb.ccc.ddd', port: 6379 } ] +0ms
ioredis:cluster:connectionPool Connecting to aaa.bbb.ccc.ddd:6379 as master +3ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379]: [empty] -> wait +0ms
ioredis:cluster getting slot cache from aaa.bbb.ccc.ddd:6379 +7ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379]: wait -> connecting +1ms
ioredis:redis queue command[aaa.bbb.ccc.ddd:6379]: 0 -> cluster([ 'slots' ]) +1ms
ioredis:cluster:subscriber selected a subscriber aaa.bbb.ccc.ddd:6379 +0ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: [empty] -> wait +0ms
ioredis:cluster:subscriber started +1ms
ioredis:connection error: Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: aaa.bbb.ccc.ddd is not in the cert's list: +0ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379]: connecting -> close +111ms
ioredis:connection skip reconnecting because `retryStrategy` is not a function +2ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379]: close -> end +1ms
ioredis:cluster:subscriber subscriber has left, selecting a new one... +111ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: wait -> close +1ms
ioredis:connection skip reconnecting since the connection is manually closed. +1ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: close -> end +0ms
ioredis:cluster:subscriber selecting subscriber failed since there is no node discovered in the cluster yet +1ms
ioredis:cluster status: connecting -> close +114ms
ioredis:cluster status: close -> reconnecting +0ms
[ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache.
at tryNode (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:319:31)
at /Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:335:21
at redis.cluster.utils_1.timeout (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:551:24)
at run (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/utils/index.js:150:22)
at tryCatcher (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/standard-as-callback/lib/utils.js:10:19)
at /Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/standard-as-callback/index.js:31:35
at process._tickCallback (internal/process/next_tick.js:68:7)
ioredis:cluster:connectionPool Reset with [] +118ms
ioredis:cluster connecting failed: Error: None of startup nodes is available +2ms
ioredis:cluster Cluster is disconnected. Retrying after 102ms +105ms
ioredis:cluster status: reconnecting -> connecting +0ms
ioredis:cluster resolved hostname clustercfg.xxx.use1.cache.amazonaws.com to IP aaa.bbb.ccc.ddd +2ms
ioredis:cluster:connectionPool Reset with [ { host: 'aaa.bbb.ccc.ddd', port: 6379 } ] +108ms
ioredis:cluster:connectionPool Connecting to aaa.bbb.ccc.ddd:6379 as master +0ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379]: [empty] -> wait +110ms
ioredis:cluster:subscriber a new node is discovered and there is no subscriber, selecting a new one... +110ms
ioredis:cluster:subscriber selected a subscriber aaa.bbb.ccc.ddd:6379 +1ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: [empty] -> wait +1ms
ioredis:cluster getting slot cache from aaa.bbb.ccc.ddd:6379 +2ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379]: wait -> connecting +0ms
ioredis:redis queue command[aaa.bbb.ccc.ddd:6379]: 0 -> cluster([ 'slots' ]) +0ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: wait -> close +0ms
ioredis:connection skip reconnecting since the connection is manually closed. +112ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: close -> end +1ms
ioredis:cluster:subscriber selected a subscriber aaa.bbb.ccc.ddd:6379 +1ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: [empty] -> wait +0ms
ioredis:cluster:subscriber started +0ms
ioredis:connection error: Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: aaa.bbb.ccc.ddd is not in the cert's list: +99ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379]: connecting -> close +99ms
ioredis:connection skip reconnecting because `retryStrategy` is not a function +0ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379]: close -> end +0ms
ioredis:cluster:subscriber subscriber has left, selecting a new one... +99ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: wait -> close +0ms
ioredis:connection skip reconnecting since the connection is manually closed. +0ms
ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: close -> end +0ms
ioredis:cluster:subscriber selecting subscriber failed since there is no node discovered in the cluster yet +0ms
ioredis:cluster status: connecting -> close +100ms
ioredis:cluster status: close -> reconnecting +0ms
[ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache.
at tryNode (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:319:31)
at /Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:335:21
at redis.cluster.utils_1.timeout (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:551:24)
at run (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/utils/index.js:150:22)
at tryCatcher (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/standard-as-callback/lib/utils.js:10:19)
at /Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/standard-as-callback/index.js:31:35
at process._tickCallback (internal/process/next_tick.js:68:7)
ioredis:cluster:connectionPool Reset with [] +102ms
ioredis:cluster Got error Error: None of startup nodes is available when reconnecting. Ignoring... +1ms
ioredis:cluster Cluster is disconnected. Retrying after 104ms +107ms
ioredis:cluster status: reconnecting -> connecting +0ms
where clustercfg.xxx.use1.cache.amazonaws.com is the cluster hostname and aaa.bbb.ccc.ddd is the IP address the hostname resolves to.
Note the following debug log message:
ioredis:connection error: Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: aaa.bbb.ccc.ddd is not in the cert's list: +0ms
Workaround
My current workaround is to bypass the TLS hostname verification step with a custom checkServerIdentity function:
const Redis = require('ioredis');
const nodes = [{
  host: 'clustercfg.xxx.use1.cache.amazonaws.com',
  port: '6379',
}];
const options = {
  redisOptions: {
    tls: {
      checkServerIdentity: (servername, cert) => {
        // skip certificate hostname validation
        return undefined;
      },
    }
  }
};
const cluster = new Redis.Cluster(nodes, options);
cluster.set('aws', 'test');
cluster.get('aws', function (err, res) {
  console.log(res);
  if (err) {
    console.error(err);
  }
  cluster.disconnect();
});
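A narrower variant of this workaround, sketched here as an untested idea rather than a recommended fix, keeps verification enabled but checks the certificate against the originally configured hostname instead of the IP that is passed in. `pinServerIdentity` is a hypothetical helper, not part of ioredis, and the approach assumes every node in the cluster presents a certificate that covers the configured hostname (for example through a wildcard entry), which is not verified here.

```javascript
const tls = require('tls');

// Hypothetical helper (not part of ioredis): build a checkServerIdentity
// callback that ignores the name/IP it is called with and validates the
// certificate against the hostname that was originally configured.
function pinServerIdentity(expectedHost) {
  return (_servername, cert) => tls.checkServerIdentity(expectedHost, cert);
}

const options = {
  redisOptions: {
    tls: {
      // Assumption: every node's certificate covers this hostname.
      checkServerIdentity: pinServerIdentity('clustercfg.xxx.use1.cache.amazonaws.com'),
    },
  },
};

// Usage (requires ioredis): new Redis.Cluster(nodes, options);
```

Unlike returning `undefined` unconditionally, this still rejects a certificate issued for an unrelated name.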
Additional Notes
I also experimented with changing part of the cluster connection logic (resolveStartupNodeHostnames) so that it does not overwrite the host with the resolved IP address. I changed it to the following:
resolveStartupNodeHostnames() {
  if (!Array.isArray(this.startupNodes) || this.startupNodes.length === 0) {
    return Promise.reject(new Error('`startupNodes` should contain at least one node.'));
  }
  const startupNodes = util_1.normalizeNodeOptions(this.startupNodes);
  const hostnames = util_1.getUniqueHostnamesFromOptions(startupNodes);
  if (hostnames.length === 0) {
    return Promise.resolve(startupNodes);
  }
  return Promise.all(hostnames.map((hostname) => this.dnsLookup(hostname))).then((ips) => {
    // change made here
    // rough implementation not overwriting the hosts
    return startupNodes;
  });
}
and I was able to connect to the cluster using the first code example I gave.
Questions
1. Is there a specific reason that the hosts given when defining the Redis cluster configuration are replaced by their resolved IP addresses when connecting?
2. Could the connection logic be changed to use hostnames instead of IP addresses to connect?
3. If #2 is not possible, could we add a configuration option to disable overwriting the configured hostnames with their IP addresses on connection?
Top GitHub Comments
Under the existing framework, we first convert the host to an IP address, then pass the IP address to create the connection pool object.
When TLS is enabled, the IP address used in the connection pool object causes the TLS connection setup to fail, since a hostname is expected. The solution suggested in the README, bypassing the DNS lookup, is hard for most users to understand. How about we address the issue as below
I tested the above change and it works for both TLS and non-TLS scenarios.
Hi, I got here while trying to understand the reasoning behind the instruction at https://github.com/luin/ioredis#special-note-aws-elasticache-clusters-with-tls, which looks suspicious to me: it doesn’t look like a valid DNS lookup implementation, since it returns something other than an IP.
From reading this thread and the code, it seems the issue is that https://github.com/luin/ioredis/blob/3bf300a1c99ae4cf8038930c45e19ebd68db222e/lib/cluster/index.ts#L1023-L1030 does an early DNS lookup and uses the result to connect later in the code.
So, by the time it attempts to connect, it no longer has the server name to validate against.
Short-circuiting the early DNS lookup by returning the same name therefore passes the actual hostname to Node.js TLS instead of the IP, so it knows which server name to validate against.
Is there value in doing this early DNS lookup? If so, aren’t we losing it with this bypass?
Have you considered some alternatives, like saving the hostname to be used as servername in the TLS connection, or maybe passing the dns.lookup option there instead of calling it? Thanks!
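For readers following along, the bypass discussed in this comment can be sketched roughly as below. It relies on the `dnsLookup` cluster option as the linked README note appears to describe it, so check the README for your ioredis version before relying on it. `createTlsCluster` and `passthroughLookup` are hypothetical names introduced for this sketch, and the ioredis module is passed in as a parameter so the snippet has no hard dependency.

```javascript
// Pass-through lookup: hands the hostname back unchanged instead of an IP,
// so the later TLS handshake still has the name to verify against.
const passthroughLookup = (address, callback) => callback(null, address);

// `Redis` is the ioredis module, injected by the caller (hypothetical helper).
function createTlsCluster(Redis, host, port = 6379) {
  return new Redis.Cluster([{ host, port }], {
    dnsLookup: passthroughLookup, // skip the early hostname-to-IP replacement
    redisOptions: { tls: {} },    // default TLS options, verification left on
  });
}

// Usage (requires ioredis):
// const cluster = createTlsCluster(require('ioredis'), 'clustercfg.xxx.use1.cache.amazonaws.com');
```

As the comment points out, the trade-off is that whatever the early lookup was meant to provide is skipped; name resolution is left to the socket layer at connect time.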