AWS Elasticache + TLS + Hostname Verification

Tested Using

Node.js v10.13.0
ioredis v4.2.3
AWS Clustered Redis Elasticache (3 nodes, in-transit and at-rest encryption enabled)

Problem

When connecting to a Redis cluster, a list of nodes (host and port combinations) is given. In the cluster connection logic, the node hostnames are then resolved to IP addresses. For each node whose hostname resolves successfully, the logic overwrites the configured host with the resolved IP address. Normally this would not be an issue, but when connecting to a Redis cluster via TLS (such as AWS encrypted Elasticache) it breaks Node.js’ built-in TLS hostname verification: if the node's IP address does not appear in the certificate received from the server, the hostname verification step fails and the connection fails with it.
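
The failure can be reproduced in isolation with Node's built-in tls.checkServerIdentity. The sketch below uses a hand-written certificate object. Its wildcard subject alternative name and the 10.0.0.1 address are assumptions for illustration, not values taken from a real Elasticache certificate.

```javascript
const tls = require('tls');

// Hypothetical certificate fields: DNS names only, no IP entries.
const cert = {
  subject: { CN: '*.xxx.use1.cache.amazonaws.com' },
  subjectaltname: 'DNS:*.xxx.use1.cache.amazonaws.com',
};

// Verifying against the configured hostname passes (returns undefined).
const byHostname = tls.checkServerIdentity(
  'clustercfg.xxx.use1.cache.amazonaws.com',
  cert
);

// Verifying against a resolved IP (placeholder address) returns an Error,
// because the certificate lists no IP addresses.
const byIp = tls.checkServerIdentity('10.0.0.1', cert);

console.log(byHostname); // undefined
console.log(byIp && byIp.code);
```

The second call is roughly what happens once the configured host has been overwritten with its resolved IP address.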

Example

Running the following code:

const Redis = require('ioredis');

const nodes = [{
    host: 'clustercfg.xxx.use1.cache.amazonaws.com',
    port: '6379',
}];

const options = {
    redisOptions: {
        tls: {}
    }
}

const cluster = new Redis.Cluster(nodes, options);

cluster.set('test-key', 'test-value');

cluster.get('test-key', function (err, res) {
    console.log(res);

    if (err) {
        console.error(err)
    }

    cluster.disconnect()
});

When run with DEBUG=ioredis:* node index.js, it produces the following logs:

ioredis:cluster status: [empty] -> connecting +0ms
  ioredis:cluster resolved hostname clustercfg.xxx.use1.cache.amazonaws.com to IP aaa.bbb.ccc.ddd +47ms
  ioredis:cluster:connectionPool Reset with [ { host: 'aaa.bbb.ccc.ddd', port: 6379 } ] +0ms
  ioredis:cluster:connectionPool Connecting to aaa.bbb.ccc.ddd:6379 as master +3ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379]: [empty] -> wait +0ms
  ioredis:cluster getting slot cache from aaa.bbb.ccc.ddd:6379 +7ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379]: wait -> connecting +1ms
  ioredis:redis queue command[aaa.bbb.ccc.ddd:6379]: 0 -> cluster([ 'slots' ]) +1ms
  ioredis:cluster:subscriber selected a subscriber aaa.bbb.ccc.ddd:6379 +0ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: [empty] -> wait +0ms
  ioredis:cluster:subscriber started +1ms
  ioredis:connection error: Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: aaa.bbb.ccc.ddd is not in the cert's list:  +0ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379]: connecting -> close +111ms
  ioredis:connection skip reconnecting because `retryStrategy` is not a function +2ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379]: close -> end +1ms
  ioredis:cluster:subscriber subscriber has left, selecting a new one... +111ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: wait -> close +1ms
  ioredis:connection skip reconnecting since the connection is manually closed. +1ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: close -> end +0ms
  ioredis:cluster:subscriber selecting subscriber failed since there is no node discovered in the cluster yet +1ms
  ioredis:cluster status: connecting -> close +114ms
  ioredis:cluster status: close -> reconnecting +0ms
[ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache.
    at tryNode (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:319:31)
    at /Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:335:21
    at redis.cluster.utils_1.timeout (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:551:24)
    at run (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/utils/index.js:150:22)
    at tryCatcher (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/standard-as-callback/lib/utils.js:10:19)
    at /Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/standard-as-callback/index.js:31:35
    at process._tickCallback (internal/process/next_tick.js:68:7)
  ioredis:cluster:connectionPool Reset with [] +118ms
  ioredis:cluster connecting failed: Error: None of startup nodes is available +2ms
  ioredis:cluster Cluster is disconnected. Retrying after 102ms +105ms
  ioredis:cluster status: reconnecting -> connecting +0ms
  ioredis:cluster resolved hostname clustercfg.xxx.use1.cache.amazonaws.com to IP aaa.bbb.ccc.ddd +2ms
  ioredis:cluster:connectionPool Reset with [ { host: 'aaa.bbb.ccc.ddd', port: 6379 } ] +108ms
  ioredis:cluster:connectionPool Connecting to aaa.bbb.ccc.ddd:6379 as master +0ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379]: [empty] -> wait +110ms
  ioredis:cluster:subscriber a new node is discovered and there is no subscriber, selecting a new one... +110ms
  ioredis:cluster:subscriber selected a subscriber aaa.bbb.ccc.ddd:6379 +1ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: [empty] -> wait +1ms
  ioredis:cluster getting slot cache from aaa.bbb.ccc.ddd:6379 +2ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379]: wait -> connecting +0ms
  ioredis:redis queue command[aaa.bbb.ccc.ddd:6379]: 0 -> cluster([ 'slots' ]) +0ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: wait -> close +0ms
  ioredis:connection skip reconnecting since the connection is manually closed. +112ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: close -> end +1ms
  ioredis:cluster:subscriber selected a subscriber aaa.bbb.ccc.ddd:6379 +1ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: [empty] -> wait +0ms
  ioredis:cluster:subscriber started +0ms
  ioredis:connection error: Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: aaa.bbb.ccc.ddd is not in the cert's list:  +99ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379]: connecting -> close +99ms
  ioredis:connection skip reconnecting because `retryStrategy` is not a function +0ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379]: close -> end +0ms
  ioredis:cluster:subscriber subscriber has left, selecting a new one... +99ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: wait -> close +0ms
  ioredis:connection skip reconnecting since the connection is manually closed. +0ms
  ioredis:redis status[aaa.bbb.ccc.ddd:6379 (ioredisClusterSubscriber)]: close -> end +0ms
  ioredis:cluster:subscriber selecting subscriber failed since there is no node discovered in the cluster yet +0ms
  ioredis:cluster status: connecting -> close +100ms
  ioredis:cluster status: close -> reconnecting +0ms
[ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache.
    at tryNode (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:319:31)
    at /Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:335:21
    at redis.cluster.utils_1.timeout (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/cluster/index.js:551:24)
    at run (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/ioredis/built/utils/index.js:150:22)
    at tryCatcher (/Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/standard-as-callback/lib/utils.js:10:19)
    at /Users/jdavis/workspace/experiments/encrypted-ec-ioredis/node_modules/standard-as-callback/index.js:31:35
    at process._tickCallback (internal/process/next_tick.js:68:7)
  ioredis:cluster:connectionPool Reset with [] +102ms
  ioredis:cluster Got error Error: None of startup nodes is available when reconnecting. Ignoring... +1ms
  ioredis:cluster Cluster is disconnected. Retrying after 104ms +107ms
  ioredis:cluster status: reconnecting -> connecting +0ms

where clustercfg.xxx.use1.cache.amazonaws.com is the cluster hostname and aaa.bbb.ccc.ddd is the IP address the hostname resolves to.

Note the following debug log message:

ioredis:connection error: Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames: IP: aaa.bbb.ccc.ddd is not in the cert's list:  +0ms

Workaround

My current workaround is to bypass the TLS hostname verification step with a custom checkServerIdentity function:

const Redis = require('ioredis');

const nodes = [{
    host: 'clustercfg.xxx.use1.cache.amazonaws.com',
    port: '6379',
}];

const options = {
    redisOptions: {
        tls: {
            checkServerIdentity: (servername, cert) => {
                // skip certificate hostname validation
                return undefined;
            },
        }
    }
}

const cluster = new Redis.Cluster(nodes, options);

cluster.set('aws', 'test');

cluster.get('aws', function (err, res) {
    console.log(res);

    if (err) {
        console.error(err)
    }

    cluster.disconnect()
});
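
Note that returning undefined unconditionally turns hostname verification off altogether. A narrower variant, sketched here on the assumption that every node presents a certificate valid for the configured hostname (for example a wildcard certificate), validates the certificate against that hostname instead of the IP address that is passed in. CLUSTER_HOST is an illustrative name, not part of ioredis.

```javascript
const tls = require('tls');

// The hostname given in the cluster configuration (placeholder from above).
const CLUSTER_HOST = 'clustercfg.xxx.use1.cache.amazonaws.com';

const options = {
  redisOptions: {
    tls: {
      // Ignore the name passed in (the resolved IP) and validate the
      // certificate against the configured hostname instead.
      checkServerIdentity: (_servername, cert) =>
        tls.checkServerIdentity(CLUSTER_HOST, cert),
    },
  },
};

// Pass `options` to `new Redis.Cluster(nodes, options)` as in the workaround above.
```

With this variant a certificate issued for an unrelated name is still rejected, which the unconditional bypass would accept.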

Additional Notes

I also experimented with changing part of the cluster connection logic (resolveStartupNodeHostnames) so that it does not overwrite the host with the resolved IP address. I changed it to the following:

resolveStartupNodeHostnames() {
    if (!Array.isArray(this.startupNodes) || this.startupNodes.length === 0) {
        return Promise.reject(new Error('`startupNodes` should contain at least one node.'));
    }
    const startupNodes = util_1.normalizeNodeOptions(this.startupNodes);
    const hostnames = util_1.getUniqueHostnamesFromOptions(startupNodes);
    if (hostnames.length === 0) {
        return Promise.resolve(startupNodes);
    }
    return Promise.all(hostnames.map((hostname) => this.dnsLookup(hostname))).then((ips) => {
        // change made here
        // rough implementation not overwriting the hosts
        return startupNodes;
    });
}

and I was able to connect to the cluster using the first code example I gave.

Questions

1. Is there a specific reason that the hosts given when defining the Redis cluster configuration are replaced by their resolved IP addresses when connecting?
2. Could the connection logic be changed to use hostnames instead of IP addresses to connect?
3. If #2 is not possible, could we add a configuration option to disable overwriting the configured hostnames with their IP addresses on connection?

Top GitHub Comments

jiyanhbo commented, Apr 28, 2022

Under the existing framework, we first convert the host to an IP address, then pass that IP address when creating the connection pool object.

When TLS is enabled, the IP address used in the connection pool object causes the TLS connection setup to fail, since a hostname is expected. The solution suggested in the README, bypassing the DNS lookup, is hard for most users to understand. How about we address the issue as follows:

1. When TLS is enabled, make sure the host passed to the cluster constructor is a hostname instead of an IP address.
2. Remove the logic that converts the host to an IP address, and pass the host when creating the connection pool object.

I tested the above change and it works for both TLS and non-TLS scenarios.

renatomariscal commented, Apr 25, 2022

Hi, I got here while trying to understand the reasoning behind this instruction: https://github.com/luin/ioredis#special-note-aws-elasticache-clusters-with-tls. It looks suspicious to me, since a DNS lookup implementation that returns something other than an IP doesn’t look valid.

From reading this thread and the code, it seems the issue is that https://github.com/luin/ioredis/blob/3bf300a1c99ae4cf8038930c45e19ebd68db222e/lib/cluster/index.ts#L1023-L1030 does an early DNS lookup and uses the result to connect later in the code.

So, by the time it attempts to connect, it no longer has the service name to validate.

So short-circuiting the early DNS lookup by returning the same name passes the actual hostname to Node.js TLS instead of the IP, so it knows which server names to validate against.

Is there value in doing this early DNS lookup? If so, aren’t we losing it with this bypass?

Have you considered some alternatives, like saving the hostname to be used as servername in the TLS connection, or maybe passing the dns.lookup option there instead of calling it?

Thanks!
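
For reference, the README note discussed in these comments configures the cluster with a dnsLookup option that hands the hostname back unchanged, so the later TLS connection still sees a name it can validate. A minimal sketch of that configuration, reusing the placeholder endpoint from the issue:

```javascript
// Short-circuit the early lookup: return the hostname itself instead of an IP.
const dnsLookup = (address, callback) => callback(null, address);

const nodes = [{ host: 'clustercfg.xxx.use1.cache.amazonaws.com', port: 6379 }];

const options = {
  dnsLookup,
  redisOptions: {
    tls: {},
  },
};

// With ioredis installed:
// const Redis = require('ioredis');
// const cluster = new Redis.Cluster(nodes, options);
```

The trade-off raised above still applies: whatever the early lookup was meant to provide is skipped, and resolution is left to the socket connection itself.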
