[Question] Zookeeper failed to verify host address after upgraded to 0.18.0

We had been running a Kafka cluster on a bare-metal K8s cluster with the following details:

  • 3 zookeeper: lab-zookeeper-0/1/2
  • 3 brokers: lab-kafka-0/1/2
  • cluster operator version: 0.17.0
  • K8s namespace: kafka-lab
  • strimzi cluster name: lab

After we upgraded the cluster operator to 0.18.0, the ZooKeeper and Kafka pods were automatically rolled to pick up the new image (strimzi/kafka:0.18.0-kafka-2.4.0), and everything looked normal (at least from kubectl get po). However, when I fetched the logs from one of the ZooKeeper pods, I saw a lot of "Failed to verify hostname" errors, as follows:

2020-05-27 00:07:57,861 ERROR Failed to verify host address: 10.244.180.244 (org.apache.zookeeper.common.ZKTrustManager) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10.244.180.244> doesn't match any of the subject alternative names: [lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local, lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc.cluster.local, lab-zookeeper-client, lab-zookeeper-client.kafka-lab]
    at org.apache.zookeeper.common.ZKHostnameVerifier.matchIPAddress(ZKHostnameVerifier.java:194)
    at org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:164)
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:135)
    at org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:74)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2037)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:233)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2309)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:273)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:301)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:180)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:700)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:694)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readLong(DataInputStream.java:416)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:524)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:478)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:934)
2020-05-27 00:07:57,861 ERROR Failed to verify hostname: 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local (org.apache.zookeeper.common.ZKTrustManager) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local> doesn't match any of the subject alternative names: [lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local, lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc.cluster.local, lab-zookeeper-client, lab-zookeeper-client.kafka-lab]
    at org.apache.zookeeper.common.ZKHostnameVerifier.matchDNSName(ZKHostnameVerifier.java:224)
    at org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:170)
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:141)
    at org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:74)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2037)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:233)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2309)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:273)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:301)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:180)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:700)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:694)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readLong(DataInputStream.java:416)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:524)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:478)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:934)
2020-05-27 00:07:57,861 INFO Accepted TLS connection from 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local/10.244.180.244:54086 - NONE - SSL_NULL_WITH_NULL_NULL (org.apache.zookeeper.server.quorum.UnifiedServerSocket) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
2020-05-27 00:07:57,861 WARN Exception reading or writing challenge: {} (org.apache.zookeeper.server.quorum.QuorumCnxManager) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Failed to verify both host address and host name
    at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1554)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:95)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:694)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readLong(DataInputStream.java:416)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:524)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:478)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:934)
Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Failed to verify both host address and host name
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:198)
    at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1967)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:331)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:325)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2055)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:233)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2309)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:273)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:301)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:180)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:700)
    ... 9 more
Caused by: java.security.cert.CertificateException: Failed to verify both host address and host name
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:145)
    at org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:74)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2037)
    ... 20 more
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local> doesn't match any of the subject alternative names: [lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local, lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc.cluster.local, lab-zookeeper-client, lab-zookeeper-client.kafka-lab]
    at org.apache.zookeeper.common.ZKHostnameVerifier.matchDNSName(ZKHostnameVerifier.java:224)
    at org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:170)
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:141)
    ... 22 more
2020-05-27 00:07:58,366 INFO Authenticated Id 'CN=lab-kafka,O=io.strimzi' for Scheme 'x509' (org.apache.zookeeper.server.auth.X509AuthenticationProvider) [nioEventLoopGroup-7-2]
2020-05-27 00:07:58,367 WARN Closing connection to /10.244.65.25:41740 (org.apache.zookeeper.server.NettyServerCnxn) [nioEventLoopGroup-7-2]
java.io.IOException: ZK down
    at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:474)
    at org.apache.zookeeper.server.NettyServerCnxn.processMessage(NettyServerCnxn.java:360)
    at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelRead(NettyServerCnxnFactory.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1470)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
2020-05-27 00:07:58,853 INFO Authenticated Id 'CN=cluster-operator,O=io.strimzi' for Scheme 'x509' (org.apache.zookeeper.server.auth.X509AuthenticationProvider) [nioEventLoopGroup-7-3]
2020-05-27 00:07:58,853 WARN Closing connection to /10.244.69.241:53394 (org.apache.zookeeper.server.NettyServerCnxn) [nioEventLoopGroup-7-3]
java.io.IOException: ZK down
    at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:474)
    at org.apache.zookeeper.server.NettyServerCnxn.processMessage(NettyServerCnxn.java:360)
    at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelRead(NettyServerCnxnFactory.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1470)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
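
With this much stack trace, it can help to reduce the log to just the peers that failed verification. The following is a rough sketch, not part of the original report: the regular expression is an assumption based only on the two ERROR message formats visible above, and `failed_targets` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed format, taken from the ERROR lines in the log above:
#   "... ERROR Failed to verify host address: <ip> (...)"
#   "... ERROR Failed to verify hostname: <name> (...)"
PATTERN = re.compile(r"ERROR Failed to verify (host address|hostname): (\S+)")


def failed_targets(log_text):
    """Count (kind, target) pairs for every failed verification in log_text."""
    return Counter(PATTERN.findall(log_text))


# Two lines shortened from the log above, used as sample input.
sample = (
    "2020-05-27 00:07:57,861 ERROR Failed to verify host address: "
    "10.244.180.244 (org.apache.zookeeper.common.ZKTrustManager)\n"
    "2020-05-27 00:07:57,861 ERROR Failed to verify hostname: "
    "10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local "
    "(org.apache.zookeeper.common.ZKTrustManager)\n"
)

counts = failed_targets(sample)
for (kind, target), n in counts.items():
    print(f"{kind}: {target} ({n}x)")
```

The same function could be fed the saved output of kubectl logs for a ZooKeeper pod to see which peer addresses are being rejected and how often.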

What confuses me most is that the log seems to show ZooKeeper:

  1. verifying against the host address 10.244.180.244 (the K8s-internal IP) of a ZooKeeper peer pod, which fails because the cert does not cover that IP;
  2. then trying to verify the hostname 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local, which is essentially a combination of the pod IP and the client service name.
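
The two failed checks can be reproduced outside ZooKeeper to make the mismatch concrete. The sketch below is a simplified stand-in and not ZooKeeper's actual ZKHostnameVerifier logic: it does exact, case-insensitive matching only (no wildcards), and `matches_san` is a hypothetical helper. The SAN list, peer IP, and hostnames are copied from the log above.

```python
import ipaddress

# Subject alternative names reported in the error log for the peer certificate.
SANS = [
    "lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local",
    "lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc",
    "lab-zookeeper-client.kafka-lab.svc",
    "lab-zookeeper-client.kafka-lab.svc.cluster.local",
    "lab-zookeeper-client",
    "lab-zookeeper-client.kafka-lab",
]


def is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def matches_san(peer, sans):
    """Hypothetical, simplified check: exact match against same-typed SAN entries."""
    if is_ip(peer):
        wanted = ipaddress.ip_address(peer)
        return any(is_ip(s) and ipaddress.ip_address(s) == wanted for s in sans)
    return peer.lower() in {s.lower() for s in sans if not is_ip(s)}


peer_ip = "10.244.180.244"
reverse_name = "10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local"
pod_name = "lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc"

results = {name: matches_san(name, SANS) for name in (peer_ip, reverse_name, pod_name)}
for name, ok in results.items():
    print(f"{name}: {'match' if ok else 'no match'}")
```

Under these assumptions, neither the pod IP nor the name derived from it appears in the SAN list, while the per-pod name under lab-zookeeper-nodes does, which is consistent with the errors in the log.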

I guess that’s related to the migration from the TLS sidecar for ZooKeeper to the built-in TLS support. I would really appreciate any help.

And here is the zookeeper section of the Kafka manifest:

  zookeeper:
    replicas: 3
    resources:
      requests:
        memory: 6Gi
        cpu: "2"
      limits:
        memory: 6Gi
        cpu: "2"
    jvmOptions:
      -Xms: 3072m
      -Xmx: 3072m
    storage:
      type: persistent-claim
      size: 20Gi
      class: ssd
      deleteClaim: false

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 32 (14 by maintainers)

Top GitHub Comments

scholzj commented, May 29, 2020

@Escaflow I’m very confused now. Are you working with @oulydna and talking about the same issue? Or do you have your own separate issue which you think is related? The issue here in the previous logs is Received fatal alert: certificate_unknown, so that is IMHO something that happens before the hostname verification.

If you have your own cluster with the hostname verification issue, can you share the logs? You can use this script to collect them into a ZIP archive: https://github.com/strimzi/strimzi-kafka-operator/blob/master/tools/report.sh

oulydna commented, Jun 3, 2020

Really appreciate all the help, @scholzj. Cheers!
