[Question] Zookeeper failed to verify host address after upgrading to 0.18.0
We had been running a Kafka cluster on bare-metal K8s with the following details:
- 3 zookeeper: lab-zookeeper-0/1/2
- 3 brokers: lab-kafka-0/1/2
- cluster operator version: 0.17.0
- K8s namespace: kafka-lab
- strimzi cluster name: lab
After we upgraded the cluster operator to 0.18.0, the zookeeper and kafka pods were automatically rolled to pick up the new image (strimzi/kafka:0.18.0-kafka-2.4.0), and everything looked normal (at least from kubectl get po). However, when I fetched the logs from one of the zookeeper pods, I saw a lot of "Failed to verify host address" and "Failed to verify hostname" errors, as follows:
2020-05-27 00:07:57,861 ERROR Failed to verify host address: 10.244.180.244 (org.apache.zookeeper.common.ZKTrustManager) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10.244.180.244> doesn’t match any of the subject alternative names: [lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local, lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc.cluster.local, lab-zookeeper-client, lab-zookeeper-client.kafka-lab]
    at org.apache.zookeeper.common.ZKHostnameVerifier.matchIPAddress(ZKHostnameVerifier.java:194)
    at org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:164)
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:135)
    at org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:74)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2037)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:233)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2309)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:273)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:301)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:180)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:700)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:694)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readLong(DataInputStream.java:416)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:524)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:478)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:934)
2020-05-27 00:07:57,861 ERROR Failed to verify hostname: 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local (org.apache.zookeeper.common.ZKTrustManager) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local> doesn’t match any of the subject alternative names: [lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local, lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc.cluster.local, lab-zookeeper-client, lab-zookeeper-client.kafka-lab]
    at org.apache.zookeeper.common.ZKHostnameVerifier.matchDNSName(ZKHostnameVerifier.java:224)
    at org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:170)
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:141)
    at org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:74)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2037)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:233)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2309)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:273)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:301)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:180)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:700)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:694)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readLong(DataInputStream.java:416)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:524)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:478)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:934)
2020-05-27 00:07:57,861 INFO Accepted TLS connection from 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local/10.244.180.244:54086 - NONE - SSL_NULL_WITH_NULL_NULL (org.apache.zookeeper.server.quorum.UnifiedServerSocket) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
2020-05-27 00:07:57,861 WARN Exception reading or writing challenge: {} (org.apache.zookeeper.server.quorum.QuorumCnxManager) [lab-zookeeper-1.lab-zookeeper-nodes.kafka-lab.svc/10.244.69.60:3888]
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Failed to verify both host address and host name
    at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1554)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:95)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.read(UnifiedServerSocket.java:694)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readLong(DataInputStream.java:416)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:524)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:478)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:934)
Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: Failed to verify both host address and host name
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:198)
    at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1967)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:331)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:325)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2055)
    at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:233)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1082)
    at sun.security.ssl.Handshaker.process_record(Handshaker.java:1010)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1079)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1388)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1416)
    at sun.security.ssl.SSLSocketImpl.getSession(SSLSocketImpl.java:2309)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.detectMode(UnifiedServerSocket.java:273)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.getSocket(UnifiedServerSocket.java:301)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedSocket.access$400(UnifiedServerSocket.java:180)
    at org.apache.zookeeper.server.quorum.UnifiedServerSocket$UnifiedInputStream.getRealInputStream(UnifiedServerSocket.java:700)
    … 9 more
Caused by: java.security.cert.CertificateException: Failed to verify both host address and host name
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:145)
    at org.apache.zookeeper.common.ZKTrustManager.checkClientTrusted(ZKTrustManager.java:74)
    at sun.security.ssl.ServerHandshaker.clientCertificate(ServerHandshaker.java:2037)
    … 20 more
Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local> doesn’t match any of the subject alternative names: [lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local, lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc, lab-zookeeper-client.kafka-lab.svc.cluster.local, lab-zookeeper-client, lab-zookeeper-client.kafka-lab]
    at org.apache.zookeeper.common.ZKHostnameVerifier.matchDNSName(ZKHostnameVerifier.java:224)
    at org.apache.zookeeper.common.ZKHostnameVerifier.verify(ZKHostnameVerifier.java:170)
    at org.apache.zookeeper.common.ZKTrustManager.performHostVerification(ZKTrustManager.java:141)
    … 22 more
2020-05-27 00:07:58,366 INFO Authenticated Id ‘CN=lab-kafka,O=io.strimzi’ for Scheme ‘x509’ (org.apache.zookeeper.server.auth.X509AuthenticationProvider) [nioEventLoopGroup-7-2]
2020-05-27 00:07:58,367 WARN Closing connection to /10.244.65.25:41740 (org.apache.zookeeper.server.NettyServerCnxn) [nioEventLoopGroup-7-2]
java.io.IOException: ZK down
    at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:474)
    at org.apache.zookeeper.server.NettyServerCnxn.processMessage(NettyServerCnxn.java:360)
    at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelRead(NettyServerCnxnFactory.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1470)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
2020-05-27 00:07:58,853 INFO Authenticated Id ‘CN=cluster-operator,O=io.strimzi’ for Scheme ‘x509’ (org.apache.zookeeper.server.auth.X509AuthenticationProvider) [nioEventLoopGroup-7-3]
2020-05-27 00:07:58,853 WARN Closing connection to /10.244.69.241:53394 (org.apache.zookeeper.server.NettyServerCnxn) [nioEventLoopGroup-7-3]
java.io.IOException: ZK down
    at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:474)
    at org.apache.zookeeper.server.NettyServerCnxn.processMessage(NettyServerCnxn.java:360)
    at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.channelRead(NettyServerCnxnFactory.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1470)
    at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1219)
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1266)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
What confuses me most is that the log seems to show two steps:
1. Verifying against the host address 10.244.180.244 (the K8s-internal IP) of a zookeeper peer pod, which fails because the cert doesn’t cover that IP.
2. Then trying to verify the hostname 10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local, which is essentially the pod IP combined with the client service name.
I guess that’s related to the migration from the TLS sidecar for zookeeper to the built-in TLS support. I would really appreciate any help.
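To make the two checks concrete, here is a minimal Python sketch of that kind of SAN matching, using the SAN list copied from the log. It is only an illustration under simplifying assumptions (exact, case-insensitive matching, no wildcard handling); it is not ZooKeeper's actual ZKHostnameVerifier logic, and the names `SANS`, `matches_any_san` and `verify_peer` are made up for this example.

```python
import ipaddress

# SAN entries copied from the log above: all DNS names, no IP entries.
SANS = [
    "lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc.cluster.local",
    "lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc",
    "lab-zookeeper-client.kafka-lab.svc",
    "lab-zookeeper-client.kafka-lab.svc.cluster.local",
    "lab-zookeeper-client",
    "lab-zookeeper-client.kafka-lab",
]


def is_ip(value):
    """Return True if value parses as an IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def matches_any_san(peer, sans):
    """Simplified check: IPs only match IP SANs, names only match DNS SANs."""
    if is_ip(peer):
        peer_ip = ipaddress.ip_address(peer)
        return any(is_ip(s) and ipaddress.ip_address(s) == peer_ip for s in sans)
    return peer.lower() in (s.lower() for s in sans if not is_ip(s))


def verify_peer(address, reverse_name, sans):
    """Step 1: try the peer's IP address. Step 2: fall back to its resolved name."""
    return matches_any_san(address, sans) or matches_any_san(reverse_name, sans)


# The peer from the log: neither its IP nor its reverse-resolved name is a SAN.
print(verify_peer(
    "10.244.180.244",
    "10-244-180-244.lab-zookeeper-client.kafka-lab.svc.cluster.local",
    SANS,
))
```

With these inputs both steps fail, which matches the "Failed to verify both host address and host name" message. If the peer address had resolved back to lab-zookeeper-2.lab-zookeeper-nodes.kafka-lab.svc, the second step would pass in this sketch.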
And here is the zookeeper section of the Kafka manifest:
zookeeper:
  replicas: 3
  resources:
    requests:
      memory: 6Gi
      cpu: "2"
    limits:
      memory: 6Gi
      cpu: "2"
  jvmOptions:
    -Xms: 3072m
    -Xmx: 3072m
  storage:
    type: persistent-claim
    size: 20Gi
    class: ssd
    deleteClaim: false
Issue Analytics
- Created 3 years ago
- Comments:32 (14 by maintainers)
Top GitHub Comments
@Escaflow I’m very confused now. Are you working with @oulydna and talking about the same issue? Or do you have your own separate issue which you think is related? The issue here in the previous logs is
Received fatal alert: certificate_unknown
- so that is IMHO something that happens before the hostname verification. If you have your own cluster with the hostname verification issue, can you share the logs? You can use this script to collect them into a ZIP archive: https://github.com/strimzi/strimzi-kafka-operator/blob/master/tools/report.sh
Really appreciate all the help, @scholzj. Cheers!