TLS support for ZooKeeper client in Topic Operator
Is your feature request related to a problem? Please describe.
We use the Strimzi Topic Operator (TO) with an external AWS MSK cluster that is configured to allow only TLS connections to the brokers. MSK also provides both plaintext and TLS endpoints for ZooKeeper. We managed to make TO work with the TLS brokers in #2761, and it looks like that issue should be resolved after #5201. Now it's time to switch to the TLS ZooKeeper endpoints, but simply switching to the corresponding hosts/ports doesn't work. Looking through the code, I don't see anything related to TLS in either the configuration or the client initialization.
Describe the solution you’d like
I'd like to see an analogue of STRIMZI_TLS_ENABLED and STRIMZI_PUBLIC_CA for ZooKeeper in TO. When TLS is enabled, the ZooKeeper client should be configured accordingly. See:
Describe alternatives you've considered
The following change to the deployment works for me:
- name: STRIMZI_JAVA_SYSTEM_PROPERTIES
  value: "-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty -Dzookeeper.client.secure=true"
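To make explicit what this workaround sets, here is a small sketch that splits a STRIMZI_JAVA_SYSTEM_PROPERTIES-style string into its -D key/value pairs. It assumes simple whitespace-separated tokens without quoting; the actual container start script may handle the variable differently. The two resulting properties are the ones the ZooKeeper client reads: zookeeper.clientCnxnSocket selects the Netty-based connection socket, which ZooKeeper requires for TLS, and zookeeper.client.secure turns on the secure client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: parse "-Dkey=value" tokens, assuming whitespace-separated, unquoted values. */
public class JavaSysPropsSketch {

    static Map<String, String> parse(String javaSystemProperties) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String token : javaSystemProperties.trim().split("\\s+")) {
            if (!token.startsWith("-D")) {
                continue; // not a system property definition (or an empty input)
            }
            String body = token.substring(2);
            int eq = body.indexOf('=');
            if (eq < 0) {
                props.put(body, ""); // "-Dflag" without a value: the JVM sets an empty string
            } else {
                props.put(body.substring(0, eq), body.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        String value = "-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty"
                + " -Dzookeeper.client.secure=true";
        // Print each property the workaround would define for the JVM.
        parse(value).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```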
With this change to the deployment, I now see this in the log:
2021-08-20 07:27:24 INFO ZkClient:713 - zookeeper state changed (SyncConnected)
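Building on this workaround, the requested option (an analogue of STRIMZI_TLS_ENABLED and STRIMZI_PUBLIC_CA for ZooKeeper) could map environment variables onto the same client properties. The sketch below is hypothetical: STRIMZI_ZOOKEEPER_TLS_ENABLED, STRIMZI_ZOOKEEPER_TRUSTSTORE_LOCATION and STRIMZI_ZOOKEEPER_TRUSTSTORE_PASSWORD are invented names, not existing Strimzi settings. The zookeeper.clientCnxnSocket and zookeeper.client.secure values are the ones from the workaround, and zookeeper.ssl.trustStore.* are standard ZooKeeper client TLS properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical mapping from Topic Operator environment variables to
 * ZooKeeper client TLS system properties. The STRIMZI_ZOOKEEPER_* names
 * are assumptions for illustration, not existing Strimzi settings.
 */
public class ZkTlsConfigSketch {

    // Hypothetical env var names, mirroring STRIMZI_TLS_ENABLED / STRIMZI_PUBLIC_CA for Kafka.
    static final String TLS_ENABLED = "STRIMZI_ZOOKEEPER_TLS_ENABLED";
    static final String TRUSTSTORE_LOCATION = "STRIMZI_ZOOKEEPER_TRUSTSTORE_LOCATION";
    static final String TRUSTSTORE_PASSWORD = "STRIMZI_ZOOKEEPER_TRUSTSTORE_PASSWORD";

    /** Returns the ZooKeeper client properties to set; empty when TLS is disabled. */
    static Map<String, String> zkClientProperties(Map<String, String> env) {
        Map<String, String> props = new LinkedHashMap<>();
        if (!Boolean.parseBoolean(env.getOrDefault(TLS_ENABLED, "false"))) {
            return props; // plaintext: keep the ZooKeeper client defaults
        }
        // TLS in the ZooKeeper client requires the Netty-based socket implementation.
        props.put("zookeeper.clientCnxnSocket", "org.apache.zookeeper.ClientCnxnSocketNetty");
        props.put("zookeeper.client.secure", "true");
        // Optional truststore, for a CA that the JVM does not trust by default.
        String location = env.get(TRUSTSTORE_LOCATION);
        if (location != null) {
            props.put("zookeeper.ssl.trustStore.location", location);
            String password = env.get(TRUSTSTORE_PASSWORD);
            if (password != null) {
                props.put("zookeeper.ssl.trustStore.password", password);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Apply to the current JVM before any ZooKeeper client is created.
        zkClientProperties(System.getenv()).forEach(System::setProperty);
        System.out.println("zookeeper.client.secure = " + System.getProperty("zookeeper.client.secure"));
    }
}
```

Setting these as JVM system properties mirrors what the workaround does; a real implementation might instead pass them to the client through a per-connection configuration object rather than global properties.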
Additional context
Without any tweaks, I get lots of these:
2021-08-18 11:40:48 DEBUG SaslServerPrincipal:80 - Canonicalized address to ip-***.us-west-2.compute.internal
2021-08-18 11:40:48 INFO ClientCnxn:1112 - Opening socket connection to server z-1.***.c6.kafka.us-west-2.amazonaws.com/***:2182. Will not attempt to authenticate using SASL (unknown error)
2021-08-18 11:40:48 INFO ClientCnxn:959 - Socket connection established, initiating session, client: /***:50486, server: z-1.***.c6.kafka.us-west-2.amazonaws.com/***:2182
2021-08-18 11:40:48 DEBUG ClientCnxn:1027 - Session establishment request sent on z-1.***.c6.kafka.us-west-2.amazonaws.com/***:2182
2021-08-18 11:40:48 INFO ClientCnxn:1240 - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
And finally:
2021-08-18 11:41:06 ERROR Main:55 - Error deploying Session
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server 'z-1.***.c6.kafka.us-west-2.amazonaws.com:2182,z-2.***.c6.kafka.us-west-2.amazonaws.com:2182,z-3.***.c6.kafka.us-west-2.amazonaws.com:2182' with timeout of 18000 ms
    at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1233) ~[com.101tec.zkclient-0.11.jar:?]
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:157) ~[com.101tec.zkclient-0.11.jar:?]
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:131) ~[com.101tec.zkclient-0.11.jar:?]
    at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98) ~[com.101tec.zkclient-0.11.jar:?]
    at io.strimzi.operator.topic.zk.Zk.createSync(Zk.java:36) ~[io.strimzi.topic-operator-0.24.0.jar:0.24.0]
    at io.strimzi.operator.topic.zk.Zk.lambda$create$0(Zk.java:27) ~[io.strimzi.topic-operator-0.24.0.jar:0.24.0]
    at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:160) ~[io.vertx.vertx-core-4.1.0.jar:4.1.0]
    at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:96) ~[io.vertx.vertx-core-4.1.0.jar:4.1.0]
    at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:158) ~[io.vertx.vertx-core-4.1.0.jar:4.1.0]
    at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76) ~[io.vertx.vertx-core-4.1.0.jar:4.1.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty.netty-common-4.1.65.Final.jar:4.1.65.Final]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Top GitHub Comments
Since TO works with the following JAVA_OPTS, the problem is definitely that the client is not configured for TLS:
We'll migrate to Kafka 3+ once the AWS MSK team releases it, so everything above is a minor, temporary issue.
I don't think there is any Strimzi documentation on how to do it. The easiest way is probably to deploy a Strimzi Kafka cluster and check or copy the configs from there. There are, of course, the Stunnel docs as well: https://www.stunnel.org/