OAuth 2.0 (Keycloak) with self-signed certs causes kafka-kafka-{0,1,2} pods to enter CrashLoopBackOff
When configuring the operator with a Kafka object, the Kafka pods crash during startup with:
$ kubectl logs --previous -n strimzi-kafka kafka-kafka-0 | grep Caused
Caused by: java.lang.RuntimeException: Failed to fetch public keys needed to validate JWT signatures: https://keycloak-[fqdn]/auth/realms/OIDC/protocol/openid-connect/certs
Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
While some information has been redacted, all hostnames that include [fqdn] are resolvable in DNS using that hostname. In places labeled [sensitive], an internal project name has been redacted, but all related names are valid for where they are used.
Expected behavior
Kafka should be able to fetch the public keys required for JWT validation.
Environment (please complete the following information):
- Strimzi version: 0.28.0 (kafka 3.1.0)
- Installation method: Operator installed via helm
- Kubernetes cluster: v1.21.5-eks-9017834
- Infrastructure: Amazon EKS
- Keycloak: 15.0.2
- Istio: 1.10.1
A bit more background:
Our environment has a self-signed root cert managed by an external entity. They have provided our Kubernetes cluster with an intermediate cert, which we’ve loaded into cert-manager. Using cert-manager, we’ve created certificates for Keycloak and Kafka. Keycloak and Istio are already deployed into the cluster. Keycloak is protected by Istio, which requires mutual TLS to access the pod using the internal service name.
We’re using Terraform to create the strimzi-kafka namespace, copy the Keycloak certificates from the istio-system namespace into the strimzi-kafka namespace, and deploy strimzi-kafka. (We recognize that this doesn’t handle certificate renewals and that we will have to examine some mechanism for secret replication between namespaces.)
While it shouldn’t impact this scenario, Istio is configured with a gateway for all of the Kafka hostnames (kafka, kafka-0, kafka-1, and kafka-2) with tls.mode = PASSTHROUGH, so Kafka is terminating inbound connections. The Keycloak URL provided to Kafka is the VirtualService hostname assigned to Keycloak, meaning it’s accessible via the intranet in use for this cluster. We’ve confirmed from another pod in the cluster that pods can reach the external URL for Keycloak without issue.
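For context, the Secret copied into the strimzi-kafka namespace, which the Kafka configuration in this report references by name, would look roughly like the following if it is a standard cert-manager-issued TLS Secret. This is a sketch under that assumption: the “ca.crt” key is confirmed later in this report, the other key names are what cert-manager normally writes, and all values are placeholders because the real Secret was not posted.

```yaml
# Assumed shape of the copied Keycloak certificate Secret (placeholder values).
apiVersion: v1
kind: Secret
metadata:
  name: keycloak-[sensitive]-cert
  namespace: strimzi-kafka
type: kubernetes.io/tls
data:
  # CA certificate that a client needs in order to trust Keycloak's TLS cert.
  ca.crt: "<base64-encoded PEM, placeholder>"
  # Keycloak's own serving certificate and key, as copied from istio-system.
  tls.crt: "<base64-encoded PEM, placeholder>"
  tls.key: "<base64-encoded PEM, placeholder>"
```

Strimzi's tlsTrustedCertificates entries point at a Secret name plus the key that holds the CA certificate, so for trust purposes only the ca.crt entry matters to the brokers.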
YAML files and logs
Our kafka configuration:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka
  namespace: strimzi-kafka
spec:
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafka:
    authorization:
      clientId: oidc-client
      delegateToKafkaAcls: true
      superUsers:
      - User:service-account-oidc-client
      tokenEndpointUri: https://keycloak-[fqdn]/auth/realms/OIDC/protocol/openid-connect/token
      type: keycloak
    config:
      log.message.format.version: "2.8"
      offsets.topic.replication.factor: 1
      transaction.state.log.min.isr: 1
      transaction.state.log.replication.factor: 1
    listeners:
    - authentication:
        jwksEndpointUri: https://keycloak-[fqdn]/auth/realms/OIDC/protocol/openid-connect/certs
        maxSecondsWithoutReauthentication: 3600
        tlsTrustedCertificates:
        - certificate: ca.crt
          secretName: keycloak-[sensitive]-cert
        type: oauth
        userNameClaim: preferred_username
        validIssuerUri: https://keycloak-[fqdn]/auth/realms/OIDC
      configuration:
        brokerCertChainAndKey:
          certificate: tls.crt
          key: tls.key
          secretName: kafka-external-cert
        brokers:
        - advertisedHost: kafka-0.[fqdn]
          broker: 0
        - advertisedHost: kafka-1.[fqdn]
          broker: 1
        - advertisedHost: kafka-2.[fqdn]
          broker: 2
      name: external
      port: 9094
      tls: true
      type: internal
    - authentication:
        jwksEndpointUri: https://keycloak-[fqdn]/auth/realms/OIDC/protocol/openid-connect/certs
        maxSecondsWithoutReauthentication: 3600
        type: oauth
        userNameClaim: preferred_username
        validIssuerUri: https://keycloak-[fqdn]/auth/realms/OIDC
      configuration:
        brokerCertChainAndKey:
          certificate: tls.crt
          key: tls.key
          secretName: kafka-internal-cert
      name: internal
      port: 9093
      tls: true
      type: internal
    - authentication:
        jwksEndpointUri: https://keycloak-[fqdn]/auth/realms/OIDC/protocol/openid-connect/certs
        maxSecondsWithoutReauthentication: 3600
        type: oauth
        userNameClaim: preferred_username
        validIssuerUri: https://keycloak-[fqdn]/auth/realms/OIDC
      name: plain
      port: 9092
      tls: false
      type: internal
    logging:
      loggers:
        log4j.logger.io.strimzi: DEBUG
        log4j.logger.kafka: DEBUG
        log4j.logger.org.apache.kafka: DEBUG
      type: inline
    replicas: 3
    storage:
      class: gp2-encrypted
      deleteClaim: true
      size: 15Gi
      type: persistent-claim
  zookeeper:
    replicas: 3
    storage:
      class: gp2-encrypted
      deleteClaim: true
      size: 100Gi
      type: persistent-claim
From another pod in the cluster that happens to have curl installed, using the “ca.crt” value stored in the keycloak-[sensitive]-cert TLS secret as the CA bundle, it is possible to obtain a JWT successfully:
curl -s -X POST https://keycloak-[fqdn]/auth/realms/OIDC/protocol/openid-connect/token --cacert /tmp/tmp.zyYoYth7Ks -H 'Content-Type: application/x-www-form-urlencoded' -d client_secret=9478[sensitive]2a22 -d grant_type=client_credentials -d client_id=oidc-client
{"access_token":"eyJhbGciOiJSUzI1...R7h4Yw","expires_in":7200,"refresh_expires_in":0,"token_type":"Bearer","not-before-policy":0,"scope":"profile email"}
Kafka startup fails; logs attached.
Top GitHub Comments
We couldn’t see the forest for the trees. This will be an integration point for applications outside of the cluster, and the internal paths won’t be used at all. We’ve added the CA cert to all of the other listeners, and Kafka did in fact start properly. Thank you @scholzj!
It did turn out to be that. It took a while to confirm, but we now have everything working in our “sandbox” environment with SSL connections between all components and everything working properly. Thanks!
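The fix reported in the first comment amounts to giving the other oauth listeners the same trust configuration that the external listener already had. A minimal sketch for the internal listener follows, reusing the Secret name and key from the original configuration; the plain listener would take the same tlsTrustedCertificates block. The thread does not show the final manifest, so this is an illustration of the change rather than the poster's exact YAML.

```yaml
# Fragment of spec.kafka.listeners in the Kafka resource (sketch only).
- name: internal
  port: 9093
  tls: true
  type: internal
  authentication:
    type: oauth
    jwksEndpointUri: https://keycloak-[fqdn]/auth/realms/OIDC/protocol/openid-connect/certs
    validIssuerUri: https://keycloak-[fqdn]/auth/realms/OIDC
    userNameClaim: preferred_username
    maxSecondsWithoutReauthentication: 3600
    # Added: CA used to verify Keycloak's certificate when fetching the JWKS.
    tlsTrustedCertificates:
    - certificate: ca.crt
      secretName: keycloak-[sensitive]-cert
  configuration:
    brokerCertChainAndKey:
      certificate: tls.crt
      key: tls.key
      secretName: kafka-internal-cert
```

This matches the symptom in the report: the external listener already trusted the CA, yet the broker still failed at startup with a PKIX error, which is what one would expect if each oauth listener fetches the JWKS with its own trust settings. Strimzi's keycloak authorization type also accepts a tlsTrustedCertificates list; whether the poster needed to set it there as well is not stated in the thread.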