Webhook certificates validation fails

/kind bug

What steps did you take and what happened: I installed the latest version of Katib by cloning the repo’s master tree and running make deploy against our OpenShift 4.6.21 cluster. Then I applied random-example.yaml. The created Experiment remains in the Running condition, the Trial’s pods are not updated with sidecar containers, and deployment/katib-controller shows logs with the following lines:

2021/04/07 14:57:53 http: TLS handshake error from 10.254.2.1:47974: remote error: tls: bad certificate
2021/04/07 14:57:53 http: TLS handshake error from 10.254.2.1:47972: remote error: tls: bad certificate

What did you expect to happen: Webhook certificates are valid, Trial’s pods are injected with metric-gathering sidecars, Experiment successfully gathers metrics and progresses as it should.

Anything else you would like to add: As a result of job/katib-cert-generator, each WebhookConfiguration’s .webhooks[].clientConfig.caBundle is updated with ca.crt from the katib-cert-generator-token secret, which is assigned to the katib-cert-generator ServiceAccount. According to the documentation on CSRs, a ServiceAccount’s ca.crt is not guaranteed to verify arbitrary client certificates:

None of these usages are related to ServiceAccount token secrets .data[ca.crt] in any way. That CA bundle is only guaranteed to verify a connection to the API server using the default service (kubernetes.default.svc).
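A minimal sketch of how one might check which CA a webhook configuration actually carries, assuming only openssl and base64 are available. The function names (cert_fingerprint, decode_ca_bundle, same_cert) are hypothetical helpers, not part of Katib:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the SHA-256 fingerprint of the first certificate in a PEM file.
cert_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256 | cut -d= -f2
}

# Decode a base64 caBundle string (as stored in a webhook configuration)
# into a PEM file: decode_ca_bundle <base64 string> <output file>
decode_ca_bundle() {
  printf '%s' "$1" | base64 -d > "$2"
}

# Succeed only when both PEM files start with the same certificate.
same_cert() {
  [ "$(cert_fingerprint "$1")" = "$(cert_fingerprint "$2")" ]
}
```

The base64 string could come from something like kubectl get mutatingwebhookconfiguration <name> -o jsonpath='{.webhooks[0].clientConfig.caBundle}', where <name> is whatever the cluster uses. If same_cert reports that the decoded bundle matches the ServiceAccount's ca.crt, the webhook is trusting the API server CA rather than the CA that signed the serving certificate.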

I fetched tls.crt from secret/katib-webhook-cert and ca.crt from secret/katib-cert-generator-token-***, attached to the corresponding SA. Indeed, the pair is not valid:

[maanur@maanur-notebook katib-webhook-cert]$ openssl verify -verbose -CAfile ca.crt katib.crt
O = system:nodes, CN = system:node:katib-controller.kubeflow.svc
error 20 at 0 depth lookup: unable to get local issuer certificate
error katib.crt: verification failed
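The same kind of failure can be reproduced locally without a cluster. The sketch below is only an illustration with throwaway certificates: the CA names and file names are hypothetical, and the only value taken from this report is the katib-controller.kubeflow.svc common name. It signs a leaf certificate with one CA and then verifies it against that CA and against an unrelated one:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Create a throwaway self-signed CA: make_ca <basename> <common name>
make_ca() {
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=$2" -keyout "$1.key" -out "$1.crt" 2>/dev/null
}

# Issue a leaf certificate signed by a CA:
# issue_leaf <ca basename> <leaf basename> <common name>
issue_leaf() {
  openssl req -newkey rsa:2048 -nodes -subj "/CN=$3" \
    -keyout "$2.key" -out "$2.csr" 2>/dev/null
  openssl x509 -req -in "$2.csr" -CA "$1.crt" -CAkey "$1.key" \
    -CAcreateserial -days 1 -out "$2.crt" 2>/dev/null
}

make_ca signer "hypothetical-signer-ca"
make_ca other "hypothetical-unrelated-ca"
issue_leaf signer leaf "katib-controller.kubeflow.svc"

# Succeeds: the leaf chains to the CA that signed it.
openssl verify -CAfile signer.crt leaf.crt

# Fails: the unrelated CA did not issue the leaf.
if openssl verify -CAfile other.crt leaf.crt; then
  echo "unexpected: verification succeeded" >&2
else
  echo "verification against the unrelated CA failed, as expected"
fi
```

The second verify call is the situation described above: the caBundle holds a CA that did not sign the webhook's serving certificate, so the chain cannot be built.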

Environment:

  • Katib version: 86884ca2c2ddbd317682ff771eb79c8bec014df5
  • Kubeflow version: (not used)
  • Kubernetes version: 1.19
  • OpenShift version: 4.6.21

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
andreyvelich commented, Apr 8, 2021

Thank you for creating this, @maanur, and for testing Katib on OpenShift!

Could you please try specifying the kubernetes.io/legacy-unknown signerName here: https://github.com/kubeflow/katib/blob/master/hack/cert-generator.sh#L82. Then build and push your custom image for the cert generator:

docker build -t docker.io/<registry>/cert-generator -f cmd/cert-generator/v1beta1/Dockerfile .
docker push docker.io/<registry>/cert-generator

And use your custom image in the manifest: https://github.com/kubeflow/katib/blob/master/manifests/v1beta1/installs/katib-standalone/kustomization.yaml#L46.

My concern is that for OpenShift we need a different signerName. /cc @tenzen-y
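For illustration, here is a rough sketch of what a CertificateSigningRequest with that signerName could look like when rendered from a shell script. This is not the actual content of cert-generator.sh; render_csr, the usages list, and the file names are assumptions. Note that kubernetes.io/legacy-unknown can only be requested through the certificates.k8s.io/v1beta1 API, which Kubernetes 1.19 still serves but which was removed in 1.22:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Render a CertificateSigningRequest manifest to stdout.
# render_csr <csr name> <signer name> <path to PEM CSR file>
render_csr() {
  local name="$1" signer="$2" csr_file="$3"
  cat <<EOF
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: ${name}
spec:
  signerName: ${signer}
  request: $(base64 < "${csr_file}" | tr -d '\n')
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF
}
```

A possible invocation would be render_csr katib-controller kubernetes.io/legacy-unknown server.csr | kubectl create -f -, where server.csr is a PEM-encoded request generated beforehand with openssl.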

0 reactions
stale[bot] commented, Aug 21, 2021

This issue has been automatically closed because it has not had recent activity. Please comment “/reopen” to reopen it.
