Che operator fails to reconcile che after failed installation of devworkspace

Describe the bug

I’ve had some issues during the installation of the che-operator with devworkspaces enabled and wanted to reinstall. I deleted eclipse-che and devworkspace-controller namespaces (by running kubectl delete namespace ... and deleting the finalizers on resources where necessary).

I then tried to install che using chectl. The installation never succeeded, and the che operator log showed these errors:

time="2021-03-09T20:41:31Z" level=info msg="Running exec for 'create Keycloak DB, user, privileges' in the pod 'postgres-7d794f7b58-h2xkm'"
time="2021-03-09T20:41:31Z" level=error msg="Error running exec: Internal error occurred: failed calling webhook \"validate-exec.devworkspace-controller.svc\": Post \"https://devworkspace-webhookserver.devworkspace-controller.svc:443/validate?timeout=30s\": service \"devworkspace-webhookserver\" not found, command: [/bin/bash -c OUT=$(psql postgres -tAc \"SELECT 1 FROM pg_roles WHERE rolname='keycloak'\"); if [ $OUT -eq 1 ]; then echo \"DB exists\"; exit 0; fi && psql -c \"CREATE USER keycloak WITH PASSWORD 'is8NVHQ7r0GL'\" && psql -c \"CREATE DATABASE keycloak\" && psql -c \"GRANT ALL PRIVILEGES ON DATABASE keycloak TO keycloak\" && psql -c \"ALTER USER ${POSTGRESQL_USER} WITH SUPERUSER\"]"
time="2021-03-09T20:41:31Z" level=error msg="Stderr: "

Notice the reference to the devworkspace-webhookserver during the installation of the postgres DB for che server.

Because setting spec.devworkspace.enable: false actually does not uninstall devworkspace from the cluster in any way, the user has no way of making the installation work again.

Note that I was able to make the installation work again by running make uninstall from the devworkspace-operator sources, but that is something we might not want users to do.

Che version

  • latest

Steps to reproduce

  1. Install Che using chectl server:deploy -p openshift -n eclipse-che -a operator
  2. kubectl edit checluster eclipse-che -n eclipse-che and set spec.devworkspace.enable: true
  3. Let the installation finish
  4. kubectl delete namespace eclipse-che devworkspace-controller
  5. Remove finalizers on resources blocking the deletion
  6. Wait for the namespaces to be deleted
  7. Try to install Che using chectl server:deploy -p openshift -n eclipse-che -a operator again

The installation never finishes, and the che-operator log shows the error described above.

Expected behavior

We should have a documented way of cleaning up the cluster to be able to do repeated installations.

Runtime

  • OpenShift 4.6

Screenshots

N/A

Installation method

  • see repro steps

Environment

  • RHPDS with OpenShift 4.7

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

sleshchenko commented, Mar 10, 2021

(by running kubectl delete namespace … and deleting the finalizers on resources where necessary).

You took a far from optimal route. In addition to deleting the finalizers, you also need to clean up all the cluster-scoped resources, of which the webhooks are the most critical.

Could we just add a labelselector to our validating webhook to at least kind-of avoid this problem?

Pod labels are not propagated to pods/exec subresources. Here is an issue on the K8s side which should unblock us: https://github.com/kubernetes/kubernetes/issues/91732. Maybe that has changed even though the issue is still open, but I doubt it.
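
As an illustration of the cluster-scoped cleanup mentioned in this comment, here is a minimal sketch that lists the webhook configurations still pointing at a service in the deleted devworkspace-controller namespace. It is not taken from the Che or DevWorkspace documentation. The function names and the WEBHOOK_NS variable are hypothetical, and the sketch assumes the leftover webhooks reference their service through clientConfig.service, as the error in the log suggests. It only lists candidates; whether to delete them is left to the reader.

```shell
#!/usr/bin/env bash
# Namespace the missing webhook service lived in (taken from the log above).
WEBHOOK_NS="${WEBHOOK_NS:-devworkspace-controller}"

# Hypothetical helper: emit one tab-separated line per webhook configuration
# of the given kind: "<kind>\t<name>\t<ns1>,<ns2>,..."
list_webhook_namespaces() {
  local kind="$1"
  kubectl get "$kind" -o jsonpath='{range .items[*]}'"$kind"'{"\t"}{.metadata.name}{"\t"}{range .webhooks[*]}{.clientConfig.service.namespace}{","}{end}{"\n"}{end}'
}

# Hypothetical helper: read the lines produced above on stdin and print
# "<kind>/<name>" for every configuration that has at least one webhook
# backed by a service in $WEBHOOK_NS.
filter_webhooks_by_namespace() {
  awk -F'\t' -v ns="$WEBHOOK_NS" '
    {
      n = split($3, parts, ",")
      for (i = 1; i <= n; i++) {
        if (parts[i] == ns) { print $1 "/" $2; break }
      }
    }'
}

# Usage (requires kubectl access to the cluster):
#   for kind in validatingwebhookconfigurations mutatingwebhookconfigurations; do
#     list_webhook_namespaces "$kind"
#   done | filter_webhooks_by_namespace
```

Each printed entry is a cluster-scoped object that survives a namespace deletion, which is why the exec call in the log kept being routed to a service that no longer existed.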

tolusha commented, May 13, 2022

We can now use chectl server:delete to remove Eclipse Che and clean up the DevWorkspace resources.
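
A minimal sketch of how that could fit into a repeated installation, assuming chectl server:delete accepts the same -n namespace flag used in the reproduction steps. The deploy command is copied verbatim from those steps and may differ in newer chectl releases. The run helper, reinstall_che, TARGET_NS and DRY_RUN are hypothetical names introduced only for this example, and by default the script prints the commands instead of running them.

```shell
#!/usr/bin/env bash
TARGET_NS="${TARGET_NS:-eclipse-che}"   # namespace from the reproduction steps
DRY_RUN="${DRY_RUN:-1}"                 # 1 = print only, 0 = actually execute

# Print the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Remove the existing installation first, then deploy again.
# The deploy step is skipped if the delete step fails.
reinstall_che() {
  run chectl server:delete -n "$TARGET_NS" &&
    run chectl server:deploy -p openshift -n "$TARGET_NS" -a operator
}

# Usage:
#   reinstall_che              # dry run, prints the two commands
#   DRY_RUN=0 reinstall_che    # runs them against the current cluster
```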
