Che operator fails to reconcile che after failed installation of devworkspace

Describe the bug

I’ve had some issues during the installation of the che-operator with devworkspaces enabled and wanted to reinstall. I deleted eclipse-che and devworkspace-controller namespaces (by running kubectl delete namespace ... and deleting the finalizers on resources where necessary).

I then tried to install Che using chectl. The installation never succeeded, and the che operator log showed these errors:

time="2021-03-09T20:41:31Z" level=info msg="Running exec for 'create Keycloak DB, user, privileges' in the pod 'postgres-7d794f7b58-h2xkm'"
time="2021-03-09T20:41:31Z" level=error msg="Error running exec: Internal error occurred: failed calling webhook \"validate-exec.devworkspace-controller.svc\": Post \"https://devworkspace-webhookserver.devworkspace-controller.svc:443/validate?timeout=30s\": service \"devworkspace-webhookserver\" not found, command: [/bin/bash -c OUT=$(psql postgres -tAc \"SELECT 1 FROM pg_roles WHERE rolname='keycloak'\"); if [ $OUT -eq 1 ]; then echo \"DB exists\"; exit 0; fi && psql -c \"CREATE USER keycloak WITH PASSWORD 'is8NVHQ7r0GL'\" && psql -c \"CREATE DATABASE keycloak\" && psql -c \"GRANT ALL PRIVILEGES ON DATABASE keycloak TO keycloak\" && psql -c \"ALTER USER ${POSTGRESQL_USER} WITH SUPERUSER\"]"
time="2021-03-09T20:41:31Z" level=error msg="Stderr: "

Notice the reference to the devworkspace-webhookserver during the installation of the postgres DB for che server.
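To make that reference easier to spot in a long operator log, here is a minimal Python sketch that pulls the webhook name and the missing service out of a line like the one above. The function name and the regular expression are illustrative assumptions, not part of che-operator or chectl, and the pattern only targets the message wording shown in this log.

```python
import re

# Matches the API server's "failed calling webhook" message as it appears in
# the log above. Inner quotes are escaped as \" in the log, so the backslash
# is treated as optional. This pattern is an assumption based on that one line.
WEBHOOK_ERROR = re.compile(
    r'failed calling webhook \\?"(?P<webhook>[^"\\]+)\\?".*?'
    r'service \\?"(?P<service>[^"\\]+)\\?" not found'
)


def find_missing_webhook_service(log_line):
    """Return (webhook_name, service_name) if the line reports a webhook
    whose backing service is missing, otherwise None."""
    match = WEBHOOK_ERROR.search(log_line)
    if match is None:
        return None
    return match.group("webhook"), match.group("service")


# Shortened copy of the error line from the che operator log.
line = (
    'level=error msg="Error running exec: Internal error occurred: '
    'failed calling webhook \\"validate-exec.devworkspace-controller.svc\\": '
    'Post \\"https://devworkspace-webhookserver.devworkspace-controller.svc:443/'
    'validate?timeout=30s\\": service \\"devworkspace-webhookserver\\" not found"'
)

print(find_missing_webhook_service(line))
```

A hit here means an admission webhook is still registered cluster-wide even though the service behind it went away with its namespace, which is why an exec into the unrelated postgres pod is rejected.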

Because setting spec.devworkspace.enable: false actually does not uninstall devworkspace from the cluster in any way, the user has no way of making the installation work again.

Note that I was able to make the installation work again by running make uninstall from the devworkspace-operator sources, but that is something we might not want users to have to do.

Che version

latest

Steps to reproduce

1. Install Che using chectl server:deploy -p openshift -n eclipse-che -a operator
2. Run kubectl edit checluster eclipse-che -n eclipse-che and set spec.devworkspace.enable: true
3. Let the installation finish
4. Run kubectl delete namespace eclipse-che devworkspace-controller
5. Remove the finalizers on the resources blocking the deletion
6. Wait for the namespaces to be deleted
7. Try to install Che again using chectl server:deploy -p openshift -n eclipse-che -a operator

The installation never finishes, and the che-operator log shows the error described above.

Expected behavior

We should have a documented way of cleaning up the cluster to be able to do repeated installations.

Runtime

OpenShift 4.6

Screenshots

N/A

Installation method

see repro steps

Environment

RHPDS with OpenShift 4.7

Issue Analytics

Created 3 years ago
Comments: 8 (8 by maintainers)

Top GitHub Comments

sleshchenko commented, Mar 10, 2021

(by running kubectl delete namespace … and deleting the finalizers on resources where necessary).

You took a far from optimal route. In addition to deleting the finalizers, you need to clean up all the cluster-scoped resources, of which the webhooks are the most critical.
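As a rough illustration of what to look for among those cluster-scoped resources, the sketch below scans the JSON produced by kubectl get validatingwebhookconfigurations -o json and reports configurations whose backing service lives in a namespace that no longer exists. The helper name and the configuration name in the sample data are hypothetical. Only the webhook, service and namespace names come from the log in the issue.

```python
def stale_webhook_configs(webhook_list, existing_namespaces):
    """Return the names of webhook configurations that point at a service
    in a namespace that is not in existing_namespaces."""
    stale = []
    for config in webhook_list.get("items", []):
        for hook in config.get("webhooks", []):
            # Webhooks that use clientConfig.url have no service entry.
            service = hook.get("clientConfig", {}).get("service")
            if service and service.get("namespace") not in existing_namespaces:
                stale.append(config["metadata"]["name"])
                break  # one dangling webhook is enough to flag the config
    return stale


# Trimmed-down sample in the shape of a ValidatingWebhookConfigurationList.
# "example-devworkspace-webhooks" is a made-up name for illustration.
sample = {
    "items": [
        {
            "metadata": {"name": "example-devworkspace-webhooks"},
            "webhooks": [
                {
                    "name": "validate-exec.devworkspace-controller.svc",
                    "clientConfig": {
                        "service": {
                            "namespace": "devworkspace-controller",
                            "name": "devworkspace-webhookserver",
                        }
                    },
                }
            ],
        }
    ]
}

# After the namespaces were deleted, only unrelated namespaces remain.
print(stale_webhook_configs(sample, {"default", "kube-system"}))
```

Any configuration this reports would still intercept requests cluster-wide, so it has to be deleted explicitly. Removing the namespace does not remove it.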

Could we just add a labelselector to our validating webhook to at least kind-of avoid this problem?

Pod labels are not propagated to the pods/exec subresource. Here is an issue on the K8s side that should unblock us: https://github.com/kubernetes/kubernetes/issues/91732 Maybe that has changed, but the issue is still open, so I doubt it.

tolusha commented, May 13, 2022

We can now use chectl server:delete to remove Eclipse Che and clean up the DevWorkspace resources.
