Che operator fails to reconcile che after failed installation of devworkspace
Describe the bug
I had some issues during the installation of the che-operator with devworkspaces enabled and wanted to reinstall. I deleted the eclipse-che and devworkspace-controller namespaces (by running kubectl delete namespace ... and deleting the finalizers on resources where necessary).
I then tried to install Che again using chectl. The installation never succeeded, and the che-operator log showed these errors:
time="2021-03-09T20:41:31Z" level=info msg="Running exec for 'create Keycloak DB, user, privileges' in the pod 'postgres-7d794f7b58-h2xkm'"
time="2021-03-09T20:41:31Z" level=error msg="Error running exec: Internal error occurred: failed calling webhook \"validate-exec.devworkspace-controller.svc\": Post \"https://devworkspace-webhookserver.devworkspace-controller.svc:443/validate?timeout=30s\": service \"devworkspace-webhookserver\" not found, command: [/bin/bash -c OUT=$(psql postgres -tAc \"SELECT 1 FROM pg_roles WHERE rolname='keycloak'\"); if [ $OUT -eq 1 ]; then echo \"DB exists\"; exit 0; fi && psql -c \"CREATE USER keycloak WITH PASSWORD 'is8NVHQ7r0GL'\" && psql -c \"CREATE DATABASE keycloak\" && psql -c \"GRANT ALL PRIVILEGES ON DATABASE keycloak TO keycloak\" && psql -c \"ALTER USER ${POSTGRESQL_USER} WITH SUPERUSER\"]"
time="2021-03-09T20:41:31Z" level=error msg="Stderr: "
Notice the reference to the devworkspace-webhookserver during the installation of the postgres DB for the Che server.
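To make the failure easier to spot in a long operator log, the webhook name and the missing service can be pulled out of the error message. The following is a minimal sketch, not part of che-operator: the helper name parse_webhook_failure and the regular expression are assumptions based only on the log line quoted above, and the sample message omits the trailing command part.

```python
import re

# Sample based on the che-operator log above (the "command: [...]" tail is omitted).
LOG_MSG = (
    'Error running exec: Internal error occurred: failed calling webhook '
    '"validate-exec.devworkspace-controller.svc": Post '
    '"https://devworkspace-webhookserver.devworkspace-controller.svc:443/validate?timeout=30s": '
    'service "devworkspace-webhookserver" not found'
)

# Accept both plain quotes and the backslash-escaped quotes seen in the raw log.
WEBHOOK_RE = re.compile(
    r'failed calling webhook \\?"(?P<webhook>[^"\\]+)\\?"'
    r'.*?service \\?"(?P<service>[^"\\]+)\\?" not found'
)


def parse_webhook_failure(message):
    """Return (webhook_name, missing_service), or None if the message does not match.

    Hypothetical helper for illustration only.
    """
    match = WEBHOOK_RE.search(message)
    if match is None:
        return None
    return match.group("webhook"), match.group("service")


if __name__ == "__main__":
    print(parse_webhook_failure(LOG_MSG))
```

A match means the API server still has a webhook registered for exec requests, but the service behind it no longer exists, so every exec into a pod is rejected.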
Because setting spec.devworkspace.enable: false does not actually uninstall devworkspace from the cluster in any way, the user has no way of making the installation work again.
Note that I was able to make the installation work again by running make uninstall from the devworkspace-operator sources, but that is something we might not want users to do.
Che version
- latest
Steps to reproduce
- Install Che using chectl server:deploy -p openshift -n eclipse-che -a operator
- Run kubectl edit checluster eclipse-che -n eclipse-che and set spec.devworkspace.enable: true
- Let the installation finish
- Run kubectl delete namespace eclipse-che devworkspace-controller
- Remove finalizers on resources blocking the deletion
- Wait for the namespaces to be deleted
- Try to install Che using chectl server:deploy -p openshift -n eclipse-che -a operator again
The installation never finishes, and the che-operator log shows the error described above.
Expected behavior
We should have a documented way of cleaning up the cluster to be able to do repeated installations.
Runtime
- OpenShift 4.6
Screenshots
N/A
Installation method
- see repro steps
Environment
- RHPDS with OpenShift 4.7
Top GitHub Comments
You went about this in a very non-optimal way. In addition to deleting the finalizers, you need to clean up all the cluster-scoped resources, of which the webhooks are the most critical.
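As a rough illustration of which cluster-scoped webhooks matter here, the sketch below scans a list of webhook configurations and reports those whose backing service lives in a namespace that no longer exists. It assumes the input has the shape produced by kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o json; the function name stale_webhook_configs and the configuration names in the sample are hypothetical, while the service and namespace names come from the log above.

```python
def stale_webhook_configs(webhook_list, existing_namespaces):
    """Return names of webhook configurations that point at a service
    in a namespace that is not in existing_namespaces.

    Hypothetical helper; webhook_list is the parsed JSON List object.
    """
    stale = []
    for item in webhook_list.get("items", []):
        for hook in item.get("webhooks", []):
            # URL-based webhooks have no "service" entry and are skipped.
            service = hook.get("clientConfig", {}).get("service")
            if service and service.get("namespace") not in existing_namespaces:
                stale.append(item["metadata"]["name"])
                break  # one dangling webhook is enough to flag the configuration
    return stale


# Illustrative input; the metadata names are made up for this example.
SAMPLE = {
    "kind": "List",
    "items": [
        {
            "kind": "ValidatingWebhookConfiguration",
            "metadata": {"name": "example-devworkspace-webhooks"},
            "webhooks": [
                {
                    "name": "validate-exec.devworkspace-controller.svc",
                    "clientConfig": {
                        "service": {
                            "name": "devworkspace-webhookserver",
                            "namespace": "devworkspace-controller",
                        }
                    },
                }
            ],
        },
        {
            "kind": "MutatingWebhookConfiguration",
            "metadata": {"name": "example-other-webhooks"},
            "webhooks": [
                {
                    "name": "mutate.example.svc",
                    "clientConfig": {
                        "service": {"name": "other", "namespace": "kube-system"}
                    },
                }
            ],
        },
    ],
}

if __name__ == "__main__":
    # After the namespaces were deleted, only these remain (assumed for the example).
    print(stale_webhook_configs(SAMPLE, {"default", "kube-system"}))
```

Any configuration reported this way is a candidate for manual deletion with kubectl delete, after checking that nothing else depends on it.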
Pod labels are not propagated to the pods/exec subresource. Here is an issue on the K8s side which should unblock us: https://github.com/kubernetes/kubernetes/issues/91732 Maybe that has changed, but I doubt it, since the issue is still open.
We can now use chectl server:delete to remove Eclipse Che and clean up DevWorkspace resources.