GatewayCluster with kubernetes backend fails to start after update to daskhub v4.5.4
What happened:
I received the following error: GatewayClusterError: Cluster 'adrastea.b4286778ea9b49f4b4264f982f5b278d' failed to start, see logs for more information. The logs suggest that the dask-scheduler command is missing an argument after --host, which looks intentional based on this code. Here are the logs:
This occurred upon update from v4.5.3 of the daskhub chart to v4.5.4. Note that several other issues occurred related to jupyterhub. I eventually worked my way through those and ultimately just deleted and recreated our GKE cluster. That fixed these other issues (primarily related to authentication) but this one remains.
What you expected to happen:
A working GatewayCluster object to be returned from the gateway.new_cluster() call.
Minimal Complete Verifiable Example: I’d imagine much of the reproducibility depends on our specific GKE infrastructure and chart configuration, but the code that actually triggers this bug is simply:
import dask_gateway

# new_cluster() is a method of Gateway, not GatewayCluster
gateway = dask_gateway.Gateway()
cluster = gateway.new_cluster()
Anything else we need to know?:
Environment:
- GKE cluster
- daskhub chart version: 4.5.4
- image: custom image running the following:
- Dask version: 2.30.0
- Python version: 3.8.6
- Operating System: Ubuntu
- Install method (conda, pip, source): conda
Issue Analytics
- Created 3 years ago
- Comments: 9 (2 by maintainers)
Top GitHub Comments
In the end, I just decided to go with the default scheduler and use our customized image only on the workers. This seems to work as long as the important packages are pinned to the same version across those images. I’m sure it wouldn’t take too much digging to figure out what was going on; let me know if you think that would be helpful, and I’m happy to give it a bit more effort. Otherwise, I think we can close this.
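The version-pinning workaround above can be sanity-checked before launching clusters with a small helper along these lines. This is only a sketch: the package names and pinned versions shown are illustrative assumptions, not the actual contents of the images in this issue.

```python
# Sketch: flag packages whose installed versions differ from the versions
# assumed to be pinned in the scheduler/worker images. Names and versions
# below are illustrative, not taken from the real images.
from importlib.metadata import PackageNotFoundError, version


def find_mismatches(pinned, installed=None):
    """Return {pkg: (installed_version, pinned_version)} for every package
    whose installed version differs from its pin (None if not installed)."""
    if installed is None:
        # Look up versions in the current environment.
        installed = {}
        for pkg in pinned:
            try:
                installed[pkg] = version(pkg)
            except PackageNotFoundError:
                installed[pkg] = None
    return {
        pkg: (installed.get(pkg), want)
        for pkg, want in pinned.items()
        if installed.get(pkg) != want
    }


# Example with explicit inputs (pins assumed from the reported environment):
pins = {"dask": "2.30.0", "distributed": "2.30.0"}
print(find_mismatches(pins, installed={"dask": "2.30.0", "distributed": "2.30.1"}))
# → {'distributed': ('2.30.1', '2.30.0')}
```

Running the same check (with `installed=None`) inside both the notebook image and the worker image would surface the kind of cross-image drift suspected here.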
Thanks for following up @bolliger32!