[BUG] - Unable to attach or mount volumes on GCP occasionally
OS system and architecture in which you are running QHub
linux
Expected behavior
QHub should, by default, be able to spawn a user server based on the QHub ConfigMaps and the user's settings.
Actual behavior
The user spawning process times out with the following error:
```
2022-03-09T22:29:19Z [Warning] Unable to attach or mount volumes: unmounted volumes=[conda-store], unattached volumes=[server-idle-culling home conda-store dask-etc]: timed out waiting for the condition
```
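To confirm a stuck pod is hitting this same failure, the Kubernetes events are the quickest place to look. A minimal sketch, assuming the namespace `dev` and a JupyterHub-style pod name (both are assumptions about the deployment, not taken from the report):

```bash
# Show the events for the stuck user pod; look for FailedAttachVolume
# or FailedMount entries mentioning the conda-store volume.
# Pod name and namespace are assumptions -- adjust to your deployment.
kubectl describe pod jupyter-<username> -n dev

# Or list recent events across the namespace, newest last:
kubectl get events -n dev --sort-by=.lastTimestamp
```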
We found a workaround: launch a dashboard and then re-launch the user pod, since only the first deployment on a node seems to fail with the mount issue. Another fix seems to be cordoning the current nodes (both user and general) and waiting for the next user spawn to scale up a new node, as sketched below.
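A rough sketch of that second workaround, assuming a GKE deployment where the node pools carry the standard `cloud.google.com/gke-nodepool` label (the pool names `user` and `general` are assumptions):

```bash
# Find the nodes backing the user and general pools.
kubectl get nodes -l cloud.google.com/gke-nodepool=user
kubectl get nodes -l cloud.google.com/gke-nodepool=general

# Mark them unschedulable; the next user spawn then cannot fit on the
# cordoned nodes, so the cluster autoscaler brings up a fresh node.
kubectl cordon <user-node-name>
kubectl cordon <general-node-name>
```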
How to Reproduce the problem?
We still don't have a reliable way to reproduce the behavior; the error appears to be intermittent. Future occurrences of this error may help narrow down the circumstances under which it reproduces.
Command output
No response
Versions and dependencies used
pip install qhub from main, and Docker images set to sha-3470504
Compute environment
GCP
Integrations
No response
Anything else?
No response
Comments
This happened again this week. I'm not sure what caused it, but this time the conda-store mounts were affecting the dask-gateway pod spawning. I followed the fix suggested by @costrouc of resetting the node, which worked, but we will need to look into this more deeply once the migration ends.
I downloaded some logs from the node, which had some error messages showing up. I will look at them later this week to see if I find anything useful.
For now, this will need to be added to the Nebari FAQ with instructions on how to reset the general node; a sketch of one possible procedure follows.
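As a starting point for that FAQ entry, a hedged sketch of one way to reset the general node on GCP; the node and instance names are placeholders, and the exact steps should be validated against the deployment:

```bash
# 1. Stop new pods from landing on the node.
kubectl cordon <general-node-name>

# 2. Evict the running workloads so they reschedule elsewhere
#    (or come back once a replacement node is up).
kubectl drain <general-node-name> --ignore-daemonsets --delete-emptydir-data

# 3. Delete the backing VM; the GKE managed instance group should
#    recreate it, giving the pool a fresh node.
gcloud compute instances delete <instance-name> --zone=<zone>
```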