[BUG] - Unable to attach or mount volumes on GCP occasionally
OS system and architecture in which you are running QHub
linux
Expected behavior
QHub should, by default, be able to spawn a user server based on the QHub ConfigMaps and the user's settings.
Actual behavior
The user spawning process times out with the following error:
```
2022-03-09T22:29:19Z [Warning] Unable to attach or mount volumes: unmounted volumes=[conda-store], unattached volumes=[server-idle-culling home conda-store dask-etc]: timed out waiting for the condition
```
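To confirm a stuck pod is hitting this same failure, the Kubernetes events are the quickest place to look. A minimal sketch, assuming the namespace `dev` and a JupyterHub-style pod name (both are assumptions about the deployment, not taken from the report):

```bash
# Show the events for the stuck user pod; look for FailedAttachVolume
# or FailedMount entries mentioning the conda-store volume.
# Pod name and namespace are assumptions -- adjust to your deployment.
kubectl describe pod jupyter-<username> -n dev

# Or list recent events across the namespace, newest last:
kubectl get events -n dev --sort-by=.lastTimestamp
```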
We found a workaround: launch a dashboard and then re-launch the user pod, since only the first deployment on a node seems to fail with the mount issue. Another fix seems to be cordoning the current nodes (both user and general) and waiting for the next user spawn to scale up a new node, as sketched below.
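A rough sketch of that second workaround, assuming a GKE deployment where the node pools carry the standard `cloud.google.com/gke-nodepool` label (the pool names `user` and `general` are assumptions):

```bash
# Find the nodes backing the user and general pools.
kubectl get nodes -l cloud.google.com/gke-nodepool=user
kubectl get nodes -l cloud.google.com/gke-nodepool=general

# Mark them unschedulable; the next user spawn then cannot fit on the
# cordoned nodes, so the cluster autoscaler brings up a fresh node.
kubectl cordon <user-node-name>
kubectl cordon <general-node-name>
```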
How to Reproduce the problem?
We still don't have a reliable way to reproduce the behavior; the error appears to be intermittent. Future occurrences of this error may help narrow down the circumstances under which it reproduces.
Command output
No response
Versions and dependencies used
pip install qhub from main, and Docker images set to sha-3470504
Compute environment
GCP
Integrations
No response
Anything else?
No response
Comments
This happened again this week. I'm not sure what caused it, but this time the conda-store mounts were affecting the dask-gateway pod spawning. I followed the fix suggested by @costrouc of resetting the node, which worked, but we will need to look into this more deeply once the migration ends.
I downloaded some logs from the node, which had some error messages showing up. I will look at them later this week to see if I find anything useful.
For now, this will need to be added to the Nebari FAQ with instructions on how to reset the general node; a sketch of one possible procedure follows.
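As a starting point for that FAQ entry, a hedged sketch of one way to reset the general node on GCP; the node and instance names are placeholders, and the exact steps should be validated against the deployment:

```bash
# 1. Stop new pods from landing on the node.
kubectl cordon <general-node-name>

# 2. Evict the running workloads so they reschedule elsewhere
#    (or come back once a replacement node is up).
kubectl drain <general-node-name> --ignore-daemonsets --delete-emptydir-data

# 3. Delete the backing VM; the GKE managed instance group should
#    recreate it, giving the pool a fresh node.
gcloud compute instances delete <instance-name> --zone=<zone>
```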