
[BUG] - Unable to attach or mount volumes on GCP occasionally (#1156)

See original GitHub issue

OS system and architecture in which you are running QHub

Linux

Expected behavior

QHub should by default be able to spawn a user server based on qhub configmaps and user settings.

Actual behavior

The user spawning process times out with the following error:

2022-03-09T22:29:19Z [Warning] Unable to attach or mount volumes: unmounted volumes=[conda-store], unattached volumes=[server-idle-culling home conda-store dask-etc]: timed out waiting for the condition
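When this warning shows up, the pod events and attach state can be inspected with standard kubectl commands (the pod name and namespace below are placeholders, not taken from the issue):

```shell
# Show the pod's events, including FailedMount / FailedAttachVolume warnings
kubectl describe pod <user-pod-name> -n <namespace>

# Confirm the conda-store PVC is Bound to a PV
kubectl get pvc -n <namespace>

# List VolumeAttachment objects to see which node each volume is attached to
kubectl get volumeattachment
```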

We were able to find a workaround that consists of launching a dashboard and then re-launching the user pod, as it seems that only the first deployment fails with the mount issue. Another fix seems to be cordoning the current nodes (for the user and general node groups) and waiting for the next user spawn to scale up a new node.
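Assuming a standard GKE node pool, the cordon-and-rescale workaround described above might look roughly like this (node names are placeholders):

```shell
# Mark the affected user and general nodes as unschedulable
kubectl cordon <user-node-name>
kubectl cordon <general-node-name>

# Evict the running pods so the next user spawn forces the autoscaler
# to bring up a fresh node instead of reusing the stuck one
kubectl drain <user-node-name> --ignore-daemonsets --delete-emptydir-data
```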

How to Reproduce the problem?

We still don't have a reliable way to reproduce the behavior; the error appears to be intermittent. Future occurrences might help us deduce the circumstances under which it can be reproduced.

Command output

No response

Versions and dependencies used.

qhub installed via pip from main, with Docker images set to sha-3470504.

Compute environment

GCP

Integrations

No response

Anything else?

No response

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
viniciusdc commented, Oct 4, 2022

This happened again this week. I'm not sure what caused it, but this time the conda-store mounts were affecting the dask-gateway pod spawning. I followed the fix suggested by @costrouc of resetting the node, which worked, but we will need to look into this more deeply once the migration ends.

I downloaded some logs from the node, which had some error messages in them. I will look at them later this week to see if I find anything useful.
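One commonly reported cause of this timeout in Kubernetes is a stale VolumeAttachment left behind after a node is reset or replaced; if the node logs point that way, a sketch of the cleanup (the attachment name is a placeholder) would be:

```shell
# Find attachments that still reference the old node
kubectl get volumeattachment -o wide

# Delete the stale attachment so the attach/detach controller can retry
kubectl delete volumeattachment <stale-attachment-name>
```

After the stale attachment is removed, the affected pod should eventually retry the mount on its own.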

0 reactions
viniciusdc commented, Oct 4, 2022

For now, this will need to be added to the Nebari FAQ, with instructions on how to reset the general node.

