
Conda-store pods get evicted on AWS


Describe the bug

It seems that conda-store pods get evicted on AWS on a fresh deployment. This was tested with the latest main commit: https://github.com/Quansight/qhub/commit/e7992115abfe65fd429999d5a4241e4863b2a85d

Output of kubectl describe on the pod:

Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  51s (x2 over 52s)  default-scheduler   0/3 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
  Normal   TriggeredScaleUp  41s                cluster-autoscaler  pod triggered scale-up: [{eks-20bd5579-b270-ddc9-c256-f021f1d7978b 1->2 (max: 5)}]
  Warning  FailedScheduling  6s (x2 over 6s)    default-scheduler   0/4 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
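The two signals in these events are the disk-pressure taint and the node-affinity mismatch. Both can be inspected directly with standard kubectl; a quick sketch (node output will vary by cluster):

# List the taints on every node; disk-pressure shows up here
kubectl describe nodes | grep -A3 Taints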

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
aktech commented, Jul 28, 2021

I think you would need a new deployment; if the volume node is already spun up in a conflicting zone, it is very unlikely to be moved after updating to the latest version.
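One way to check for this kind of zone mismatch is to compare the zone labels on the PersistentVolumes against those on the nodes. A sketch, assuming the standard topology labels are set (EKS sets them by default; clusters of this era may use failure-domain.beta.kubernetes.io/zone instead of topology.kubernetes.io/zone):

# Zone of each PersistentVolume vs. zone of each node
kubectl get pv -L topology.kubernetes.io/zone
kubectl get nodes -L topology.kubernetes.io/zone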

0 reactions
iameskild commented, Jul 28, 2021

That makes sense. Using the AWS console to confirm, the Availability Zone for the general node that these pods were running on was us-east-2a, whereas the 50 GB volume mounts are in us-east-2b.
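The same confirmation can be done from the AWS CLI instead of the console; the volume ID below is a placeholder:

# Report the availability zone of an EBS volume
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 --query 'Volumes[].AvailabilityZone' --output text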

To get back to a working state, I drained the general node:

kubectl drain ip-10-10-4-189.us-east-2.compute.internal --ignore-daemonsets --delete-emptydir-data --force

I then manually killed any pods that wouldn't be force drained. This puts the node in a "cordoned" state, and a new node should spin up soon after; if you're lucky and the new node is launched in the same AZ as your volume mounts, the drained pods will be rescheduled onto it.
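For completeness, a sketch of those follow-up steps (pod and namespace names here are placeholders):

# Force-delete any pod that blocks the drain
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force
# Watch for the replacement node and confirm its zone matches the volumes
kubectl get nodes -L topology.kubernetes.io/zone --watch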


