Conda-store pods get evicted on AWS
Describe the bug
It seems like conda-store pods get evicted on AWS on a fresh deployment. This was tested with the latest main commit: https://github.com/Quansight/qhub/commit/e7992115abfe65fd429999d5a4241e4863b2a85d
Output of `kubectl describe` on the pod:
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  51s (x2 over 52s)  default-scheduler   0/3 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
  Normal   TriggeredScaleUp  41s                cluster-autoscaler  pod triggered scale-up: [{eks-20bd5579-b270-ddc9-c256-f021f1d7978b 1->2 (max: 5)}]
  Warning  FailedScheduling  6s (x2 over 6s)    default-scheduler   0/4 nodes are available: 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
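To confirm which node is carrying the `node.kubernetes.io/disk-pressure` taint reported above, something like the following should work. This is a sketch using standard kubectl commands; the node name is a placeholder, not taken from this deployment:

```sh
# List the taint keys on every node to spot the one under disk pressure
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'

# Inspect the affected node's conditions (DiskPressure should show True)
kubectl describe node <general-node-name> | grep -A 10 'Conditions:'

# Pull the kubelet's summary stats for that node to see filesystem usage
kubectl get --raw "/api/v1/nodes/<general-node-name>/proxy/stats/summary" | head -n 40
```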
I think you would need a new deployment; if the volume's node is already spun up in a conflicting zone, it is unlikely to be moved after updating to the latest version.
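To check for that zone conflict from the CLI rather than the AWS console, a sketch like the following should show where the node and the volume landed. The PV name is a placeholder:

```sh
# Zone of each node, via the standard topology label
kubectl get nodes -L topology.kubernetes.io/zone

# Zone constraints recorded on the PersistentVolumes (check Labels / Node Affinity)
kubectl get pv
kubectl describe pv <conda-store-pv-name> | grep -iE 'zone|node affinity'
```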
That makes sense. Using the AWS console to confirm, the Availability Zone of the `general` node that these pods were running on was `us-east-2a`, whereas the 50 GB volume mounts are in `us-east-2b`. To get back to a working state, I drained the `general` node and then manually killed any pods that wouldn't be force-drained. This puts the node in a "cordoned" state, and a new node should spin up soon after; if you're lucky and the new node is launched in the same AZ as your volume mounts, the pods that were drained will be scheduled onto it.
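For reference, the drain sequence described above is roughly the standard one. This is a sketch; the node, pod, and namespace names are placeholders, not the exact commands used here:

```sh
# Drain cordons the node first, then evicts its pods so they reschedule elsewhere
kubectl drain <general-node-name> --ignore-daemonsets --delete-emptydir-data

# Force-delete any pods that refuse to terminate during the drain
kubectl delete pod <stuck-pod-name> -n <namespace> --force --grace-period=0

# Once the autoscaler brings up a replacement node, confirm where the pods land
kubectl get pods -n <namespace> -o wide
```

If the replacement node comes up in a different AZ from the EBS-backed volumes, the same scheduling failure will recur, which is why the zone check above matters.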