
Workers stuck in ContainerCreating state

See original GitHub issue

I’m not sure how this has happened, but I have a bunch of workers (20ish) that are stuck in ContainerCreating with no IP assigned. The standard trick of just trying to delete them isn’t working (kubectl delete po). Adding --force --grace-period 0 to kubectl delete po doesn’t work either. Any suggestions as to how to get rid of these phantom pods?

This is with dask-gateway 0.5.0 on GKE.

Here’s the kubectl get po output:

NAME                                                               READY   STATUS              RESTARTS   AGE     IP             NODE                                            NOMINATED NODE   READINESS GATES
dask-gateway-anaconda-scheduler-e23bd3b9461842bc90cd84972f5ed4e9   1/1     Running             0          2m19s   10.60.208.8    gke-gartner-dask-2-default-pool-0a3f0337-kp1w   <none>           <none>
dask-gateway-anaconda-worker-092560ec333a4f449b81fc4b41434d1c      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-rxzr      <none>           <none>
dask-gateway-anaconda-worker-0d33b613dc894f68bfa39d8493783266      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-9g8p      <none>           <none>
dask-gateway-anaconda-worker-1029f444f66842aa9ee643c599d3f4f0      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-default-pool-0a3f0337-cl8b   <none>           <none>
dask-gateway-anaconda-worker-2fe03b1dfce24997b2c2729726f1bc31      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-schv      <none>           <none>
dask-gateway-anaconda-worker-30fc2943215545ab947b7303149452d5      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-3t2c      <none>           <none>
dask-gateway-anaconda-worker-332a1c83124240f1928701b4447220cd      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-n0fc      <none>           <none>
dask-gateway-anaconda-worker-48d0669ccedc498caf8ce7e50353712d      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-cq7b      <none>           <none>
dask-gateway-anaconda-worker-4c22c12f2db04bb9a2990b036127f5de      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-zv8z      <none>           <none>
dask-gateway-anaconda-worker-50da68a1611040cb934f9c24520d684b      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-w57r      <none>           <none>
dask-gateway-anaconda-worker-5696afe163dd42a0bf912077bc17f991      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-l04h      <none>           <none>
dask-gateway-anaconda-worker-62990c1424ca4f1db00abf6361760e5b      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-schv      <none>           <none>
dask-gateway-anaconda-worker-6937adc44b214c559ae1f6a4d50c8441      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-qqsd      <none>           <none>
dask-gateway-anaconda-worker-70ccd6ea870c4fd593d1723921b13226      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-w57r      <none>           <none>
dask-gateway-anaconda-worker-72951189bb784fbd9b1b7770caac0a49      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-cz3v      <none>           <none>
dask-gateway-anaconda-worker-781749a34a724ae5b791fb80a51f1f97      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-mpww      <none>           <none>
dask-gateway-anaconda-worker-815a3050b1fd41cf98b90b64b3f25217      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-default-pool-0a3f0337-cl8b   <none>           <none>
dask-gateway-anaconda-worker-854cf1edee9d4dd691e2ca9d1380086a      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-n0fc      <none>           <none>
dask-gateway-anaconda-worker-8e1045f6ef8c4c7baa3998069d8749b5      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-18lp      <none>           <none>
dask-gateway-anaconda-worker-8e43e9c13e1a475299f2610f9a7e481c      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-9g8p      <none>           <none>
dask-gateway-anaconda-worker-90805c3872024b0e98bc141a3a8f8270      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-w57r      <none>           <none>
dask-gateway-anaconda-worker-9438089332cb45ce98b5077d50c8d31a      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-bmv1      <none>           <none>
dask-gateway-anaconda-worker-9c1e3170e0d04e85b402c64c2a270466      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-3t2c      <none>           <none>
dask-gateway-anaconda-worker-a48305d3a9984fc399f41d67ce0c733e      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-v9q2      <none>           <none>
dask-gateway-anaconda-worker-b23599394f7443e9b409d6067fb62f29      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-n0fc      <none>           <none>
dask-gateway-anaconda-worker-b2b89841256c45bbb23dd4f2c8a2f78b      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-default-pool-0a3f0337-n4wx   <none>           <none>
dask-gateway-anaconda-worker-bf97f8a1a27940eabe14c23cbff2e1a4      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-schv      <none>           <none>
dask-gateway-anaconda-worker-c0ff5c10146242629633502bdb10136c      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-default-pool-0a3f0337-2lbp   <none>           <none>
dask-gateway-anaconda-worker-c4a3eccc12794cc8b28cd82b564492b2      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-default-pool-0a3f0337-cl8b   <none>           <none>
dask-gateway-anaconda-worker-da7a7c9c535a4041a7e2be972ce1b1a1      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-bmv1      <none>           <none>
dask-gateway-anaconda-worker-eadd4a757ac54c3b8856fdece6c2158c      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-l04h      <none>           <none>
dask-gateway-anaconda-worker-f4eeea877fa4413c8c4cadeffe5382c9      0/1     ContainerCreating   0          5h34m   <none>         gke-gartner-dask-2-user-pool-829c4a5c-v9q2      <none>           <none>
gateway-dask-gateway-66565d4c7d-zf2f9                              1/1     Running             0          21h     10.60.173.12   gke-gartner-dask-2-user-pool-829c4a5c-bmv1      <none>           <none>
scheduler-proxy-dask-gateway-69db6d9bbf-z99bw                      1/1     Running             0          20h     10.60.173.19   gke-gartner-dask-2-user-pool-829c4a5c-bmv1      <none>           <none>
web-proxy-dask-gateway-69769d57d9-lngkm                            1/1     Running             0          22h     10.60.0.185    gke-gartner-dask-2-default-pool-0a3f0337-26wz   <none>           <none>
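
Before trying to delete anything, it is worth asking why the workers are stuck in ContainerCreating at all. A minimal diagnostic sketch, assuming the dask-gateway namespace used in the delete commands below and borrowing a worker name from the listing above:

# The Events section at the bottom of the describe output usually names the
# blocker (image pull failures, missing secrets/volumes, sandbox creation errors, ...)
kubectl describe pod dask-gateway-anaconda-worker-092560ec333a4f449b81fc4b41434d1c -n dask-gateway

# Or scan recent events for the whole namespace, newest last
kubectl get events -n dask-gateway --sort-by=.lastTimestamp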

kubectl delete po fails with a NotFound error for every worker:

kubectl get po -n dask-gateway | grep worker | cut -d " " -f 1 | xargs kubectl delete po
Error from server (NotFound): pods "dask-gateway-anaconda-worker-092560ec333a4f449b81fc4b41434d1c" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-0d33b613dc894f68bfa39d8493783266" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-1029f444f66842aa9ee643c599d3f4f0" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-2fe03b1dfce24997b2c2729726f1bc31" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-30fc2943215545ab947b7303149452d5" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-332a1c83124240f1928701b4447220cd" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-48d0669ccedc498caf8ce7e50353712d" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-4c22c12f2db04bb9a2990b036127f5de" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-50da68a1611040cb934f9c24520d684b" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-5696afe163dd42a0bf912077bc17f991" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-62990c1424ca4f1db00abf6361760e5b" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-6937adc44b214c559ae1f6a4d50c8441" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-70ccd6ea870c4fd593d1723921b13226" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-72951189bb784fbd9b1b7770caac0a49" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-781749a34a724ae5b791fb80a51f1f97" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-815a3050b1fd41cf98b90b64b3f25217" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-854cf1edee9d4dd691e2ca9d1380086a" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-8e1045f6ef8c4c7baa3998069d8749b5" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-8e43e9c13e1a475299f2610f9a7e481c" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-90805c3872024b0e98bc141a3a8f8270" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-9438089332cb45ce98b5077d50c8d31a" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-9c1e3170e0d04e85b402c64c2a270466" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-a48305d3a9984fc399f41d67ce0c733e" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-b23599394f7443e9b409d6067fb62f29" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-b2b89841256c45bbb23dd4f2c8a2f78b" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-bf97f8a1a27940eabe14c23cbff2e1a4" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-c0ff5c10146242629633502bdb10136c" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-c4a3eccc12794cc8b28cd82b564492b2" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-da7a7c9c535a4041a7e2be972ce1b1a1" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-eadd4a757ac54c3b8856fdece6c2158c" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-f4eeea877fa4413c8c4cadeffe5382c9" not found

Being more aggressive about the delete doesn't do anything either:

kubectl get po -n dask-gateway | grep worker | cut -d " " -f 1 | xargs kubectl delete --force --grace-period 0 po
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
Error from server (NotFound): pods "dask-gateway-anaconda-worker-092560ec333a4f449b81fc4b41434d1c" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-0d33b613dc894f68bfa39d8493783266" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-1029f444f66842aa9ee643c599d3f4f0" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-2fe03b1dfce24997b2c2729726f1bc31" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-30fc2943215545ab947b7303149452d5" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-332a1c83124240f1928701b4447220cd" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-48d0669ccedc498caf8ce7e50353712d" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-4c22c12f2db04bb9a2990b036127f5de" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-50da68a1611040cb934f9c24520d684b" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-5696afe163dd42a0bf912077bc17f991" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-62990c1424ca4f1db00abf6361760e5b" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-6937adc44b214c559ae1f6a4d50c8441" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-70ccd6ea870c4fd593d1723921b13226" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-72951189bb784fbd9b1b7770caac0a49" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-781749a34a724ae5b791fb80a51f1f97" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-815a3050b1fd41cf98b90b64b3f25217" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-854cf1edee9d4dd691e2ca9d1380086a" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-8e1045f6ef8c4c7baa3998069d8749b5" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-8e43e9c13e1a475299f2610f9a7e481c" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-90805c3872024b0e98bc141a3a8f8270" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-9438089332cb45ce98b5077d50c8d31a" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-9c1e3170e0d04e85b402c64c2a270466" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-a48305d3a9984fc399f41d67ce0c733e" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-b23599394f7443e9b409d6067fb62f29" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-b2b89841256c45bbb23dd4f2c8a2f78b" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-bf97f8a1a27940eabe14c23cbff2e1a4" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-c0ff5c10146242629633502bdb10136c" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-c4a3eccc12794cc8b28cd82b564492b2" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-da7a7c9c535a4041a7e2be972ce1b1a1" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-eadd4a757ac54c3b8856fdece6c2158c" not found
Error from server (NotFound): pods "dask-gateway-anaconda-worker-f4eeea877fa4413c8c4cadeffe5382c9" not found
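
One possible explanation for the NotFound errors (an observation about the commands above, not something confirmed in the thread): the kubectl get runs with -n dask-gateway, but the piped kubectl delete does not, so the delete is issued against the kubeconfig's current default namespace, where those pod names genuinely do not exist. A sketch of the same one-liner with the namespace passed to the delete side as well:

# Pass the namespace to the delete as well; xargs appends the pod names after these flags
kubectl get po -n dask-gateway | grep worker | cut -d " " -f 1 | \
    xargs kubectl delete po -n dask-gateway --force --grace-period 0

# If a pod still refuses to go away because of a lingering finalizer, clearing the
# finalizers is a common (if blunt) last resort; <pod-name> is a placeholder
kubectl patch pod <pod-name> -n dask-gateway -p '{"metadata":{"finalizers":null}}'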

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
ericdill commented, Oct 8, 2019

If I find a way to reproduce this, I'll update the bug report.

0 reactions
CrispyCrafter commented, Jul 3, 2020

As for why these pods are lingering, I’m not sure. The pods are older than your current gateway instance, which is interesting. With your configuration the gateway should clean up all resources before shutting down.
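
The age difference is easy to confirm from kubectl itself; a small sketch, sorting the pods by creation time so the lingering workers show up ahead of the newer gateway pod:

# Oldest pods first; compare the worker creation times against the gateway pod's
kubectl get po -n dask-gateway --sort-by=.metadata.creationTimestamp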

I could reproduce this state with the following cluster_options:

# Assumed import; the option field classes live in dask_gateway_server.options
# (the integer field is named Integer rather than Int)
from dask_gateway_server.options import Options, Integer, Float


def options_handler(options):
    # The option values are passed straight through, so worker_memory reaches
    # the backend as a bare number and the resulting memory request is tiny
    # (hence the note below).
    return {
        "worker_cores": options.worker_cores,
        "worker_memory": options.worker_memory,
    }


c.Backend.cluster_options = Options(
    Integer("worker_cores", default=1, min=1, max=4, label="Worker Cores"),
    Float("worker_memory", default=1, min=1, max=8, label="Worker Memory (GiB)"),
    handler=options_handler,
)

i.e. the worker memory config ended up tiny.
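
To see what that configuration actually produced, the resource requests recorded on a stuck worker's pod spec can be read back directly; a sketch, with the pod name as a placeholder:

# Print the resources block of the worker container (requests/limits);
# a tiny memory request here would match the reproduction above
kubectl get pod <worker-pod-name> -n dask-gateway -o jsonpath='{.spec.containers[0].resources}'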

Read more comments on GitHub >

Top Results From Across the Web

  • Kubernetes stuck on ContainerCreating - Server Fault
    It happens when you are using secrets and they are not found (like a typo in the yaml or you forgot to create...
  • Pods get stuck at ContainerCreating state #8323 - GitHub
    We are running linkerd stable-2.11.1 on Azure AKS. In todays instance of this issue, I was working on migrating pods to a new...
  • Kubernetes pod is stuck in ContainerCreating state after ...
    failed to sync usually means the pods can't be fit into any of the workers (maybe adding more will help) or from your...
  • Learn why your EKS pod is stuck in the ContainerCreating state
    My Amazon Elastic Kubernetes Service (Amazon EKS) pod is stuck in the ContainerCreating state with the error "failed to create pod sandbox".
  • Troubleshooting in Kubernetes: A Strategic Guide
    Pods stuck in ContainerCreating state... ContainerCreating, for instance, implies that kube-scheduler has assigned a worker node for the...
