
db pod does not reschedule on non-tainted node


Environment info

  • NooBaa Version: master-20210802
  • Platform: OCP 4.7..4

Actual behavior

The DB pod is not rescheduled onto a non-tainted node; instead it stays in the Terminating state on the tainted node.

Expected behavior

The DB pod should be rescheduled onto a non-tainted node.

Steps to reproduce

Created PVC gpfs-vol-pvc-31
Created namespacestore using command:
```
noobaa namespacestore create nsfs fs2 --pvc-name='gpfs-vol-pvc-31' --fs-backend='GPFS'
```
Currently, the pods are scheduled as below:

```
[root@api.osculate.cp.fyre.ibm.com ~]# oc get pod -o wide
NAME                                               READY   STATUS        RESTARTS   AGE   IP              NODE                               NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running       0          25m   10.254.17.153   worker1.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     1/1     Running       0          45m   10.254.17.123   worker1.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-default-backing-store-noobaa-pod-cf4b02ee   0/1     Terminating   0          8s    <none>          worker2.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-b67f8c458-wdgbw                    1/1     Running       0          25m   10.254.17.157   worker1.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-7bb746749d-bd4sz                   1/1     Running       1          25m   10.254.17.145   worker1.osculate.cp.fyre.ibm.com   <none>           <none>
```

Taint Node 1 using:

kubectl taint nodes worker1.osculate.cp.fyre.ibm.com key1=value1:NoExecute

The DB pod now enters the Terminating state on node 1 and is not rescheduled onto another node:

NAME                               READY   STATUS              RESTARTS   AGE   IP              NODE                               NOMINATED NODE   READINESS GATES
noobaa-core-0                      1/1     Running             0          44s   10.254.21.162   worker2.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                     0/1     Terminating         0          48m   10.254.17.123   worker1.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-b67f8c458-gw7qm    0/1     ContainerCreating   0          83s   <none>          worker2.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-b67f8c458-wdgbw    0/1     Terminating         0          28m   10.254.17.157   worker1.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-7bb746749d-2jj88   1/1     Running             0          82s   10.254.21.150   worker2.osculate.cp.fyre.ibm.com   <none>           <none>
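
To pick the stuck pods out of a listing like the one above, a small filter can help. The following is a minimal sketch and not part of the original report: the function name `terminating_on_node` is hypothetical, and the sample rows are a trimmed copy of the output above (the NOMINATED NODE and READINESS GATES columns are dropped). It assumes the `-o wide` column order shown, with STATUS in column 3 and NODE in column 7.

```shell
#!/bin/sh
# Hypothetical helper: print the names of pods that are Terminating on a node.
# Reads `oc get pod -o wide` style output on stdin. Assumes STATUS is column 3
# and NODE is column 7, as in the listing above; the header row is skipped.
terminating_on_node() {
  awk -v node="$1" 'NR > 1 && $3 == "Terminating" && $7 == node { print $1 }'
}

# Sample rows trimmed from the output above (last two columns dropped).
sample_pods='NAME READY STATUS RESTARTS AGE IP NODE
noobaa-core-0 1/1 Running 0 44s 10.254.21.162 worker2.osculate.cp.fyre.ibm.com
noobaa-db-pg-0 0/1 Terminating 0 48m 10.254.17.123 worker1.osculate.cp.fyre.ibm.com
noobaa-endpoint-b67f8c458-gw7qm 0/1 ContainerCreating 0 83s <none> worker2.osculate.cp.fyre.ibm.com
noobaa-endpoint-b67f8c458-wdgbw 0/1 Terminating 0 28m 10.254.17.157 worker1.osculate.cp.fyre.ibm.com
noobaa-operator-7bb746749d-2jj88 1/1 Running 0 82s 10.254.21.150 worker2.osculate.cp.fyre.ibm.com'

# Lists the pods still Terminating on the tainted node.
printf '%s\n' "$sample_pods" | terminating_on_node worker1.osculate.cp.fyre.ibm.com
```

Against a live cluster, the same function could be fed directly from `oc get pod -o wide` instead of the sample text.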

Note: As soon as node 1 is untainted, the DB pod comes back up in the Running state, again on node 1:

node/worker1.osculate.cp.fyre.ibm.com untainted
[root@api.osculate.cp.fyre.ibm.com ~]# oc get pod -o wide
NAME                                               READY   STATUS        RESTARTS   AGE     IP              NODE                               NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running       0          5m2s    10.254.21.162   worker2.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-db-pg-0                                     1/1     Running       0          52s     10.254.17.160   worker1.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-default-backing-store-noobaa-pod-cf4b02ee   0/1     Terminating   0          1s      <none>          worker1.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-endpoint-b67f8c458-gw7qm                    1/1     Running       0          5m41s   10.254.21.164   worker2.osculate.cp.fyre.ibm.com   <none>           <none>
noobaa-operator-7bb746749d-2jj88                   1/1     Running       0          5m40s   10.254.21.150   worker2.osculate.cp.fyre.ibm.com   <none>           <none>
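
The taint, observe, and untaint cycle described in the steps above can be sketched as a small script. This is an illustrative sketch under stated assumptions, not the reporter's actual procedure: the `run` wrapper, the function names, and the `DRY_RUN` variable are hypothetical, and the untaint command is presumed from the `untainted` message in the output (a trailing `-` on the taint spec is how `kubectl taint` removes a taint). By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the taint / observe / untaint cycle from the steps above.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run them.
NODE="worker1.osculate.cp.fyre.ibm.com"   # node name taken from the report
TAINT="key1=value1:NoExecute"             # taint spec taken from the report

# Hypothetical wrapper: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

taint_node()   { run kubectl taint nodes "$NODE" "$TAINT"; }
list_pods()    { run oc get pod -o wide; }
# A trailing "-" on the taint spec removes the taint (assumed untaint step).
untaint_node() { run kubectl taint nodes "$NODE" "${TAINT}-"; }

taint_node
list_pods
untaint_node
```

Keeping the commands behind a dry-run wrapper makes it easy to review the exact sequence before applying a `NoExecute` taint, which evicts pods that do not tolerate it.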

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

ketankhurana64 commented, Aug 17, 2021

@nimrod-becker, can you please add the nsfs tag to this issue? I couldn’t add it when raising the defect.

ketankhurana64 commented, Aug 6, 2021

I have the corresponding operator code installed:

INFO[0001] noobaa-image: noobaa/noobaa-core:master-20210802
INFO[0001] operator-image: noobaa/noobaa-operator:master-20210802
INFO[0001] noobaa-db-image: centos/postgresql-12-centos7

