DB pod does not reschedule on non-tainted node
Environment info
NooBaa Version: master-20210802
Platform: OCP 4.7.4
Actual behavior
The DB pod does not get rescheduled on a non-tainted node; instead it stays in the Terminating state on the tainted node.
Expected behavior
The DB pod should get rescheduled on a non-tainted node.
Steps to reproduce
Created PVC gpfs-vol-pvc-31
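For reference, a minimal sketch of such a PVC is shown below; the storage class name and size are assumptions (not taken from this issue) and should be replaced with the Spectrum Scale (GPFS) storage class available in the cluster.

```
# Hypothetical PVC manifest for the GPFS-backed volume (a sketch).
# storageClassName and the requested size are assumptions, not from the issue.
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gpfs-vol-pvc-31
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: ibm-spectrum-scale-csi-fileset
EOF
```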
Created namespacestore using the command:

```
noobaa namespacestore create nsfs fs2 --pvc-name='gpfs-vol-pvc-31' --fs-backend='GPFS'
```
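To confirm the namespacestore was created, the NamespaceStore custom resource can be inspected directly; the openshift-storage namespace below is an assumption, so adjust -n to wherever NooBaa is installed.

```
# List the NamespaceStore custom resources and show details for fs2.
# The namespace is an assumption; use the namespace NooBaa runs in.
oc get namespacestores.noobaa.io -n openshift-storage
oc describe namespacestores.noobaa.io fs2 -n openshift-storage
```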
Currently, the pods are scheduled as below:
```
[root@api.osculate.cp.fyre.ibm.com ~]# oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
noobaa-core-0 1/1 Running 0 25m 10.254.17.153 worker1.osculate.cp.fyre.ibm.com <none> <none>
noobaa-db-pg-0 1/1 Running 0 45m 10.254.17.123 worker1.osculate.cp.fyre.ibm.com <none> <none>
noobaa-default-backing-store-noobaa-pod-cf4b02ee 0/1 Terminating 0 8s <none> worker2.osculate.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-b67f8c458-wdgbw 1/1 Running 0 25m 10.254.17.157 worker1.osculate.cp.fyre.ibm.com <none> <none>
noobaa-operator-7bb746749d-bd4sz 1/1 Running 1 25m 10.254.17.145 worker1.osculate.cp.fyre.ibm.com <none> <none>
```
Taint node 1 using:

```
kubectl taint nodes worker1.osculate.cp.fyre.ibm.com key1=value1:NoExecute
```
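To double-check that the taint is in effect before looking at the pods (a generic Kubernetes check, not a NooBaa-specific step):

```
# Show the taints currently set on worker1
oc describe node worker1.osculate.cp.fyre.ibm.com | grep -i taint
oc get node worker1.osculate.cp.fyre.ibm.com -o jsonpath='{.spec.taints}'
```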
Now the DB pod goes into the Terminating state on node 1 and is not rescheduled on the other node:
```
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
noobaa-core-0 1/1 Running 0 44s 10.254.21.162 worker2.osculate.cp.fyre.ibm.com <none> <none>
noobaa-db-pg-0 0/1 Terminating 0 48m 10.254.17.123 worker1.osculate.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-b67f8c458-gw7qm 0/1 ContainerCreating 0 83s <none> worker2.osculate.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-b67f8c458-wdgbw 0/1 Terminating 0 28m 10.254.17.157 worker1.osculate.cp.fyre.ibm.com <none> <none>
noobaa-operator-7bb746749d-2jj88 1/1 Running 0 82s 10.254.21.150 worker2.osculate.cp.fyre.ibm.com <none> <none>
```
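noobaa-db-pg-0 is managed by a StatefulSet, and the StatefulSet controller will not create a replacement pod with the same ordinal until the old pod object has been fully removed from the API server, so a pod stuck in Terminating blocks rescheduling. A generic way to see what is holding up the deletion (a sketch, not a NooBaa-specific procedure) is to check the pod's deletion metadata and recent events:

```
# Does the pod have a deletionTimestamp, and are any finalizers set on it?
oc get pod noobaa-db-pg-0 -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
# The Events section at the end of describe often shows why teardown is stuck.
oc describe pod noobaa-db-pg-0
```

A commonly used (if blunt) workaround is `oc delete pod noobaa-db-pg-0 --grace-period=0 --force`, which removes the pod object so the StatefulSet can recreate it elsewhere; it does not address the underlying cause.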
Note: as soon as node 1 is untainted, the DB pod goes back to the Running state on node 1 only:
```
node/worker1.osculate.cp.fyre.ibm.com untainted
[root@api.osculate.cp.fyre.ibm.com ~]# oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
noobaa-core-0 1/1 Running 0 5m2s 10.254.21.162 worker2.osculate.cp.fyre.ibm.com <none> <none>
noobaa-db-pg-0 1/1 Running 0 52s 10.254.17.160 worker1.osculate.cp.fyre.ibm.com <none> <none>
noobaa-default-backing-store-noobaa-pod-cf4b02ee 0/1 Terminating 0 1s <none> worker1.osculate.cp.fyre.ibm.com <none> <none>
noobaa-endpoint-b67f8c458-gw7qm 1/1 Running 0 5m41s 10.254.21.164 worker2.osculate.cp.fyre.ibm.com <none> <none>
noobaa-operator-7bb746749d-2jj88 1/1 Running 0 5m40s 10.254.21.150 worker2.osculate.cp.fyre.ibm.com <none> <none>
```
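Since the DB pod only ever lands back on node 1, it is also worth checking whether the PV backing the DB PVC pins the pod to that node via nodeAffinity. The PVC name db-noobaa-db-pg-0 below is an assumption based on the usual &lt;volumeClaimTemplate&gt;-&lt;pod&gt; naming; confirm the actual name first.

```
# Find the PV bound to the DB PVC and print its nodeAffinity (if any).
# The PVC name is an assumption; confirm it with "oc get pvc" first.
PV=$(oc get pvc db-noobaa-db-pg-0 -o jsonpath='{.spec.volumeName}')
oc get pv "$PV" -o jsonpath='{.spec.nodeAffinity}'
```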

@nimrod-becker can you please add the nsfs tag to this issue? I couldn't add it while raising the defect.
I have the corresponding operator code installed.