Operator pod panics and restarts when shutting down the node on which the endpoint is scheduled
Environment info
```
[root@api.ns.cp.fyre.ibm.com ~]# oc version
Client Version: 4.7.13
Server Version: 4.7.13
Kubernetes Version: v1.20.0+df9c838
[root@api.ns.cp.fyre.ibm.com ~]# noobaa version
INFO[0000] CLI version: 5.9.0
INFO[0000] noobaa-image: noobaa/noobaa-core:master-20210719
INFO[0000] operator-image: noobaa/noobaa-operator:5.9.0
[root@api.ns.cp.fyre.ibm.com ~]#
```
Actual behavior
- Operator pod panics and restarts when the node on which the endpoint is scheduled is shut down
Expected behavior
- No panic should appear in the operator logs, and the operator pod should not restart
Steps to reproduce
- Install noobaa and start a copy-object operation into a bucket (see the sketch after these steps)
- While the copy operation is running, shut down the node on which noobaa is installed (only the endpoint pod was scheduled on that node, no other noobaa pods)
- Start the node back up
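A minimal sketch of the copy operation in the first step, assuming the AWS SDK for Go v1 pointed at the NooBaa S3 endpoint; the endpoint URL, credentials, and object names below are hypothetical placeholders (`first.bucket` is only NooBaa's conventional default bucket name):

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Endpoint and credentials are hypothetical; in practice they come
	// from the NooBaa S3 route/service and `noobaa status` output.
	sess := session.Must(session.NewSession(&aws.Config{
		Endpoint:         aws.String("https://s3.example.com"),
		Region:           aws.String("us-east-1"),
		Credentials:      credentials.NewStaticCredentials("ACCESS_KEY", "SECRET_KEY", ""),
		S3ForcePathStyle: aws.Bool(true),
	}))
	svc := s3.New(sess)

	// Server-side copy of an existing object into the target bucket.
	_, err := svc.CopyObject(&s3.CopyObjectInput{
		Bucket:     aws.String("first.bucket"),
		CopySource: aws.String("source-bucket/large-object"),
		Key:        aws.String("copied-object"),
	})
	if err != nil {
		log.Fatalf("copy failed: %v", err)
	}
}
```

Keeping a copy like this in flight (or looping it over many objects) while the node shuts down is what exercises the endpoint during the outage.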
Info:
```
[root@api.ns.cp.fyre.ibm.com ~]# oc get pod -o wide
NAME                                               READY   STATUS        RESTARTS   AGE   IP             NODE
noobaa-core-0                                      1/1     Running       0          73m   10.254.3.167   master1.ns.cp.fyre.ibm.com
noobaa-db-pg-0                                     1/1     Running       0          60m   10.254.4.17    master0.ns.cp.fyre.ibm.com
noobaa-default-backing-store-noobaa-pod-62daf8d7   0/1     Terminating   0          38m                  master2.ns.cp.fyre.ibm.com
noobaa-endpoint-565dbbd667-gfzt2                   1/1     Running       0          74m   10.254.4.14    master0.ns.cp.fyre.ibm.com
noobaa-operator-6d54447bc5-hr7sb                   1/1     Running       1          19h   10.254.3.136   master1.ns.cp.fyre.ibm.com
[root@api.ns.cp.fyre.ibm.com ~]#
```
More information - Screenshots / Logs / Other output

@Igor and I discussed a solution: instead of panicking immediately when encountering an unknown error, the operator would return a temporary error and the reconcile would requeue. If the error recurs several times, the operator would then panic. @nimrod-becker WDYT?
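A minimal sketch of that idea, assuming a controller-runtime style Reconcile loop; the `maxUnknownErrRetries` threshold, the counter, and the helper stubs are hypothetical names, not actual noobaa-operator code:

```go
package controllers

import (
	"context"
	"fmt"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// maxUnknownErrRetries is a hypothetical threshold: only after this many
// consecutive unknown errors does the operator fall back to panicking.
const maxUnknownErrRetries = 5

// Reconciler is a stand-in for the operator's real reconciler.
type Reconciler struct {
	unknownErrCount int // consecutive unknown-error counter (hypothetical)
}

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	err := r.runReconcilePhases(ctx, req) // placeholder for the real reconcile work
	if err == nil {
		r.unknownErrCount = 0 // success resets the counter
		return ctrl.Result{}, nil
	}
	if isKnownError(err) {
		// Expected/recoverable errors keep their current handling: requeue.
		return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
	}
	// Unknown error: instead of panicking immediately, treat it as temporary
	// and requeue; only panic once it has recurred enough times to rule out
	// a transient condition such as a node shutdown.
	r.unknownErrCount++
	if r.unknownErrCount >= maxUnknownErrRetries {
		panic(fmt.Sprintf("unknown error persisted across %d reconciles: %v", r.unknownErrCount, err))
	}
	return ctrl.Result{}, fmt.Errorf("temporary (attempt %d): %w", r.unknownErrCount, err)
}

// Stubs so the sketch compiles; the real operator has its own versions.
func (r *Reconciler) runReconcilePhases(ctx context.Context, req ctrl.Request) error { return nil }
func isKnownError(err error) bool                                                    { return false }
```

Returning a non-nil error makes controller-runtime requeue the request with exponential backoff, so the panic would only fire once the error has clearly stopped being transient.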
AFAIU it’s not a recurring panic, and after the operator restarted it did not happen again. @nehasharma5 am I right?
If so, I think we should keep the panic and not change it. The panic is there to avoid silent failures when encountering unknown errors. If we see that this specific error repeats in many cases, maybe we can ignore it specifically (roughly as sketched below), but I wouldn't remove the panic entirely. @igorpick @nimrod-becker WDYT?
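If it came to ignoring this one error specifically, it could look roughly like this; the `ErrEndpointNodeDown` sentinel and `handleUnknownError` helper are assumed names for illustration, not noobaa-operator identifiers:

```go
package controllers

import (
	"errors"
	"fmt"
	"log"
)

// ErrEndpointNodeDown is a hypothetical sentinel for the one error
// observed to be transient when an endpoint node shuts down.
var ErrEndpointNodeDown = errors.New("endpoint node is shutting down")

func handleUnknownError(err error) {
	// Whitelist only the known transient error; the next reconcile
	// retries once the node is back.
	if errors.Is(err, ErrEndpointNodeDown) {
		log.Printf("ignoring known transient error: %v", err)
		return
	}
	// Everything else stays a loud failure so genuinely unknown
	// errors are never silently swallowed.
	panic(fmt.Sprintf("unknown error during reconcile: %v", err))
}
```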