Operator pod panics and restarts when the node on which the endpoint is scheduled is shut down


Environment info

[root@api.ns.cp.fyre.ibm.com ~]# oc version
Client Version: 4.7.13
Server Version: 4.7.13
Kubernetes Version: v1.20.0+df9c838
[root@api.ns.cp.fyre.ibm.com ~]# noobaa version
INFO[0000] CLI version: 5.9.0
INFO[0000] noobaa-image: noobaa/noobaa-core:master-20210719
INFO[0000] operator-image: noobaa/noobaa-operator:5.9.0
[root@api.ns.cp.fyre.ibm.com ~]#

Actual behavior

  1. The operator pod logs a panic and restarts when the node on which the endpoint is scheduled is shut down.

Expected behavior

  1. The operator logs should show no panic, and the operator pod should not restart.

Steps to reproduce

  1. Install noobaa and start a copy object operation into a bucket.
  2. While the copy operation is running, shut down the node on which noobaa is installed (only the endpoint pod is scheduled on that node, no other noobaa pods).
  3. Start the node.

Inf node:

[root@api.ns.cp.fyre.ibm.com ~]# oc get pod -o wide
NAME                                               READY   STATUS        RESTARTS   AGE   IP             NODE                         NOMINATED NODE   READINESS GATES
noobaa-core-0                                      1/1     Running       0          73m   10.254.3.167   master1.ns.cp.fyre.ibm.com
noobaa-db-pg-0                                     1/1     Running       0          60m   10.254.4.17    master0.ns.cp.fyre.ibm.com
noobaa-default-backing-store-noobaa-pod-62daf8d7   0/1     Terminating   0          38m                  master2.ns.cp.fyre.ibm.com
noobaa-endpoint-565dbbd667-gfzt2                   1/1     Running       0          74m   10.254.4.14    master0.ns.cp.fyre.ibm.com
noobaa-operator-6d54447bc5-hr7sb                   1/1     Running       1          19h   10.254.3.136   master1.ns.cp.fyre.ibm.com

[root@api.ns.cp.fyre.ibm.com ~]#

More information - Screenshots / Logs / Other output

operator.log must-gather.local.2716179569581607829.tar.gz

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 20 (12 by maintainers)

Top GitHub Comments

dannyzaken commented, Aug 19, 2021 (1 reaction)

@Igor and I discussed a solution in which, instead of panicking immediately when encountering an unknown error, the operator returns a temporary error and the reconcile is requeued. If the error recurs several times, the operator panics. @nimrod-becker WDYT?
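
For illustration only, the idea in this comment could be sketched in Go roughly as follows. This is not the noobaa-operator code: `unknownErrorGuard`, `ErrTemporary`, `isTemporary`, and the threshold of 3 are hypothetical names and values chosen for the sketch. It assumes the caller's reconcile loop requeues whenever a non-nil error is returned, which is how controller-runtime reconcilers generally behave.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrTemporary is a hypothetical marker error meaning "requeue and retry".
var ErrTemporary = errors.New("temporary error, requeue")

// unknownErrorGuard (hypothetical) counts consecutive unknown errors per
// reconciled object and panics only once a threshold is reached.
type unknownErrorGuard struct {
	mu             sync.Mutex
	maxConsecutive int
	counts         map[string]int
}

func newUnknownErrorGuard(maxConsecutive int) *unknownErrorGuard {
	return &unknownErrorGuard{
		maxConsecutive: maxConsecutive,
		counts:         make(map[string]int),
	}
}

// Handle takes the result of one reconcile attempt for the object named key.
// A nil error resets the counter. An unknown error is wrapped as temporary so
// the caller can requeue, unless it is the maxConsecutive-th failure in a
// row, in which case Handle panics to avoid a silent failure.
func (g *unknownErrorGuard) Handle(key string, err error) error {
	g.mu.Lock()
	defer g.mu.Unlock()

	if err == nil {
		delete(g.counts, key)
		return nil
	}
	g.counts[key]++
	if g.counts[key] >= g.maxConsecutive {
		panic(fmt.Sprintf("unknown error for %q repeated %d times: %v",
			key, g.counts[key], err))
	}
	return fmt.Errorf("%w: %v", ErrTemporary, err)
}

// isTemporary reports whether err was produced by Handle as a requeue signal.
func isTemporary(err error) bool {
	return errors.Is(err, ErrTemporary)
}

func main() {
	guard := newUnknownErrorGuard(3) // threshold is an arbitrary assumption
	unknown := errors.New("unexpected error while a node is down")

	// The first failure is reported as temporary, so the reconcile is requeued.
	fmt.Println(isTemporary(guard.Handle("noobaa", unknown)))
	// A later successful reconcile clears the counter.
	fmt.Println(guard.Handle("noobaa", nil) == nil)
}
```

Counting per object and resetting on success means a one-off failure, such as the single restart reported here, would only cause a retry, while an error that keeps repeating would still surface as a panic.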

dannyzaken commented, Aug 4, 2021 (1 reaction)

AFAIU it’s not a recurring panic, and after the operator restarted it did not happen again. @nehasharma5 am I right?

If so, I think we should keep the panic and not change it. The panic is there to avoid silent failures when encountering unknown errors. If we see this specific error repeating in many cases, maybe we can ignore it specifically, but I wouldn’t remove the panic entirely. @igorpick @nimrod-becker WDYT?

