
Backend SC caused DB pod restarts; the pod never reached the Running state


Environment info

  • NooBaa Version: master-20210627
  • Platform: OCP 4.6.16

Actual behavior

  1. Upgrading to master-20210627 caused the DB pod to crash

Expected behavior

  1. The DB pod shouldn’t crash

Steps to reproduce

  1. Old code - master-20210622
  2. Upgraded to master-20210627 in order to retain accounts and buckets; the console output below shows the resulting DB pod failure
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0     |more
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2021-06-28 05:27:42.521 UTC [25] PANIC:  could not read file "global/pg_control": Input/output error
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0     |less -R
[root@ocp-akshat-1-inf akshat]# podn
NAME                                               READY   STATUS        RESTARTS   AGE
noobaa-core-0                                      1/1     Running       0          22m
noobaa-db-pg-0                                     0/1     Error         2          22m
noobaa-default-backing-store-noobaa-pod-b0a5d78b   0/1     Terminating   0          4d23h
noobaa-endpoint-6886745f66-rdd4m                   1/1     Running       0          22m
noobaa-operator-57d449689c-zb56f                   1/1     Running       0          22m
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0   -p  |less -R
[root@ocp-akshat-1-inf akshat]# podn
NAME                                               READY   STATUS             RESTARTS   AGE
noobaa-core-0                                      1/1     Running            0          22m
noobaa-db-pg-0                                     0/1     CrashLoopBackOff   2          22m
noobaa-default-backing-store-noobaa-pod-b0a5d78b   0/1     Terminating        0          4d23h
noobaa-endpoint-6886745f66-rdd4m                   1/1     Running            0          23m
noobaa-operator-57d449689c-zb56f                   1/1     Running            0          23m
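
The Events excerpt below is presumably taken from describing the crashing DB pod; assuming the noobaa namespace used in the pod listings above, it could be reproduced with something like:

oc describe pod noobaa-db-pg-0 -n noobaa
# or, to follow just the event stream for the namespace:
oc get events -n noobaa --sort-by='.lastTimestamp'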

QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       21m                    default-scheduler  Successfully assigned noobaa/noobaa-db-pg-0 to worker2.ocp-akshat-1.cp.fyre.ibm.com
  Normal   AddedInterface  21m                    multus             Add eth0 [10.254.17.98/22]
  Normal   Pulling         21m                    kubelet            Pulling image "noobaa/noobaa-core:master-20210627"
  Normal   Pulled          20m                    kubelet            Successfully pulled image "noobaa/noobaa-core:master-20210627" in 30.126640313s
  Normal   Created         20m                    kubelet            Created container init
  Normal   Started         20m                    kubelet            Started container init
  Warning  Failed          2m34s (x4 over 3m14s)  kubelet            Error: failed to resolve symlink "/var/lib/kubelet/pods/ce44b338-0155-430c-97d7-5408c230e0b4/volumes/kubernetes.io~csi/pvc-d1c22d45-5f3b-4684-8f4c-48880815f451/mount": lstat /var/mnt/fs1: stale NFS file handle
  Normal   Pulled          104s (x6 over 20m)     kubelet            Container image "centos/postgresql-12-centos7" already present on machine
  Normal   Created         103s (x2 over 20m)     kubelet            Created container db
  Normal   Started         103s (x2 over 20m)     kubelet            Started container db
  Warning  BackOff         11s (x11 over 2m21s)   kubelet            Back-off restarting failed container
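
The "stale NFS file handle" error above points at the Spectrum Scale mount backing the CSI volume on the worker node rather than at PostgreSQL itself (the pg_control I/O error is a downstream symptom). A hedged way to confirm this, taking the node name from the Scheduled event and the /var/mnt/fs1 path from the error message, is to open a debug shell on the node and inspect the mount:

oc debug node/worker2.ocp-akshat-1.cp.fyre.ibm.com
# inside the debug pod:
chroot /host
mount | grep fs1     # is the Spectrum Scale filesystem still mounted?
ls /var/mnt/fs1      # a stale handle typically surfaces as an I/O error here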


Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 26 (9 by maintainers)

Top GitHub Comments

1 reaction
nehasharma5 commented, Jul 5, 2021

@nimrod-becker Here is the list of PVCs:

NAME                                               STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS                     AGE
db-noobaa-db-pg-0                                  Bound    pvc-0f3ae11a-971c-4480-9398-d3f37fb145a8    50Gi       RWO            ibm-spectrum-scale-csi-fileset   6d21h
gpfs-vol-pvc-new1                                  Bound    gpfs-pv-3                                   250Gi      RWX                                             4d20h
gpfs-vol-pvc-new11                                 Bound    gpfs-pv-31                                  250Gi      RWX                                             3d16h
noobaa-default-backing-store-noobaa-pvc-1ff51808   Bound    pvc-c164a9ac-1855-4788-8219-a2f2ab8ce831    50Gi       RWO            ibm-spectrum-scale-csi-fileset   6d21h
[root@api.ns.cp.fyre.ibm.com ~]#

gpfs-vol-pvc-new1 is used for the endpoint pod.
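
For reference, a listing like the one above is what oc get pvc produces; the PV bound to the DB pod's claim can then be inspected to confirm it is backed by the ibm-spectrum-scale-csi-fileset storage class (the noobaa namespace is assumed, as in the pod listings above):

oc get pvc -n noobaa
oc describe pv pvc-0f3ae11a-971c-4480-9398-d3f37fb145a8   # volume bound to db-noobaa-db-pg-0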

0 reactions
akmithal commented, Aug 16, 2021

We can close this bug based on the above finding.

Summary: The NooBaa DB pod requires the backend NSDs to be up. In this Fyre environment not all NSDs were coming up, so the NooBaa DB pod stayed in a crash loop. Once the NSDs were brought up manually, the NooBaa DB pod reached the Running state and I/O could continue.
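
Since the root cause was NSDs that had not come up, a hedged checklist for verifying the storage side uses the standard Spectrum Scale (GPFS) CLI; the filesystem name fs1 is inferred from the /var/mnt/fs1 mount path in the events above and may differ in other environments:

mmgetstate -a            # GPFS daemon state on all nodes
mmlsnsd -X               # NSDs and the devices/servers backing them
mmlsdisk fs1 -e          # list disks of fs1 that are not up/ready
mmchdisk fs1 start -a    # once the NSD servers are back, start all stopped disks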
