Backend SC caused DB pod restarts; the pod never came to the Running state
Environment info
- NooBaa Version: master-20210627
- Platform: OCP 4.6.16
Actual behavior
- Upgrading to master-20210627 caused the DB pod to crash
Expected behavior
1. DB pod shouldn't crash
Steps to reproduce
- Old code - master-20210622
- Upgraded to master-20210627 in order to retain accounts and buckets
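For reference, an upgrade like this is typically applied by pointing the NooBaa custom resource at the new core image. The commands below are only a sketch, not the exact procedure used here; they assume the default CR name "noobaa" in the "noobaa" namespace and that the operator rolls the core, DB and endpoint pods after the patch:

# Sketch: point the NooBaa CR at the new core image and watch the rollout
oc patch noobaa noobaa -n noobaa --type merge -p '{"spec":{"image":"noobaa/noobaa-core:master-20210627"}}'
oc get pods -n noobaa -w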
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0 |more
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start....2021-06-28 05:27:42.521 UTC [25] PANIC: could not read file "global/pg_control": Input/output error
stopped waiting
pg_ctl: could not start server
Examine the log output.
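The PANIC on global/pg_control points at the storage layer rather than at PostgreSQL itself: pg_control is a small file the server must read at startup, so an Input/output error there almost always means the volume backing the data directory is unhealthy. A minimal way to check this from a debug copy of the pod is sketched below; it assumes the sclorg image's default data directory /var/lib/pgsql/data/userdata, so adjust the path if the PVC is mounted elsewhere:

# Sketch: try to stat and read pg_control directly; an I/O error here confirms a bad mount, not a DB bug
oc debug pod/noobaa-db-pg-0 -n noobaa -- ls -l /var/lib/pgsql/data/userdata/global/pg_control
oc debug pod/noobaa-db-pg-0 -n noobaa -- dd if=/var/lib/pgsql/data/userdata/global/pg_control of=/dev/null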
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0 |less -R
[root@ocp-akshat-1-inf akshat]# podn
NAME                                               READY   STATUS        RESTARTS   AGE
noobaa-core-0                                      1/1     Running       0          22m
noobaa-db-pg-0                                     0/1     Error         2          22m
noobaa-default-backing-store-noobaa-pod-b0a5d78b   0/1     Terminating   0          4d23h
noobaa-endpoint-6886745f66-rdd4m                   1/1     Running       0          22m
noobaa-operator-57d449689c-zb56f                   1/1     Running       0          22m
[root@ocp-akshat-1-inf akshat]# oc logs noobaa-db-pg-0 -p |less -R
[root@ocp-akshat-1-inf akshat]# podn
NAME                                               READY   STATUS             RESTARTS   AGE
noobaa-core-0                                      1/1     Running            0          22m
noobaa-db-pg-0                                     0/1     CrashLoopBackOff   2          22m
noobaa-default-backing-store-noobaa-pod-b0a5d78b   0/1     Terminating        0          4d23h
noobaa-endpoint-6886745f66-rdd4m                   1/1     Running            0          23m
noobaa-operator-57d449689c-zb56f                   1/1     Running            0          23m
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 21m default-scheduler Successfully assigned noobaa/noobaa-db-pg-0 to worker2.ocp-akshat-1.cp.fyre.ibm.com
Normal AddedInterface 21m multus Add eth0 [10.254.17.98/22]
Normal Pulling 21m kubelet Pulling image "noobaa/noobaa-core:master-20210627"
Normal Pulled 20m kubelet Successfully pulled image "noobaa/noobaa-core:master-20210627" in 30.126640313s
Normal Created 20m kubelet Created container init
Normal Started 20m kubelet Started container init
Warning Failed 2m34s (x4 over 3m14s) kubelet Error: failed to resolve symlink "/var/lib/kubelet/pods/ce44b338-0155-430c-97d7-5408c230e0b4/volumes/kubernetes.io~csi/pvc-d1c22d45-5f3b-4684-8f4c-48880815f451/mount": lstat /var/mnt/fs1: stale NFS file handle
Normal Pulled 104s (x6 over 20m) kubelet Container image "centos/postgresql-12-centos7" already present on machine
Normal Created 103s (x2 over 20m) kubelet Created container db
Normal Started 103s (x2 over 20m) kubelet Started container db
Warning BackOff 11s (x11 over 2m21s) kubelet Back-off restarting failed container
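The "stale NFS file handle" on /var/mnt/fs1 in the events above is the real failure: the kubelet cannot resolve the CSI mount for the DB PVC because the Spectrum Scale filesystem on that node has gone stale. This can be confirmed directly on the node; the commands below are a sketch, with the node name and volume ID taken from the events:

# Sketch: inspect the stale mount from a node debug shell
oc debug node/worker2.ocp-akshat-1.cp.fyre.ibm.com
chroot /host
mount | grep pvc-d1c22d45-5f3b-4684-8f4c-48880815f451
ls /var/mnt/fs1    # expected to fail with "Stale file handle" while the backend filesystem is down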
More information - Screenshots / Logs / Other output

@nimrod-becker Here is the list of PVCs:
NAME                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     AGE
db-noobaa-db-pg-0                                  Bound    pvc-0f3ae11a-971c-4480-9398-d3f37fb145a8   50Gi       RWO            ibm-spectrum-scale-csi-fileset   6d21h
gpfs-vol-pvc-new1                                  Bound    gpfs-pv-3                                  250Gi      RWX                                             4d20h
gpfs-vol-pvc-new11                                 Bound    gpfs-pv-31                                 250Gi      RWX                                             3d16h
noobaa-default-backing-store-noobaa-pvc-1ff51808   Bound    pvc-c164a9ac-1855-4788-8219-a2f2ab8ce831   50Gi       RWO            ibm-spectrum-scale-csi-fileset   6d21h
[root@api.ns.cp.fyre.ibm.com ~]#
gpfs-vol-pvc-new1 is used for the endpoint pod.
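The PVC-to-pod mapping can be double-checked with a jsonpath query rather than reading each pod spec by hand (a sketch; it prints one line per pod followed by the claims that pod mounts):

# Sketch: list which PVCs each pod in the namespace mounts
oc get pods -n noobaa -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}'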
We can close this bug based on the above finding.
Summary: The NooBaa DB pod requires the backend NSDs to be up. In this Fyre environment not all NSDs were coming up, so the NooBaa DB pod stayed in a crash loop. Once the NSDs were brought up manually, the NooBaa DB pod reached the Running state and I/O could continue.
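In practical terms, recovery on the Spectrum Scale side looks roughly like the sketch below; it assumes the filesystem is named fs1 (matching the /var/mnt/fs1 mount seen in the events) and that the mm* commands are run on a Scale node with admin privileges:

# Sketch: check cluster, NSD and disk state
mmgetstate -a
mmlsnsd
mmlsdisk fs1
# Once the NSD servers are back, start any disks marked down and remount the filesystem everywhere
mmchdisk fs1 start -a
mmmount fs1 -a
# Finally, let Kubernetes recreate the DB pod so it gets a fresh, healthy mount
oc delete pod noobaa-db-pg-0 -n noobaa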