Wrong FS mounted on the noobaa endpoint after a fail back test
Steps: run the fail back test as follows:
- Create 3 accounts with respective buckets.
- Start IO on the buckets with the respective users using the warp tool (see the sketch after this list).
- Update the service with EnableAutoHA set to True.
- Shut down the node that does not have the csi-attacher and noobaa-db pods.
- Wait for the noobaa endpoint pod running on that node to reach the "Pending" state.
- Perform IO: IO should fail on the noobaa endpoint whose node is shut down.
- Edit the svc of that node and move its IP to a running node.
- Perform IO: IO works fine on the noobaa endpoint whose node is shut down, as its IP has been transferred to the node that is still running (step 5).
- Bring up the node that does not have the csi-attacher and noobaa-db pods.
- Wait for the noobaa endpoint pod to go from "Pending" through "ContainerCreating" to the "Running" state.
- Perform IO: IO works fine here.
- Edit the svc of the node that was brought back up to transfer the IP back to it.
- Perform IO: IO fails here (this is the bug).
Expected: IO should reach the node that was previously down.
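For reference, a minimal sketch of how the IO and svc steps above can be driven. The endpoint address, bucket name, credentials, and the per-node service name are placeholders, not values taken from this run:

# run a mixed GET/PUT workload against one noobaa endpoint with the warp tool (placeholder values)
warp mixed --host <endpoint-ip>:443 --tls --insecure \
  --access-key <ACCESS_KEY> --secret-key <SECRET_KEY> \
  --bucket <bucket-name> --duration 5m
# move the endpoint IP by editing the per-node service (name is deployment specific)
oc -n openshift-storage edit svc <per-node-endpoint-svc>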
Result: IO fails because the wrong FS is mounted on the noobaa endpoint after editing the svc to transfer the IP back to it.
Refer to the execution steps with output in the attachment named BUG 714.txt.
Check noobaa endpoint running on Node 7:
[root@hpo-app1 ~]# oc get pod -n openshift-storage -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
noobaa-core-0 1/1 Running 0 2d19h 10.128.4.37 hpo-app6.hpofvt1.tuc.stglabs.ibm.com <none> <none>
noobaa-db-pg-0 1/1 Running 1 (42h ago) 3d 10.128.2.239 hpo-app5.hpofvt1.tuc.stglabs.ibm.com <none> <none>
noobaa-default-backing-store-noobaa-pod-9a260251 1/1 Running 0 8d 10.128.3.26 hpo-app5.hpofvt1.tuc.stglabs.ibm.com <none> <none>
noobaa-endpoint-f4d6cb6f-2q2lp 1/1 Running 0 23h 10.128.0.16 hpo-app7.hpofvt1.tuc.stglabs.ibm.com <none> <none>
noobaa-endpoint-f4d6cb6f-4qclw 1/1 Running 0 3d 10.128.4.13 hpo-app6.hpofvt1.tuc.stglabs.ibm.com <none> <none>
noobaa-endpoint-f4d6cb6f-vtnf4 1/1 Running 0 8d 10.128.3.27 hpo-app5.hpofvt1.tuc.stglabs.ibm.com <none> <none>
noobaa-operator-6d9b94dbb7-75qcl 1/1 Running 1 (3d ago) 8d 10.128.3.20 hpo-app5.hpofvt1.tuc.stglabs.ibm.com <none> <none>
ocs-metrics-exporter-6c7b6c7f85-9bvdt 1/1 Running 0 2d18h 10.128.4.50 hpo-app6.hpofvt1.tuc.stglabs.ibm.com <none> <none>
ocs-operator-5dc4cdbdf7-585xr 1/1 Running 3 (23h ago) 3d 10.128.2.238 hpo-app5.hpofvt1.tuc.stglabs.ibm.com <none> <none>
odf-console-79fbf9b9f4-rpszn 1/1 Running 0 3d 10.128.2.235 hpo-app5.hpofvt1.tuc.stglabs.ibm.com <none> <none>
odf-operator-controller-manager-7bdfd6b6c-ftsb7 2/2 Running 5 (23h ago) 8d 10.128.3.21 hpo-app5.hpofvt1.tuc.stglabs.ibm.com <none> <none>
rook-ceph-operator-5468669db9-fndzf 1/1 Running 0 2d18h 10.128.4.52 hpo-app6.hpofvt1.tuc.stglabs.ibm.com <none> <none>
Check the file system on the noobaa endpoint running on node 7: the mount point /nsfs/noobaa-s3res-4080029599 should be backed by the "remote-sample" file system, but it is backed by "/dev/sda4", as shown below:
[root@hpo-app1 ~]# oc rsh -n openshift-storage noobaa-endpoint-f4d6cb6f-2q2lp
sh-4.4# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 744G 25G 720G 4% /
tmpfs 64M 0 64M 0% /dev
tmpfs 252G 0 252G 0% /sys/fs/cgroup
shm 64M 0 64M 0% /dev/shm
tmpfs 252G 58M 252G 1% /etc/hostname
tmpfs 3.0G 8.0K 3.0G 1% /etc/mgmt-secret
tmpfs 3.0G 8.0K 3.0G 1% /etc/s3-secret
/dev/sda4 744G 25G 720G 4% /nsfs/noobaa-s3res-4080029599
tmpfs 3.0G 20K 3.0G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 252G 0 252G 0% /proc/acpi
tmpfs 252G 0 252G 0% /proc/scsi
tmpfs 252G 0 252G 0% /sys/firmware
sh-4.4#
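To see unambiguously which source backs the namespace-store path inside the endpoint pod, findmnt can be used instead of df (a minimal sketch, assuming findmnt is available in the endpoint image):

# show the source filesystem for the nsfs mount point inside the endpoint pod
oc -n openshift-storage exec noobaa-endpoint-f4d6cb6f-2q2lp -- findmnt -T /nsfs/noobaa-s3res-4080029599
# expected SOURCE: the remote-sample Spectrum Scale filesystem, not the node root device /dev/sda4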
Required additional info:
[root@hpo-app1 ip-config]# oc get namespacestore noobaa-s3res-4080029599 -o yaml -n openshift-storage
apiVersion: noobaa.io/v1alpha1
kind: NamespaceStore
metadata:
  creationTimestamp: "2022-05-27T11:30:10Z"
  finalizers:
  - noobaa.io/finalizer
  generation: 2
  labels:
    app: noobaa
  name: noobaa-s3res-4080029599
  namespace: openshift-storage
  resourceVersion: "91072159"
  uid: 9a930f5f-7236-4700-b5f2-298c7b44806e
spec:
  nsfs:
    fsBackend: GPFS
    pvcName: noobaa-s3resvol-pvc-4080029599
    subPath: ""
  type: nsfs
status:
  conditions:
  - lastHeartbeatTime: "2022-05-27T11:30:10Z"
    lastTransitionTime: "2022-05-27T11:30:50Z"
    message: NamespaceStorePhaseReady
    reason: 'Namespace store mode: OPTIMAL'
    status: "True"
    type: Available
  - lastHeartbeatTime: "2022-05-27T11:30:10Z"
    lastTransitionTime: "2022-05-27T11:30:50Z"
    message: NamespaceStorePhaseReady
    reason: 'Namespace store mode: OPTIMAL'
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2022-05-27T11:30:10Z"
    lastTransitionTime: "2022-05-27T11:30:50Z"
    message: NamespaceStorePhaseReady
    reason: 'Namespace store mode: OPTIMAL'
    status: "False"
    type: Degraded
  - lastHeartbeatTime: "2022-05-27T11:30:10Z"
    lastTransitionTime: "2022-05-27T11:30:50Z"
    message: NamespaceStorePhaseReady
    reason: 'Namespace store mode: OPTIMAL'
    status: "True"
    type: Upgradeable
  mode:
    modeCode: OPTIMAL
    timeStamp: 2022-05-27 11:30:50.371576833 +0000 UTC m=+6215.334443229
  phase: Ready
[root@hpo-app1 ip-config]#
[root@hpo-app1 ip-config]# oc get pvc noobaa-s3resvol-pvc-4080029599 -o yaml -n openshift-storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
  creationTimestamp: "2022-05-27T11:30:10Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: noobaa-s3resvol-pvc-4080029599
  namespace: openshift-storage
  resourceVersion: "91071836"
  uid: 1ceebadd-6701-4339-aa3a-cee6878a86b9
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  volumeMode: Filesystem
  volumeName: noobaa-s3respv-4080029599
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 50Gi
  phase: Bound
[root@hpo-app1 ip-config]#
[root@hpo-app1 ip-config]# oc get pv noobaa-s3respv-4080029599 -o yaml -n openshift-storage
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2022-05-27T11:30:10Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/spectrumscale-csi-ibm-com
  name: noobaa-s3respv-4080029599
  resourceVersion: "91071841"
  uid: 5df5858f-a8f0-4f21-aa9c-31f60c12d102
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 50Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: noobaa-s3resvol-pvc-4080029599
    namespace: openshift-storage
    resourceVersion: "91071594"
    uid: 1ceebadd-6701-4339-aa3a-cee6878a86b9
  csi:
    driver: spectrumscale.csi.ibm.com
    volumeHandle: 2495414681104167923;8F003114:620E2866;path=/mnt/remote-sample
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
status:
  phase: Bound
[root@hpo-app1 ip-config]#
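The PV's volumeHandle shows the expected backing path (/mnt/remote-sample). A quick cross-check from the node that was brought back can be done with mount and grep (a sketch; the node name below is node 7 from this cluster, and the grep pattern assumes the kubelet mount for the CSI volume contains the PV name):

# from a debug shell on the recovered node, list the mount created for this PV
oc debug node/hpo-app7.hpofvt1.tuc.stglabs.ibm.com -- chroot /host \
  sh -c 'mount | grep noobaa-s3respv-4080029599'
# the mount source should be the remote-sample filesystem path from the volumeHandle,
# not the node's root device /dev/sda4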
Closing this issue as it is a CSI issue, as mentioned above, and has been logged in ibm-spectrum-scale-csi.
@romayalon @nimrod-becker, could you prioritize and do an RCA on this defect, since it has hit a few times (3 to 4) when performing the FOFB testing?