All Persistent Volumes fail permanently after NAS reboot
Whenever I reboot the OS on the NAS that hosts my iSCSI democratic-csi volumes, all containers that rely on those volumes fail consistently, even after the NAS comes back online, with the following errors:
Warning FailedMount 37s kubelet MountVolume.MountDevice failed for volume "pvc-da280e70-9bcb-41ba-bbbd-cbf973580c6e" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedMount 34s kubelet Unable to attach or mount volumes: unmounted volumes=[config], unattached volumes=[config media transcode kube-api-access-2c2w7 backup]: timed out waiting for the condition
Warning FailedMount 5s (x6 over 37s) kubelet MountVolume.MountDevice failed for volume "pvc-da280e70-9bcb-41ba-bbbd-cbf973580c6e" : rpc error: code = Aborted desc = operation locked due to in progress operation(s): ["volume_id_pvc-da280e70-9bcb-41ba-bbbd-cbf973580c6e"]
I have tried suspending all pods with kubectl scale -n media deploy/plex --replicas 0 to ensure that nothing is using the volume during the reboot.
Unfortunately I know almost nothing about iSCSI, so it's entirely possible this is 100% my fault. What is the proper process with iSCSI for rebooting either the NAS or the nodes using PVs on the NAS to prevent this lockup? Is there an iscsiadm command I can use to remove this deadlock and let the new container access the PV?
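For reference, the open-iscsi tooling on each node can at least show whether the sessions survived the NAS reboot; a minimal sketch, where the target IQN and portal are placeholders to be read off the session list rather than values confirmed in this issue:

# list logged-in sessions, their targets/portals and device state
iscsiadm -m session -P 3

# ask every logged-in session to rescan its LUNs once the target is back
iscsiadm -m session --rescan

# or log one target out and back in (substitute the IQN/portal from the session list)
iscsiadm -m node -T <target-iqn> -p <portal-ip>:3260 --logout
iscsiadm -m node -T <target-iqn> -p <portal-ip>:3260 --login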
My democratic-csi config is:
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: csi-iscsi
  namespace: storage
spec:
  interval: 5m
  chart:
    spec:
      chart: democratic-csi
      version: 0.13.4
      sourceRef:
        kind: HelmRepository
        name: democratic-csi-charts
        namespace: flux-system
      interval: 5m
  values:
    csiDriver:
      name: "org.democratic-csi.iscsi"
    storageClasses:
      - name: tank-iscsi-csi
        defaultClass: true
        reclaimPolicy: Delete
        ## For testing
        # reclaimPolicy: Retain
        volumeBindingMode: Immediate
        allowVolumeExpansion: true
        parameters:
          fsType: ext4
    driver:
      image: docker.io/democraticcsi/democratic-csi:v1.7.6
      imagePullPolicy: IfNotPresent
      config:
        driver: zfs-generic-iscsi
      existingConfigSecret: zfs-generic-iscsi-config
and the driver config is:
apiVersion: v1
kind: Secret
metadata:
  name: zfs-generic-iscsi-config
  namespace: storage
stringData:
  driver-config-file.yaml: |
    driver: zfs-generic-iscsi
    sshConnection:
      host: ${UIHARU_IP}
      port: 22
      username: root
      privateKey: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        ...
        -----END OPENSSH PRIVATE KEY-----
    zfs:
      datasetParentName: sltank/k8s/iscsiv
      detachedSnapshotsDatasetParentName: sltank/k8s/iscsis
    iscsi:
      shareStrategy: "targetCli"
      shareStrategyTargetCli:
        basename: "iqn.2016-04.com.open-iscsi:a6b73d4196"
        tpg:
          attributes:
            authentication: 0
            generate_node_acls: 1
            cache_dynamic_acls: 1
            demo_mode_write_protect: 0
      targetPortal: "${UIHARU_IP}"
Not sure what other info is important, but I’d be happy to provide anything else that might help troubleshoot the issue.
Yeah, that's a dangerous situation (which is why when iSCSI goes down the volumes go into ro mode). 2 nodes using the same block device simultaneously is not something you want happening. I would use something like kured (https://github.com/weaveworks/kured) or similar to simply trigger all your nodes to cycle so the workloads shift around and everything comes up clean.
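For context, what kured automates is roughly a drain/reboot/uncordon cycle per node; a minimal sketch, assuming a node named k8s-node-1:

# move workloads off the node
kubectl drain k8s-node-1 --ignore-daemonsets --delete-emptydir-data

# reboot it (however node access is normally done)
ssh k8s-node-1 sudo reboot

# let workloads schedule back once the node is Ready again
kubectl uncordon k8s-node-1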
Ah this is a tricky one and I'm glad you opened this. So there are a couple issues at play here:
- democratic-csi ensures no 2 (possibly conflicting) operations happen at the same time and thus creates an in-memory lock
- the iSCSI sessions on the nodes go read-only when the NAS drops out

The first can be remedied by deleting all the democratic-csi pods and just letting them restart. The latter requires you to handle each workload on a case-by-case basis.

Essentially, if the NAS goes down and comes back up, the iSCSI sessions on the node (assuming they recover) go to read-only. The only way to remedy that (via k8s) is to just restart the pods as appropriate… and even then in some cases that may not be enough and would require forcing the workload to a new node. I'll do some research on possible ways to just go to the CLI of the nodes directly and get them back into a rw state manually without any other intervention at the k8s layer.
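A sketch of those Kubernetes-side steps; the label selector and the plex names are assumptions based on the chart defaults and the deployment mentioned earlier, not values confirmed in this issue:

# recreate the democratic-csi pods to clear the in-memory operation locks
kubectl -n storage delete pod -l app.kubernetes.io/name=democratic-csi

# restart the affected workloads so they remount their volumes
kubectl -n media rollout restart deployment/plex

# if a pod stays stuck on a read-only mount, push it to another node
kubectl cordon <node-with-stale-session>
kubectl -n media delete pod <stuck-plex-pod>
kubectl uncordon <node-with-stale-session>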