Mounting several volumes at once fails very frequently
We’re encountering problems when trying to mount multiple EFS volumes at once. The mount process gets stuck; when trying to debug the RPC there are occasional nfs: server 127.0.0.1 not responding, timed out
errors in the log (not sure if those are related – mount.efs should retry AFAIK). The stunnel processes serving the mount RPC connections seem to just be waiting for a connection, but nothing happens.
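For context, here is a minimal sketch of the kind of concurrent mount that triggers it for us. The file system IDs, mount points, and the plain mount invocation are illustrative only – in practice the mounts are issued by the EFS CSI driver, not a script like this:

```python
# Hypothetical reproduction sketch: mount several EFS file systems at once.
import subprocess
from concurrent.futures import ThreadPoolExecutor

FS_IDS = ["fs-11111111", "fs-22222222", "fs-33333333"]  # hypothetical IDs

def mount_one(fs_id: str) -> int:
    target = f"/mnt/{fs_id}"
    subprocess.run(["mkdir", "-p", target], check=True)
    try:
        # mount.efs with TLS spawns a per-mount stunnel; with several mounts
        # started at once, some of them hang here on CentOS 8
        proc = subprocess.run(
            ["mount", "-t", "efs", "-o", "tls", f"{fs_id}:/", target],
            timeout=120,
        )
        return proc.returncode
    except subprocess.TimeoutExpired:
        return -1  # the mount got stuck

with ThreadPoolExecutor(max_workers=len(FS_IDS)) as pool:
    print(list(pool.map(mount_one, FS_IDS)))
```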
This problem has been observed only on CentOS 8 (or CentOS Stream 8) running stunnel-5.56-5.el8_3 and openssl-libs-1.1.1k-5.el8_5. When trying Amazon Linux 2 with stunnel-4.56-6.amzn2.0.3 and openssl-libs-1.0.2k-19.amzn2.2.0.10, everything works OK. I suspected this was a race in stunnel, so I recompiled stunnel-5.56-5 and installed it on Amazon Linux 2, but the issue is again not reproducible there – so it’s not stunnel (or not stunnel by itself).
The issue also seems to be quite timing-sensitive. Increasing the log level or changing stunnel options seems to affect the probability of the problem showing up. I’ve tried removing the PID file creation (since issue #112 looked quite similar), but it doesn’t seem to help – I can still see the pending mounts. I also suspected issue #105, but even with that fixed (I hope – PR #119) the mounts still get stuck.
I wonder if the problem in issue #114 is related: we mostly encounter this through the efs-csi-driver on Kubernetes clusters when creating and removing multiple EFS volumes in the cluster in one shot.
I’m curious whether somebody has more insight or has encountered the problem: it looks like a combination of multiple factors causes this, and I have failed to find any interesting debugging clues.
Thanks. kvifern@ provided a workaround that launches a unit to monitor the file system in https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/616#issuecomment-1072965716.
Killing the stunnel process will make the watchdog relaunch a new stunnel, which will reconnect to the server. You can try that and verify whether it works. Meanwhile, we are actively looking into this kind of issue right now.
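For anyone wanting to try this without the full unit, here is a rough sketch of the idea, assuming a TLS mount whose stunnel was launched by efs-utils (so its command line references a config under /var/run/efs). The mount point, state directory, health check, and process matching are assumptions; the actual unit in the linked comment may differ:

```python
# Sketch: probe the mount point, and if it hangs, SIGTERM the efs-utils
# stunnel processes so the watchdog can respawn them.
import os
import signal
import subprocess

MOUNT_POINT = "/mnt/efs"     # hypothetical mount point
STATE_DIR = "/var/run/efs"   # assumed efs-utils state dir

def mount_is_responsive(path: str, timeout: int = 10) -> bool:
    # stat the mount point in a child process so a hung NFS mount
    # cannot block this script itself
    try:
        subprocess.run(["stat", "-t", path], timeout=timeout,
                       stdout=subprocess.DEVNULL, check=True)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False

def kill_efs_stunnels() -> None:
    # find stunnel processes whose command line points at an efs-utils
    # config; the watchdog is expected to relaunch them afterwards
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                cmdline = f.read().decode(errors="replace")
        except OSError:
            continue
        if "stunnel" in cmdline and STATE_DIR in cmdline:
            os.kill(int(pid), signal.SIGTERM)

if not mount_is_responsive(MOUNT_POINT):
    kill_efs_stunnels()
```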
Closing as we’ve resolved this issue with the v1.34.4 release.