High concurrency Fuse read stuck
Alluxio Version: 2.9.0-SNAPSHOT
Describe the bug The read application spawns 64 threads. Each thread creates a file in Alluxio and then keeps reading that file sequentially, with different buffer sizes, through the Alluxio POSIX API.
After the test has run for more than 20 minutes, the read application gets stuck. 55 of the 64 threads are stuck reading from AlluxioFuse:
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:233)
at alluxio.SequentialReadTest.sequentialReadSingleFile(SequentialReadTest.java:49)
The remaining 9 threads are stuck writing to AlluxioFuse:
at sun.nio.fs.UnixCopyFile.transfer(Native Method)
at java.nio.file.Files.copy(Files.java:1274)
at alluxio.ReadMain.prepareDataset(ReadMain.java:289)
From the Fuse jstack, all threads are blocked waiting in acquireBlockWorkerClientInternal. Note that the maximum number of block worker clients is 1024, which should be sufficient.
- parking to wait for <0x00007f198e4fb0b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at alluxio.client.file.FileSystemContext.acquireBlockWorkerClientInternal(FileSystemContext.java:549)
at alluxio.client.file.AlluxioFileInStream.read(AlluxioFileInStream.java:183)
- locked <0x00007f1f8542b4b0> (a alluxio.client.file.AlluxioFileInStream)
at alluxio.client.file.FileSystemContext.acquireBlockWorkerClientInternal(FileSystemContext.java:549)
at alluxio.client.block.stream.LocalFileDataWriter.create(LocalFileDataWriter.java:78)
at alluxio.fuse.AlluxioJniFuseFileSystem.writeInternal(AlluxioJniFuseFileSystem.java:576)
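To check how many Fuse threads are parked in the same frame, a saved jstack dump can be scanned per thread. The following is a minimal sketch, not part of the original report: the class name StackFrameCounter and the dump file argument are hypothetical, and it assumes the usual jstack layout in which each thread section starts with a line beginning with a double quote.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: counts threads in a jstack-style dump whose stack
// contains a given frame substring.
class StackFrameCounter {

    // Assumes every thread section begins with a line starting with '"'
    // (the quoted thread name), as in standard jstack output.
    static int countThreadsWithFrame(List<String> dumpLines, String frame) {
        int count = 0;
        boolean inThread = false;
        boolean matched = false;
        for (String line : dumpLines) {
            if (line.startsWith("\"")) {
                // A new thread section starts; close out the previous one.
                if (inThread && matched) {
                    count++;
                }
                inThread = true;
                matched = false;
            } else if (inThread && line.contains(frame)) {
                // Count each thread once, even if the frame appears repeatedly.
                matched = true;
            }
        }
        if (inThread && matched) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a file holding the jstack output (an assumption).
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        int blocked = countThreadsWithFrame(lines, "acquireBlockWorkerClientInternal");
        System.out.println("threads in acquireBlockWorkerClientInternal: " + blocked);
    }
}
```

Comparing this count against the configured pool maximum may help tell whether the pool is truly exhausted or whether clients are not being returned.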
To Reproduce Code: https://github.com/LuQQiu/LuTemp/tree/master/src/main/java/alluxio
Launch the Alluxio cluster and Fuse.
Run the test program with
java -Xmx16G -Xms16G -XX:MaxDirectMemorySize=16g -cp target/LuTemp-1.0-SNAPSHOT-jar-with-dependencies.jar alluxio.ReadMain -l /local2 -f /mnt/alluxio-fuse -t 64 -i 1 -d 60 -s > testDiff.log
The application gets stuck after 20 minutes.
Expected behavior The application should not get stuck.
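For readers without access to the linked repository, the workload described in this report can be approximated as follows. This is a rough sketch under stated assumptions, not the actual ReadMain or SequentialReadTest code: the class name, file names, file size, buffer sizes, and pass count are all hypothetical, and each file is filled with zeros rather than copied from a local dataset. Pointing the directory argument at the Fuse mount (for example /mnt/alluxio-fuse) would exercise the POSIX read path.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical approximation of the reported workload.
class ConcurrentSequentialRead {

    // Reads the file from start to end with the given buffer size
    // and returns the number of bytes read.
    static long sequentialRead(Path file, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Each thread creates its own file, then reads it `passes` times,
    // cycling through the buffer sizes. Returns total bytes read.
    static long run(Path dir, int threads, int fileSize, int[] bufferSizes, int passes)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final Path file = dir.resolve("seq-read-" + t + ".dat");
                Callable<Long> task = () -> {
                    Files.write(file, new byte[fileSize]); // zero-filled test data
                    long total = 0;
                    for (int p = 0; p < passes; p++) {
                        total += sequentialRead(file, bufferSizes[p % bufferSizes.length]);
                    }
                    return total;
                };
                results.add(pool.submit(task));
            }
            long sum = 0;
            for (Future<Long> f : results) {
                sum += f.get(); // blocks here if a reader thread hangs
            }
            return sum;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: target directory; defaults to the JVM temp directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long bytes = run(dir, 64, 1 << 20, new int[] {4096, 65536, 1 << 20}, 10);
        System.out.println("bytes read: " + bytes);
    }
}
```

The original test runs for a fixed duration rather than a fixed number of passes, so a much larger pass count may be needed before any hang shows up.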
Additional context There are 127 RPC threads and 128 streaming threads in the Fuse process.
Issue Analytics
- Created a year ago
- Comments:8 (8 by maintainers)
Top GitHub Comments
That’s possible, but I’m not able to verify it now.
Cannot reproduce… consider closing the issue