High concurrency Fuse read stuck

Alluxio Version: 2.9.0-SNAPSHOT

Describe the bug: The read application spawns 64 threads. Each thread creates a file in Alluxio and then keeps reading that file sequentially, with different buffer sizes, through the Alluxio POSIX API.
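The workload described above can be approximated with a small stand-alone program. This is only a sketch of the access pattern, not the test code from the linked repository: the class name `SequentialReadSketch`, the file names, the file size, and the buffer sizes are all hypothetical, and by default it runs against a temporary local directory rather than a Fuse mount.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SequentialReadSketch {
  // Reads the whole file front to back with the given buffer size; returns bytes read.
  static long readSequentially(Path file, int bufferSize) throws IOException {
    byte[] buffer = new byte[bufferSize];
    long total = 0;
    try (FileInputStream in = new FileInputStream(file.toFile())) {
      int n;
      while ((n = in.read(buffer)) != -1) {
        total += n;
      }
    }
    return total;
  }

  // Each thread creates its own file under dir, then reads it once per buffer size.
  // Returns the total number of bytes read by all threads.
  static long runWorkload(Path dir, int threads, int fileSize, int[] bufferSizes)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Long>> futures = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final Path file = dir.resolve("file-" + t); // hypothetical file name
        futures.add(pool.submit(() -> {
          Files.write(file, new byte[fileSize]);
          long total = 0;
          for (int size : bufferSizes) {
            total += readSequentially(file, size);
          }
          return total;
        }));
      }
      long sum = 0;
      for (Future<Long> f : futures) {
        sum += f.get();
      }
      return sum;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Pass a directory (for example a Fuse mount point) as the first argument;
    // otherwise a temporary local directory is used.
    Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("read-sketch");
    long bytes = runWorkload(dir, 64, 1 << 20, new int[] {4096, 65536, 1 << 20});
    System.out.println("Total bytes read: " + bytes);
  }
}
```

Unlike the original test, this sketch makes a single pass per buffer size and then exits; a long-running variant would loop over the reads for a fixed duration.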

After the test has run for more than 20 minutes, the read application gets stuck. 55 of the 64 threads are stuck reading from AlluxioFuse:

at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:233)
at alluxio.SequentialReadTest.sequentialReadSingleFile(SequentialReadTest.java:49)

The remaining 9 threads are stuck writing to AlluxioFuse:

at sun.nio.fs.UnixCopyFile.transfer(Native Method)
at java.nio.file.Files.copy(Files.java:1274)
at alluxio.ReadMain.prepareDataset(ReadMain.java:289)

From the Fuse jstack, all threads are blocked in acquireBlockWorkerClientInternal, waiting to acquire a block worker client. Note that the maximum number of block worker clients is 1024, which should be sufficient.

- parking to wait for <0x00007f198e4fb0b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at alluxio.client.file.FileSystemContext.acquireBlockWorkerClientInternal(FileSystemContext.java:549)
at alluxio.client.file.AlluxioFileInStream.read(AlluxioFileInStream.java:183)
- locked <0x00007f1f8542b4b0> (a alluxio.client.file.AlluxioFileInStream)
at alluxio.client.file.FileSystemContext.acquireBlockWorkerClientInternal(FileSystemContext.java:549)
at alluxio.client.block.stream.LocalFileDataWriter.create(LocalFileDataWriter.java:78)
at alluxio.fuse.AlluxioJniFuseFileSystem.writeInternal(AlluxioJniFuseFileSystem.java:576)
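The issue does not identify a root cause. As a general illustration of the symptom in the jstack (callers parked while acquiring from a bounded pool), the sketch below shows how a pool with a fixed cap stops handing out resources once every permit is held and none is released. It is a generic `Semaphore`-based example, not Alluxio's `FileSystemContext` implementation; the class `BoundedPoolSketch` and its methods are hypothetical, and a timeout is used so the demo returns instead of hanging.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
  private final Semaphore permits;

  BoundedPoolSketch(int maxClients) {
    this.permits = new Semaphore(maxClients);
  }

  // Tries to take one client slot; returns false if none frees up within the timeout.
  boolean acquire(long timeoutMs) throws InterruptedException {
    return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
  }

  // Returns one client slot to the pool.
  void release() {
    permits.release();
  }

  int available() {
    return permits.availablePermits();
  }

  // Simulates callers that acquire and never release.
  // Returns how many of the attempts obtained a slot.
  static int leakUntilExhausted(BoundedPoolSketch pool, int attempts, long timeoutMs)
      throws InterruptedException {
    int acquired = 0;
    for (int i = 0; i < attempts; i++) {
      if (pool.acquire(timeoutMs)) {
        acquired++;
      }
    }
    return acquired;
  }

  public static void main(String[] args) throws InterruptedException {
    BoundedPoolSketch pool = new BoundedPoolSketch(4); // small cap for the demo
    int got = leakUntilExhausted(pool, 6, 50);
    System.out.println("Acquired " + got + " of 6; available now: " + pool.available());
    pool.release(); // once a slot is returned, waiting callers can make progress again
    System.out.println("After one release, acquire succeeds: " + pool.acquire(50));
  }
}
```

With an untimed acquire, the fifth and sixth callers in this demo would park indefinitely, which is the kind of state a thread dump would show as waiting on a condition object.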

To Reproduce: Code: https://github.com/LuQQiu/LuTemp/tree/master/src/main/java/alluxio

Launch the Alluxio cluster and Fuse.

Run the test program with

java -Xmx16G -Xms16G -XX:MaxDirectMemorySize=16g  -cp target/LuTemp-1.0-SNAPSHOT-jar-with-dependencies.jar alluxio.ReadMain -l /local2 -f /mnt/alluxio-fuse -t 64 -i 1 -d 60 -s > testDiff.log

The application gets stuck after 20 minutes.

Expected behavior: The application should not get stuck.

Additional context: There are 127 RPC threads and 128 streaming threads in the Fuse process.
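Thread counts like these are usually read off a jstack dump of the target process. For code running inside the same JVM, a rough equivalent is to count live threads by name prefix. The sketch below assumes threads are named with a recognizable prefix; the prefix `demo-worker-` and the class `ThreadCountSketch` are hypothetical and are not the thread names Alluxio uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ThreadCountSketch {
  // Counts live threads in this JVM whose name starts with the given prefix.
  static long countThreadsWithPrefix(String prefix) {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t.getName().startsWith(prefix))
        .count();
  }

  // Starts daemon threads that wait on the latch, so they stay alive until it is released.
  static List<Thread> startParkedThreads(String prefix, int count, CountDownLatch release) {
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      Thread t = new Thread(() -> {
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, prefix + i);
      t.setDaemon(true);
      t.start();
      threads.add(t);
    }
    return threads;
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch release = new CountDownLatch(1);
    List<Thread> threads = startParkedThreads("demo-worker-", 5, release);
    System.out.println("demo-worker threads: " + countThreadsWithPrefix("demo-worker-"));
    release.countDown();
    for (Thread t : threads) {
      t.join();
    }
  }
}
```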

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
LuQQiu commented, Jun 8, 2022

"I saw 42 GC threads in readMain.csv. Is the system overloaded? That could hang the system."

That's possible, but I'm not able to verify it now.

0 reactions
LuQQiu commented, Jun 8, 2022

Cannot reproduce… consider closing the issue
