Optimize the du -s command
Is your feature request related to a problem? Please describe.
It’s hard for me to run alluxio fs du -sh / when there are a large number of files under /.
For instance, I have about 3.8 million files in my Aliyun OSS, which I’ve already mounted on Alluxio. If I run alluxio fs du -sh /, I get an OOM error.
I’ve tried a larger JVM heap size by setting the environment variable ALLUXIO_USER_JAVA_OPTS to -Xmx8G, but I get the same OOM error.
bash-4.4# alluxio fs du -sh /
File Size In Alluxio Path
SLF4J: Failed toString() invocation on an object of type [java.util.ArrayList]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at java.util.AbstractCollection.toString(AbstractCollection.java:462)
at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:304)
at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
at org.slf4j.impl.Log4jLoggerAdapter.warn(Log4jLoggerAdapter.java:463)
at alluxio.AbstractClient.retryRPC(AbstractClient.java:372)
at alluxio.client.file.RetryHandlingFileSystemMasterClient.listStatus(RetryHandlingFileSystemMasterClient.java:228)
at alluxio.client.file.BaseFileSystem.lambda$listStatus$9(BaseFileSystem.java:274)
at alluxio.client.file.BaseFileSystem$$Lambda$71/825249556.call(Unknown Source)
at alluxio.client.file.BaseFileSystem.rpc(BaseFileSystem.java:531)
at alluxio.client.file.BaseFileSystem.listStatus(BaseFileSystem.java:270)
at alluxio.cli.fs.command.DuCommand.runPlainPath(DuCommand.java:94)
at alluxio.cli.fs.command.AbstractFileSystemCommand.runWildCardCmd(AbstractFileSystemCommand.java:92)
at alluxio.cli.fs.command.DuCommand.run(DuCommand.java:207)
at alluxio.cli.AbstractShell.run(AbstractShell.java:137)
at alluxio.cli.fs.FileSystemShell.main(FileSystemShell.java:66)
84.29GB 0B (0%) /
Here is my jmap -histo result:
bash-4.4# jps
2417 FileSystemShell
257 AlluxioJobMaster
258 AlluxioMaster
2504 Jps
bash-4.4# jmap -histo 2417 | head -20
num #instances #bytes class name
----------------------------------------------
1: 34261544 3187371480 [C
2: 34261508 822276192 java.lang.String
3: 3804849 639214632 alluxio.wire.FileInfo
4: 11414717 547906416 java.util.HashMap
5: 15219885 365277240 java.util.ArrayList
6: 3805045 304420728 [Ljava.util.HashMap$Node;
7: 7612891 198845528 [Ljava.lang.Object;
8: 7610004 182640096 java.lang.Long
9: 3807207 121830624 java.util.HashMap$Node
10: 3804850 121755200 alluxio.security.authorization.AccessControlList
11: 3804847 121755104 alluxio.wire.BlockInfo
12: 3804847 121755104 alluxio.wire.FileBlockInfo
13: 3804874 60877984 java.util.HashSet
14: 3804849 60877584 alluxio.client.file.URIStatus
15: 2149 34411984 [I
16: 537922 8606752 java.util.HashMap$KeySet
17: 4144 465776 java.lang.Class
It looks like all the FileInfo instances are held in the JVM heap at the same time, and they are not reclaimed while the command is running.
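As a rough check on the histogram above, the following sketch sums the #bytes column for rows 1 to 14 and divides by the number of FileInfo instances. It assumes that those rows are dominated by objects belonging to the listing result, which the histogram alone does not prove (jmap -histo without :live also counts unreachable objects). HeapEstimate and its members are names made up for this illustration. Under that assumption the total is roughly 6.9 GB, or about 1.8 KB per listed file, so an 8 GB heap leaves little headroom once the logger also tries to build a string from the whole list.

```java
// Back-of-the-envelope heap estimate from the jmap histogram in this issue.
// Assumption: rows 1-14 mostly belong to the listStatus result held by the client.
final class HeapEstimate {
    // "#bytes" column for rows 1-14, copied verbatim from the histogram.
    static final long[] TOP_ROW_BYTES = {
        3187371480L, 822276192L, 639214632L, 547906416L, 365277240L,
        304420728L, 198845528L, 182640096L, 121830624L, 121755200L,
        121755104L, 121755104L, 60877984L, 60877584L
    };
    // Row 3 of the histogram: alluxio.wire.FileInfo.
    static final long FILE_INFO_INSTANCES = 3804849L;
    static final long FILE_INFO_BYTES = 639214632L;

    // Adds up all values in the array.
    static long sum(long[] values) {
        long total = 0L;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Integer division: average bytes per instance, rounded down.
    static long perInstance(long totalBytes, long instances) {
        return totalBytes / instances;
    }

    public static void main(String[] args) {
        long total = sum(TOP_ROW_BYTES);
        System.out.println("FileInfo shallow size (bytes): "
            + perInstance(FILE_INFO_BYTES, FILE_INFO_INSTANCES));
        System.out.println("Rows 1-14 total (bytes): " + total);
        System.out.println("Approx. bytes per listed file: "
            + perInstance(total, FILE_INFO_INSTANCES));
    }
}
```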
Describe the solution you’d like
Can we change the behavior of alluxio fs du -s <path> so that the summation is done on the Alluxio master side instead of the client side? The client would then not need to fetch all the FileInfo instances.
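To illustrate the idea, here is a minimal sketch of a master-side summation. It is not Alluxio code: Inode, DuSummary and MasterSideDu are hypothetical stand-ins, and the real master data structures and RPC layer will differ. The point is only that the master walks its own tree and returns a small fixed-size summary instead of one FileInfo per file.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a master-side inode; not an Alluxio class.
final class Inode {
    final String name;
    final boolean isDirectory;
    final long length;          // file size in bytes (0 for directories)
    final long inAlluxioBytes;  // bytes of this file currently stored in Alluxio
    final List<Inode> children = new ArrayList<>();

    private Inode(String name, boolean isDirectory, long length, long inAlluxioBytes) {
        this.name = name;
        this.isDirectory = isDirectory;
        this.length = length;
        this.inAlluxioBytes = inAlluxioBytes;
    }

    static Inode file(String name, long length, long inAlluxioBytes) {
        return new Inode(name, false, length, inAlluxioBytes);
    }

    static Inode dir(String name, Inode... children) {
        Inode d = new Inode(name, true, 0L, 0L);
        d.children.addAll(Arrays.asList(children));
        return d;
    }
}

// Fixed-size reply the client would receive, regardless of how many files exist.
final class DuSummary {
    long fileCount;
    long totalBytes;
    long inAlluxioBytes;
}

final class MasterSideDu {
    // Iterative depth-first walk; only three counters are accumulated.
    static DuSummary summarize(Inode root) {
        DuSummary summary = new DuSummary();
        Deque<Inode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Inode node = stack.pop();
            if (node.isDirectory) {
                for (Inode child : node.children) {
                    stack.push(child);
                }
            } else {
                summary.fileCount++;
                summary.totalBytes += node.length;
                summary.inAlluxioBytes += node.inAlluxioBytes;
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        Inode root = Inode.dir("/",
            Inode.file("a", 100L, 40L),
            Inode.dir("sub", Inode.file("b", 50L, 0L)));
        DuSummary s = summarize(root);
        System.out.println(s.fileCount + " files, " + s.totalBytes
            + " bytes, " + s.inAlluxioBytes + " bytes in Alluxio");
    }
}
```

With this shape the client-side memory cost no longer depends on the number of files. The trade-off is that the master does the traversal work for every du -s call.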
Describe alternatives you’ve considered
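Alternatively, the client side would not have to wait until all the FileInfo instances have been instantiated before summing file sizes. If sizes were summed incrementally, FileInfo instances that have already been counted could be reclaimed by the next GC.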
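A minimal sketch of this client-side alternative, assuming the client can list one directory at a time instead of requesting a single recursive listing. Entry, DirectoryLister and IncrementalDu are hypothetical names for this illustration and are not part of the Alluxio client API. Each directory’s listing is summed and then dropped, so only one batch plus the queue of pending directory paths is reachable at any moment.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal view of one listed entry; not an Alluxio class.
final class Entry {
    final String path;
    final boolean isFolder;
    final long length;

    Entry(String path, boolean isFolder, long length) {
        this.path = path;
        this.isFolder = isFolder;
        this.length = length;
    }
}

// Hypothetical non-recursive listing call: returns the direct children of one directory.
interface DirectoryLister {
    List<Entry> list(String dirPath);
}

final class IncrementalDu {
    // Sums file sizes directory by directory instead of over one huge list.
    static long totalBytes(DirectoryLister lister, String root) {
        long total = 0L;
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            List<Entry> batch = lister.list(pending.pop());
            for (Entry e : batch) {
                if (e.isFolder) {
                    pending.push(e.path);  // keep only the path, not the entry
                } else {
                    total += e.length;
                }
            }
            // 'batch' is unreachable after this iteration and can be collected.
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<Entry>> tree = new HashMap<>();
        tree.put("/", Arrays.asList(new Entry("/a", true, 0L), new Entry("/f1", false, 10L)));
        tree.put("/a", Arrays.asList(new Entry("/a/f2", false, 20L)));
        DirectoryLister lister = dir -> tree.getOrDefault(dir, Collections.<Entry>emptyList());
        System.out.println("total bytes: " + totalBytes(lister, "/"));
    }
}
```

This does not help when a single flat directory contains millions of files, because that directory is still one large batch. A paged or streaming listing would be needed for that case.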
Urgency
Urgent: a UFS with a large number of small files may be common in our scenario.
Additional context
I’ve found a related issue: #12088
Top GitHub Comments
@TrafalgarZZZ I will take a look! thanks for the report
Resolved by #12423