listStatus should update files' lastSyncTime under the directory for HDFS
Is your feature request related to a problem? Please describe.
When the UFS is HDFS, listStatus on a directory
- syncs and updates the metadata of the files under it
- does not update the lastSyncTime of the files under it
As a result, accessing a file under the directory right after listStatus requires syncing its metadata once more.
Below is an example with alluxio.user.file.metadata.sync.interval=5m
// 1. the metadata of the file is outdated
$ bin/alluxio fs ls -Dalluxio.user.file.metadata.sync.interval=-1 /nginx/master.log
-rw-rw-r-- alluxio alluxio 2036694 PERSISTED 04-29-2022 12:18:00:414 100% /nginx/master.log
// 2. listStatus on the directory
$ bin/alluxio fs ls /nginx
-rw-rw-r-- alluxio alluxio 2037635 PERSISTED 04-29-2022 12:22:22:016 0% /nginx/master.log
// 3. check the audit log on the HDFS side
$ tail hdfs-audit.log|grep nginx
2022-04-29 12:26:50,818 INFO FSNamesystem.audit: allowed=true ugi=... ip=... cmd=getfileinfo src=.../nginx
2022-04-29 12:26:50,821 INFO FSNamesystem.audit: allowed=true ugi=... ip=... cmd=listStatus src=.../nginx
// 4. check the metadata of the file on the Alluxio side; it is updated
$ bin/alluxio fs ls -Dalluxio.user.file.metadata.sync.interval=-1 /nginx/master.log
-rw-rw-r-- alluxio alluxio 2037635 PERSISTED 04-29-2022 12:22:22:016 0% /nginx/master.log
// 5. access the file the normal way
$ bin/alluxio fs ls /nginx/master.log
-rw-rw-r-- alluxio alluxio 2037635 PERSISTED 04-29-2022 12:22:22:016 0% /nginx/master.log
// 6. check the audit log on the HDFS side; the metadata of the file was synced once more
$ tail hdfs-audit.log|grep nginx
2022-04-29 12:27:06,947 INFO FSNamesystem.audit: allowed=true ugi=... ip=.. cmd=getfileinfo src=.../nginx/master.log
Describe the solution you’d like
listStatus should also update the lastSyncTime of the files under the directory.
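The requested behavior can be pictured with a toy model. The sketch below is not Alluxio's implementation: `ToyMaster`, `ufs_calls`, and the `stamp_children` switch are hypothetical names, and the model only assumes that a sync is skipped when a path's last sync time is younger than the sync interval. It shows why stamping the children during a directory listing would save the follow-up UFS request.

```python
import time


class ToyMaster:
    """Toy model of a metadata cache in front of a UFS (not Alluxio code)."""

    def __init__(self, sync_interval_s, stamp_children, clock=time.monotonic):
        self.sync_interval_s = sync_interval_s
        self.stamp_children = stamp_children  # hypothetical switch for the proposal
        self.clock = clock
        self.last_sync = {}   # path -> time of the last sync against the UFS
        self.ufs_calls = []   # record of simulated UFS requests

    def _needs_sync(self, path):
        last = self.last_sync.get(path)
        return last is None or self.clock() - last >= self.sync_interval_s

    def list_status(self, directory, children):
        """List a directory; the UFS listing also refreshes the children's metadata."""
        if self._needs_sync(directory):
            self.ufs_calls.append(("listStatus", directory))
            now = self.clock()
            self.last_sync[directory] = now
            if self.stamp_children:
                # Proposed behavior: the children were just refreshed, so record that.
                for child in children:
                    self.last_sync[child] = now

    def get_status(self, path):
        """Access a single file, syncing with the UFS only if its stamp is stale."""
        if self._needs_sync(path):
            self.ufs_calls.append(("getfileinfo", path))
            self.last_sync[path] = self.clock()


def ufs_calls_for_scenario(stamp_children):
    """Replay the scenario from the report: list /nginx, then access one file."""
    master = ToyMaster(sync_interval_s=300, stamp_children=stamp_children)
    master.list_status("/nginx", ["/nginx/master.log"])
    master.get_status("/nginx/master.log")
    return master.ufs_calls


print(ufs_calls_for_scenario(stamp_children=False))
print(ufs_calls_for_scenario(stamp_children=True))
```

With `stamp_children=False` the model issues a second request for `/nginx/master.log`, matching the extra getfileinfo entry in the audit log; with `stamp_children=True` only the directory listing reaches the UFS.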
Describe alternatives you’ve considered
No
Urgency
This can improve the performance of AlluxioMaster.
Additional context
Tested on alluxio-2.7.1
Top GitHub Comments
I checked the code and learned that I had misunderstood the updating of mLastRecursiveSyncMs. Setting mLastRecursiveSyncMs on the directory is faster than updating all of its descendants, so it is needed. Please ignore the two questions above. Sorry for the inconvenience.
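One way to picture why a single directory-level timestamp is cheaper than stamping every descendant: a lookup can walk up the path's ancestors and accept any sufficiently recent recursive sync. The sketch below is a toy model under that assumption. `RecursiveSyncTimes` and its methods are hypothetical and are not claimed to mirror how Alluxio actually consults mLastRecursiveSyncMs.

```python
import posixpath


class RecursiveSyncTimes:
    """Toy model: one timestamp per recursively synced directory (hypothetical)."""

    def __init__(self, sync_interval_ms):
        self.sync_interval_ms = sync_interval_ms
        self.recursive_sync_ms = {}  # directory -> time of last recursive sync

    def record_recursive_sync(self, directory, now_ms):
        # A single write, regardless of how many descendants the directory has.
        self.recursive_sync_ms[directory] = now_ms

    def should_sync(self, path, now_ms):
        # Check the path itself, then each ancestor up to the root;
        # any fresh recursive sync along the way covers the path.
        current = path
        while True:
            last = self.recursive_sync_ms.get(current)
            if last is not None and now_ms - last < self.sync_interval_ms:
                return False
            if current in ("/", ""):
                return True
            current = posixpath.dirname(current)


times = RecursiveSyncTimes(sync_interval_ms=300_000)
times.record_recursive_sync("/nginx", now_ms=1_000)
print(times.should_sync("/nginx/master.log", now_ms=2_000))    # covered by /nginx
print(times.should_sync("/other/file", now_ms=2_000))          # no fresh ancestor
print(times.should_sync("/nginx/master.log", now_ms=400_000))  # interval elapsed
```

The trade-off in this model is that the write is constant-time while each lookup pays for a walk over the path's depth, which is usually far smaller than the number of files in a directory.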
#16081 updates the metadata of the children when listStatus on the directory has prefetched it, so I will close this issue.