listStatus should update files' lastSyncTime under the directory for HDFS

Is your feature request related to a problem? Please describe.

When the UFS is HDFS, calling listStatus on a directory

  1. syncs and updates the metadata of the files under it
  2. does not update the lastSyncTime of the files under it

As a result, accessing a file under the directory right after listStatus triggers another metadata sync.
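The behavior described above can be illustrated with a toy model. This is only a sketch, not Alluxio's implementation: `SyncCache`, `should_sync` and `list_status` are hypothetical names, and the assumption is that a path needs a sync whenever it has no recorded sync time or that time is older than the interval.

```python
from dataclasses import dataclass, field


@dataclass
class SyncCache:
    """Toy model (hypothetical): remembers when each path was last synced."""
    interval_ms: int
    last_sync_ms: dict = field(default_factory=dict)

    def should_sync(self, path: str, now_ms: int) -> bool:
        # A path with no recorded sync time always needs a sync.
        last = self.last_sync_ms.get(path)
        return last is None or now_ms - last >= self.interval_ms

    def list_status(self, directory: str, children: list, now_ms: int) -> None:
        # The children's metadata would be refreshed here (not modelled),
        # but only the directory itself gets a new sync timestamp.
        self.last_sync_ms[directory] = now_ms


cache = SyncCache(interval_ms=5 * 60 * 1000)  # 5m, as in the report
cache.list_status("/nginx", ["/nginx/master.log"], now_ms=0)

# 16 seconds later, well inside the 5m interval
dir_needs_sync = cache.should_sync("/nginx", now_ms=16_000)
file_needs_sync = cache.should_sync("/nginx/master.log", now_ms=16_000)
print(dir_needs_sync, file_needs_sync)  # False True
```

In this model the directory is considered fresh while the file is not, which matches the extra getfileinfo call reported in this issue.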

Below is an example with alluxio.user.file.metadata.sync.interval=5m

// 1. the meta data of the file is outdated
$ bin/alluxio fs ls -Dalluxio.user.file.metadata.sync.interval=-1 /nginx/master.log
-rw-rw-r--  alluxio   alluxio   2036694       PERSISTED 04-29-2022 12:18:00:414 100% /nginx/master.log

// 2. listStatus on the directory
$ bin/alluxio fs ls /nginx
-rw-rw-r--  alluxio   alluxio   2037635       PERSISTED 04-29-2022 12:22:22:016   0% /nginx/master.log

// 3. check audit log at hdfs side
$ tail hdfs-audit.log|grep nginx
2022-04-29 12:26:50,818 INFO FSNamesystem.audit: allowed=true   ugi=... ip=... cmd=getfileinfo src=.../nginx
2022-04-29 12:26:50,821 INFO FSNamesystem.audit: allowed=true   ugi=... ip=... cmd=listStatus  src=.../nginx

// 4. check meta data of the file at alluxio side, it is updated
$ bin/alluxio fs ls -Dalluxio.user.file.metadata.sync.interval=-1 /nginx/master.log
-rw-rw-r--  alluxio   alluxio   2037635       PERSISTED 04-29-2022 12:22:22:016   0% /nginx/master.log

// 5. access the file normally, without overriding the sync interval
$ bin/alluxio fs ls /nginx/master.log
-rw-rw-r--  alluxio  alluxio    2037635       PERSISTED 04-29-2022 12:22:22:016   0% /nginx/master.log

// 6. check the audit log on the HDFS side: the file's metadata was synced once more
$ tail hdfs-audit.log|grep nginx
2022-04-29 12:27:06,947 INFO FSNamesystem.audit: allowed=true   ugi=... ip=..  cmd=getfileinfo src=.../nginx/master.log
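
For anyone repeating this check, the audit lines can be filtered with a few lines of Python instead of reading them by eye. This is a rough sketch that assumes the simplified `key=value` layout shown in the transcript above (real audit logs carry more fields, and values containing spaces would be truncated); `parse_audit_line` is a hypothetical helper, not part of Hadoop or Alluxio.

```python
import re

# Matches key=value pairs whose value contains no whitespace.
AUDIT_FIELD = re.compile(r"(\w+)=(\S+)")


def parse_audit_line(line):
    """Return the key=value fields of one audit log line as a dict."""
    return dict(AUDIT_FIELD.findall(line))


# Sample lines copied from the transcript (elided values kept as-is).
lines = [
    "2022-04-29 12:26:50,818 INFO FSNamesystem.audit: allowed=true   ugi=... ip=... cmd=getfileinfo src=.../nginx",
    "2022-04-29 12:26:50,821 INFO FSNamesystem.audit: allowed=true   ugi=... ip=... cmd=listStatus  src=.../nginx",
    "2022-04-29 12:27:06,947 INFO FSNamesystem.audit: allowed=true   ugi=... ip=..  cmd=getfileinfo src=.../nginx/master.log",
]

calls = [(f["cmd"], f["src"]) for f in map(parse_audit_line, lines)]
file_calls = [c for c in calls if c[1].endswith("/nginx/master.log")]
print(file_calls)  # [('getfileinfo', '.../nginx/master.log')]
```

The single entry for the file is the extra sync that follows the directory listing.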

Describe the solution you’d like

listStatus should also update lastSyncTime of files under the directory.
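
A minimal sketch of the requested behavior, under the same kind of toy model (hypothetical names, not Alluxio code): when a directory is listed, the sync time is recorded for the directory and for each direct child that was just refreshed, so a follow-up access within the interval does not hit the UFS again.

```python
def record_list_status(last_sync_ms, directory, children, now_ms):
    """Stamp the directory and every direct child that was just listed."""
    last_sync_ms[directory] = now_ms
    for child in children:
        last_sync_ms[child] = now_ms


def should_sync(last_sync_ms, path, now_ms, interval_ms):
    """A path needs a sync if it was never synced or its stamp is too old."""
    last = last_sync_ms.get(path)
    return last is None or now_ms - last >= interval_ms


INTERVAL_MS = 5 * 60 * 1000  # 5m, as in the report
times = {}
record_list_status(times, "/nginx", ["/nginx/master.log"], now_ms=0)

soon = should_sync(times, "/nginx/master.log", 16_000, INTERVAL_MS)
later = should_sync(times, "/nginx/master.log", 6 * 60 * 1000, INTERVAL_MS)
print(soon, later)  # False True
```

The file is skipped 16 seconds after the listing but is synced again once the interval has elapsed.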

Describe alternatives you’ve considered

No

Urgency

This can improve the performance of AlluxioMaster.

Additional context

Testing based on alluxio-2.7.1

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
secfree commented, May 14, 2022

@jiacheliu3 May I know

  1. Why is SyncTime.mLastRecursiveSyncMs needed besides SyncTime.mLastSyncMs?
  2. Is it OK to update mLastRecursiveSyncMs of the first-level children after calling listStatus on the directory without the “recursive” option?

I checked the code and realized that I had misunderstood how mLastRecursiveSyncMs is updated. Setting mLastRecursiveSyncMs on the directory is faster than updating all of its descendants, so it is needed.

Please ignore the above two questions here. Sorry for the inconvenience.
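
To illustrate why a separate recursive timestamp is cheaper, here is a small sketch. It is an assumption about how such a lookup could work, not the actual Alluxio code: the `SyncTime` fields stand in for mLastSyncMs and mLastRecursiveSyncMs, and `effective_sync_ms` is a hypothetical helper that lets a path inherit the recursive timestamp of any ancestor.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class SyncTime:
    last_sync_ms: int = -1            # stands in for mLastSyncMs
    last_recursive_sync_ms: int = -1  # stands in for mLastRecursiveSyncMs


def ancestors(path):
    """Yield every proper ancestor of an absolute path, nearest first."""
    while path != "/":
        path = posixpath.dirname(path)
        yield path


def effective_sync_ms(cache, path):
    """Latest of the path's own sync time and any ancestor's recursive one."""
    own = cache.get(path, SyncTime()).last_sync_ms
    inherited = max(
        (cache[a].last_recursive_sync_ms for a in ancestors(path) if a in cache),
        default=-1,
    )
    return max(own, inherited)


cache = {}
# One write on the directory covers every descendant.
cache["/nginx"] = SyncTime(last_sync_ms=1000, last_recursive_sync_ms=1000)

covered = effective_sync_ms(cache, "/nginx/logs/master.log")
uncovered = effective_sync_ms(cache, "/other/file")
print(covered, uncovered)  # 1000 -1
```

A single entry on the directory answers the lookup for any path below it, instead of one entry per descendant.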

0 reactions
secfree commented, Oct 7, 2022

#16081 updates the metadata of the children when listStatus on the directory has prefetched it, so I will close this issue.
