glob is inefficient because it iterates a directory that was already scanned

I tried the glob method and found it is too slow when there are millions of files in the directory.

It turns out that the glob method first calls the list_objects_v2 API and gets every object (folders and files alike), then checks each one to see whether it is a folder, and then scans those folders again.

The algorithm is correct for a traditional file system but inefficient on S3: S3 already returns every object in response to list_objects_v2, so iterating over subfolders is unnecessary.

Is it possible to fix this in s3path, or can it only be fixed in pathlib?
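
To illustrate the single-pass idea described above, here is a minimal sketch that filters an already-fetched flat list of keys against a glob pattern without revisiting any "folder". It is not s3path's or pathlib's implementation: `glob_to_regex` and `glob_keys` are hypothetical helpers, the pattern support is deliberately simplified, and the keys are assumed to come from one complete listing (for example, the pages of a boto3 `list_objects_v2` paginator).

```python
import re
from typing import Iterable, Iterator


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob pattern into a regex (hypothetical helper).

    Supported: '**/' (zero or more directories), '*' (any run of
    non-'/' characters) and '?' (one non-'/' character).
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def glob_keys(keys: Iterable[str], prefix: str, pattern: str) -> Iterator[str]:
    """Yield keys under `prefix` whose remainder matches `pattern`.

    Every key is examined exactly once; no per-folder listing is needed
    because the flat listing already contains all nested objects.
    """
    regex = glob_to_regex(pattern)
    for key in keys:
        if not key.startswith(prefix):
            continue
        relative = key[len(prefix):]
        if regex.fullmatch(relative):
            yield key


# Assumed sample data standing in for one full bucket listing.
sample_keys = [
    "data/a.csv",
    "data/sub/b.csv",
    "data/sub/deep/c.txt",
    "data/sub/",  # zero-byte "folder" placeholder key
    "other/d.csv",
]
recursive_matches = list(glob_keys(sample_keys, "data/", "**/*.csv"))
top_level_matches = list(glob_keys(sample_keys, "data/", "*.csv"))
```

Because `glob_keys` is a generator, it can consume a paginated listing lazily instead of building the full list in memory first.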

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
four43 commented, Jul 15, 2021

Oh man, python why!? That’s a mega bummer!

1 reaction
liormizr commented, Jul 15, 2021

@four43 yes, you are right. One of the optimizations I want to make is to remove this list creation in the S3 implementation.
