glob is inefficient because it iterates over directories that were already scanned

I tried the glob method and found it too slow when there are millions of files in the directory.

It turns out that glob first calls the list_objects_v2 API to fetch every entry (both files and folders), checks each entry to see whether it is a folder, and then scans those folders again.

The algorithm is correct for a traditional file system but inefficient on S3: S3 already returns every object in response to a list_objects_v2 request, so iterating over subfolders is unnecessary.
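To illustrate the idea, here is a minimal sketch of matching a glob pattern against a flat key listing in a single pass, with no per-folder rescans. It is not how s3path or pathlib is implemented. The names `glob_to_regex` and `flat_glob` are hypothetical, the supported pattern syntax is deliberately simplified, and a plain Python list stands in for the keys that paginated list_objects_v2 responses would return.

```python
import re
from typing import Iterable, Iterator


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob pattern into a regex over full keys.

    Supported syntax (an assumption for this sketch, not full glob):
      '**/'  zero or more whole path segments
      '*'    any run of characters except '/'
      '?'    exactly one character except '/'
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def flat_glob(keys: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield the keys that match the pattern, visiting each key once."""
    regex = glob_to_regex(pattern)
    for key in keys:
        if regex.fullmatch(key):
            yield key


# Stand-in for the keys of one flat listing (hypothetical sample data).
sample_keys = [
    "a.txt",
    "logs/2021/x.log",
    "logs/y.log",
    "logs/2021/notes.txt",
    "data/z.log",
]

for match in flat_glob(sample_keys, "logs/**/*.log"):
    print(match)
```

Because the listing already contains full keys, the whole pattern can be tested against each key directly, so the cost is one pass over the listing rather than one listing request per folder.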

Is it possible to fix this in s3path, or can it only be fixed in pathlib?

Issue Analytics

State:
Created 2 years ago
Comments: 11 (4 by maintainers)

Top GitHub Comments

four43 commented, Jul 15, 2021

Oh man, python why!? That’s a mega bummer!

liormizr commented, Jul 15, 2021

@four43 yes, you are right. One of the optimizations that I want to make is to remove this list creation in the S3 implementation.
