DirectoryStore `.keys()` optimisation.


It looks like the `keys()` method of `DirectoryStore` reimplements a lot of the logic of `os.walk()`, which could be used instead to improve efficiency.

In particular, `os.walk()` returns directories and files separately, which avoids the two extra expensive calls to `os.path.isfile(path)` and `os.path.isdir(path)`.

Forming the keys will be slightly less straightforward than with the current method: `os.walk()` returns each directory path prefixed with the path you gave it, whether or not you terminated it with a slash, so we might have to be careful with this.

(I also note that this method lists OS-specific files, like `.DS_Store`; I'm not sure this is on purpose.)
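To make the idea concrete, here is a minimal sketch of what an `os.walk()`-based listing could look like. It is not the zarr implementation: `walk_keys` is a hypothetical standalone function, and it assumes that keys are file paths relative to the store root, separated by `/`.

```python
import os


def walk_keys(root):
    """Yield "/"-separated keys for every file under ``root`` (hypothetical helper)."""
    # Normalise once so a trailing slash on ``root`` does not change the keys.
    root = os.path.normpath(root)
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk() reports dirpath with the root prefix attached; strip it.
        rel_dir = os.path.relpath(dirpath, root)
        for name in filenames:
            # Files come pre-separated from directories, so no isfile()/isdir()
            # call is needed here.
            rel = name if rel_dir == "." else os.path.join(rel_dir, name)
            # Assumption: keys always use "/" regardless of the platform separator.
            yield rel.replace(os.sep, "/")
```

`os.walk()` makes no promise about the order of entries within a directory, so wrap the result in `sorted()` wherever a deterministic order is needed.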

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
Carreau commented, May 9, 2020

I do have a prototype which appears to be a bit faster on a small zarr store on my laptop.

In [31]: %timeit list(sorted(z.store.keys_2()))
621 µs ± 8.48 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [32]: %timeit list(sorted(z.store.keys()))
2.05 ms ± 47.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Some tests do seem to rely on key ordering, though…

1 reaction
Carreau commented, May 9, 2020

Note: `os.walk()` may also return paths with `\` instead of `/`.

And yes, `os.scandir` is used inside `os.walk()`.
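One way to handle the separator issue, sketched under the assumption that keys should always use `/`: convert each native relative path before returning it. `path_to_key` is a hypothetical helper, and the `sep` parameter is there only so the Windows behaviour can be exercised on any platform.

```python
import os


def path_to_key(rel_path, sep=os.sep):
    """Convert a native relative path into a "/"-separated key (hypothetical helper)."""
    if sep == "/":
        # Already in key form on POSIX-style platforms.
        return rel_path
    # On Windows, os.walk() joins path components with "\\".
    return rel_path.replace(sep, "/")
```

For example, `path_to_key("foo\\bar\\0.0", sep="\\")` gives `"foo/bar/0.0"`, while a path that already uses `/` is returned unchanged.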
