Zarr DirectoryStore: getsize is wrong

https://github.com/zarr-developers/zarr/blob/3c8e9291e96e090e20caf6950da69a3cc180652f/zarr/storage.py#L864-L871

This code does not account for subfolders created by this example:

import zarr

# `data`, `compressor`, and `filter` are not defined in this snippet.
store = zarr.TempStore(dir='/tmpfs')
for name, values in data.items():
    zarr.array(values, store=store, path=name, compressor=compressor, filters=filter)
total_size = store.getsize()

total_size is 26 (24 + 2 for .zattrs and .zgroup), which is wrong: subfolders are not accounted for.

My TempStore has the following structure:

.zattrs
.zgroup
name1/
  1
  2
  3
name2/
  1
  2
  3

The file sizes of name1 and name2 are unaccounted for.
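
The effect can be reproduced without zarr. The following is a minimal sketch using only the Python standard library: it builds a directory shaped like the one above and sums only the files at the top level. The helper names `make_layout` and `flat_size` and the 4-byte placeholder contents are assumptions made for illustration; this mimics the non-recursive behaviour described here and is not a copy of the linked zarr code.

```python
import os
import tempfile


def make_layout(root):
    """Create a tree shaped like the one above and return the bytes written.

    Every file holds 4 placeholder bytes (an assumption for illustration).
    """
    written = 0
    for name in (".zattrs", ".zgroup"):
        with open(os.path.join(root, name), "wb") as f:
            written += f.write(b"meta")
    for sub in ("name1", "name2"):
        os.mkdir(os.path.join(root, sub))
        for chunk in ("1", "2", "3"):
            with open(os.path.join(root, sub, chunk), "wb") as f:
                written += f.write(b"data")
    return written


def flat_size(root):
    """Sum only the regular files directly inside root (no recursion)."""
    return sum(entry.stat().st_size for entry in os.scandir(root) if entry.is_file())


with tempfile.TemporaryDirectory() as root:
    bytes_written = make_layout(root)
    top_level_bytes = flat_size(root)

# The flat sum sees only .zattrs and .zgroup and misses the six chunk files.
print(top_level_bytes, bytes_written)
```

With this layout the flat sum covers two of the eight files, which matches the symptom reported for `store.getsize()`.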

Tested with Zarr 2.2.0.

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 11 (10 by maintainers)

Top GitHub Comments

1 reaction
alimanfoo commented, Feb 19, 2019

Sorry for slow follow up. FWIW I think it would be reasonable to change the behaviour and make it recursive. Should not affect results for reporting stored size of an array in a DirectoryStore, as there shouldn’t be any sub-directories there anyway. (E.g., I often call .info on an array to see how big it is on disk.) I’m happy for the tests to be modified.
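
A recursive size calculation along the lines suggested here could look roughly like the sketch below. It uses only the standard library; the function name `getsize_recursive` and its `path` argument are hypothetical and chosen for illustration, so this is not the zarr implementation or a proposed patch.

```python
import os
import tempfile


def getsize_recursive(root, path=""):
    """Return the total size in bytes of the file or directory tree at root/path.

    Returns 0 when nothing exists at that location.
    """
    target = os.path.join(root, path)
    if os.path.isfile(target):
        return os.path.getsize(target)
    total = 0
    # os.walk yields nothing for a missing directory, so total stays 0.
    for dirpath, _dirnames, filenames in os.walk(target):
        for filename in filenames:
            total += os.path.getsize(os.path.join(dirpath, filename))
    return total


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "name1"))
    with open(os.path.join(root, ".zgroup"), "wb") as f:
        f.write(b"12")  # 2 bytes of placeholder metadata
    with open(os.path.join(root, "name1", "1"), "wb") as f:
        f.write(b"12345")  # 5 bytes of placeholder chunk data
    whole_store = getsize_recursive(root)
    one_array = getsize_recursive(root, "name1")
    missing = getsize_recursive(root, "does-not-exist")

print(whole_store, one_array, missing)
```

For a directory that holds only files and no sub-directories, the recursive and non-recursive sums are identical, which is consistent with the point that the size reported for a single array should not change.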

0 reactions
hmaarrfk commented, Jul 14, 2019

Is the only way to get the current estimated size to load the value returned by info_items into our own dictionary and then index it by "No. bytes stored"?
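
The approach described in this question could be sketched as follows. The helper `stored_bytes` is hypothetical, and `sample_items` is made-up data standing in for the (label, value) pairs the question refers to; the assumption that the value may be a string with a human-readable suffix is exactly that, an assumption, so check the real output before relying on it.

```python
def stored_bytes(info_items):
    """Look up 'No. bytes stored' in a sequence of (label, value) pairs."""
    info = dict(info_items)
    raw = info["No. bytes stored"]
    # Assumption: the value may be a string such as "3379344 (3.2M)",
    # so keep only the leading integer.
    return int(str(raw).split()[0])


# Hypothetical sample data, not output captured from zarr.
sample_items = [
    ("Type", "zarr.core.Array"),
    ("No. bytes stored", "3379344 (3.2M)"),
]
n_bytes = stored_bytes(sample_items)
print(n_bytes)
```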
