Zarr DirectoryStore: getsize is wrong
This code does not account for subfolders created by this example:
import zarr

# data, compressor, and filter are defined elsewhere
store = zarr.TempStore(dir='/tmpfs')
for name, values in data.items():
    zarr.array(values, store=store, path=name, compressor=compressor, filters=filter)
total_size = store.getsize()
total_size is 26 (24 + 2 for .zattrs and .zgroup), which is wrong. Subfolders are not accounted for.
My TempStore has the following structure:
.zattrs
.zgroup
name1/
    1
    2
    3
name2/
    1
    2
    3
The file sizes of name1 and name2 are unaccounted for.
Tested with Zarr 2.2.0.
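For comparison, the total the report expects can be computed by walking the store directory recursively. The following is a minimal sketch using only the Python standard library; dir_size_recursive is a hypothetical helper name chosen for illustration, not part of the zarr API, and it is not the fix the maintainers applied.

```python
import os


def dir_size_recursive(path):
    """Sum the sizes in bytes of all regular files under path.

    Hypothetical helper (not a zarr function): unlike a flat listing,
    it descends into subfolders such as name1/ and name2/.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                # Regular file at this level: add its size.
                total += entry.stat(follow_symlinks=False).st_size
            elif entry.is_dir(follow_symlinks=False):
                # Subfolder: recurse so its files are counted too.
                total += dir_size_recursive(entry.path)
    return total
```

Calling dir_size_recursive on the directory backing the store (for a DirectoryStore or TempStore, its path attribute) would include the chunk files under name1/ and name2/ as well as the top-level metadata files.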
Sorry for the slow follow-up. FWIW I think it would be reasonable to change the behaviour and make it recursive. It should not affect the reported stored size of an array in a DirectoryStore, as there shouldn't be any sub-directories there anyway. (E.g., I often call .info on an array to see how big it is on disk.) I'm happy for the tests to be modified.

Is the only way to get the current estimated size to parse the return value of info_items into our own dictionary and then index "No. bytes stored"?