zarr.open_array does not properly recognize NestedDirectoryStore

import numpy
import zarr

# Create an array backed by a NestedDirectoryStore and fill it with ones.
a = zarr.create((10, 100, 100), chunks=(1, 100, 100), dtype='f4',
                store=zarr.NestedDirectoryStore('foo.zarr'), overwrite=True)

a[...] = 1.0

print('shape =', a.shape)
print('a has non-zero values =', numpy.any(a))

del a

# Reopen using the path alone; no store type is given here.
a = zarr.open_array('foo.zarr', mode='r')

print('shape =', a.shape)
print('a has non-zero values =', numpy.any(a))

Problem description

A zarr ‘file’ created using NestedDirectoryStore will not load correctly when later opened using the path alone. The shape (and other metadata) is correct, but the values are not loaded. In the code sample above, the shape of the original and reloaded array is the same, but the contents don’t match. The second array contains only zeroes.
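
A likely explanation, sketched under the assumption that zarr 2.x names chunks with a `.` separator in its default directory layout and a `/` separator under NestedDirectoryStore: the two layouts look for the same chunk in different places. The metadata file sits at the root in both cases, so the shape is read correctly, while the chunk lookups miss and the array is filled with its fill value. The helper `chunk_path` below is hypothetical and only illustrates the mapping; it is not part of zarr.

```python
import os

def chunk_path(root, chunk_index, separator):
    """Return the on-disk path for a chunk, given the key separator."""
    key = separator.join(str(i) for i in chunk_index)
    # A "/" in the key becomes a directory boundary on disk.
    return os.path.join(root, *key.split("/"))

# Same chunk index, two different layouts.
flat = chunk_path("foo.zarr", (3, 0, 0), ".")    # flat layout: one file per chunk
nested = chunk_path("foo.zarr", (3, 0, 0), "/")  # nested layout: one directory per dimension

print(flat)
print(nested)
```

If that is what is happening, passing the store explicitly when reopening, for example `zarr.open_array(store=zarr.NestedDirectoryStore('foo.zarr'), mode='r')`, should make the reader use the same layout as the writer.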

Version and installation information

  • Value of zarr.__version__ - 2.3.2
  • Value of numcodecs.__version__ - 0.6.4
  • Version of Python interpreter - 3.6.7 (anaconda)
  • Operating system (Linux/Windows/Mac) - Mac OS Mojave (10.14.6)
  • How Zarr was installed - using conda

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

2 reactions
stuarteberg commented, Feb 28, 2020

+1 for this issue!

Instead of guessing the store from the file extension, can the store class name simply be added to the metadata in .zarray?

1 reaction
alimanfoo commented, Feb 28, 2020

FWIW I agree that ideally the application should not need to know ahead of time which store path separator has been used for chunks, but should be able to discover that from the array metadata. This is something I have proposed to introduce in the v3 spec, although that is still just a draft.

For the current implementation based on the v2 spec, we could leave it as is and make clear in the documentation that if you use a nested store, you need to communicate this to users of the data so they know how to open it correctly. Alternatively, we could add a metadata field in the .zarray to communicate which chunk path separator is used.

Given this will be fixed in the v3 spec, I’d be inclined to wait until then. Happy to discuss though.
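
To make the metadata-field idea concrete, here is a minimal sketch of how a reader might discover the separator from the `.zarray` JSON. The key name `dimension_separator` and the helper `chunk_separator` are assumptions for illustration and do not come from this thread; a missing key falls back to `.` so that existing arrays would keep working.

```python
import json

def chunk_separator(zarray_text, default="."):
    """Read the chunk key separator from .zarray JSON text, if present."""
    meta = json.loads(zarray_text)
    # Assumed key name; arrays written without it keep the default separator.
    return meta.get("dimension_separator", default)

# Metadata without the field, as written today.
legacy = '{"zarr_format": 2, "shape": [10, 100, 100], "chunks": [1, 100, 100]}'
# Metadata with the assumed field, as a nested store might write it.
tagged = ('{"zarr_format": 2, "shape": [10, 100, 100], '
          '"chunks": [1, 100, 100], "dimension_separator": "/"}')

print(chunk_separator(legacy))
print(chunk_separator(tagged))
```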

