
Requirements of store data


Raising this issue to get an idea of what we require of stores and what can be placed in them.

For instance, in many cases we require Arrays to have an object_codec to allow storing object types, and many stores would have difficulty with such data without explicit conversion to some sort of bytes-like object; however, we appear to be placing objects in a store as a test. We also seem to expect stores to be easily comparable; however, this doesn’t work if the store has NumPy ndarrays in it (https://github.com/zarr-developers/zarr/issues/348).
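As a rough illustration of the comparison problem: comparing two dict-like stores with == can raise ValueError once values are multi-element NumPy arrays, because ndarray equality is elementwise and the result has no single truth value. A minimal sketch, assuming stores are plain Mappings and every value supports the Python buffer protocol, is to compare the key sets and then the raw bytes of each value. The helper names as_bytes and stores_equal are hypothetical, not zarr API.

```python
from collections.abc import Mapping


def as_bytes(value) -> bytes:
    """Copy a buffer-protocol object (bytes, bytearray, memoryview,
    numeric NumPy arrays, ...) into an immutable bytes object."""
    try:
        return bytes(memoryview(value))
    except TypeError:
        raise TypeError(
            f"store value of type {type(value).__name__} is not bytes-like"
        ) from None


def stores_equal(a: Mapping, b: Mapping) -> bool:
    """Compare two stores key by key on raw bytes instead of with ==,
    which can raise ValueError when values are multi-element ndarrays."""
    if set(a) != set(b):
        return False
    return all(as_bytes(a[key]) == as_bytes(b[key]) for key in a)
```

Comparing raw bytes deliberately ignores dtype and shape, which matches a view of store values as plain byte sequences.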

Should we set some explicit requirements for stores? If so, what would those requirements be, and how would we enforce them?

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
jakirkham commented, Nov 29, 2021

In PR https://github.com/zarr-developers/zarr-python/pull/789 we added a BaseStore class, which addresses some of these basic needs of stores.

Subsequent discussion around the v3 spec and storing standardized data from libraries addresses other concerns raised here.

Is there anything else that still needs to be addressed here?

cc @joshmoore @grlee77

0 reactions
alimanfoo commented, Jan 10, 2019

Thanks Ryan, good points. We certainly could be more explicit about the set of operations that a storage system must support, and make sure we include everything (e.g., listing all keys). We could also state the optional operations, which are not strictly necessary but allow for some optimisations or additional features, like being able to list all the keys that are children of some hierarchy path (the listdir() method in Python implementations).
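One way to see why listdir() can be optional: given the mandatory ability to list all keys, the children under a hierarchy path can always be derived by scanning every key, and a native listdir() (for example, a directory listing on a file system) would just be a faster route to the same answer. A minimal sketch, assuming "/"-separated keys; the function below is a hypothetical fallback, not the zarr-python implementation.

```python
from collections.abc import Mapping


def listdir(store: Mapping, path: str = "") -> list:
    """Hypothetical fallback: list the immediate children of `path`
    by scanning every key in the store."""
    prefix = path.strip("/")
    prefix = prefix + "/" if prefix else ""
    children = set()
    for key in store:
        if key.startswith(prefix):
            # Keep only the first path segment below the prefix.
            rest = key[len(prefix):]
            children.add(rest.split("/", 1)[0])
    return sorted(children)
```

The cost is a full key scan per call, which is exactly the kind of overhead an optional native listdir() would let a store avoid.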

We could do this in a language-independent way but still make it clear and concrete how this corresponds to specific operations supported by a file system or a cloud object service or whatever.

I think we could also do this as an update to the format spec, without requiring a new spec version, as these would be clarifications of the existing spec.

On Thu, 10 Jan 2019, 10:55 Ryan Abernathey <notifications@github.com> wrote:

Thanks for the clarification. I see how this thread is specific to python implementations.

I guess I worry that the spec is too vague with regard to the implementation of the key/value store and the methods that can be used to query it:

A Zarr array can be stored in any storage system that provides a key/value interface, where a key is an ASCII string and a value is an arbitrary sequence of bytes, and the supported operations are read (get the sequence of bytes associated with a given key), write (set the sequence of bytes associated with a given key) and delete (remove a key/value pair).
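If the data model quoted above were enforced at the store boundary, as the original issue asks, a check could look roughly like this. It is a hypothetical helper covering only the two constraints in the quoted text (ASCII string keys, byte-sequence values); treating bytearray and memoryview as acceptable "sequences of bytes" is an assumption.

```python
def validate_item(key, value) -> None:
    """Hypothetical check: raise if (key, value) falls outside the data model."""
    # Keys: ASCII strings (str.isascii needs Python 3.7+).
    if not isinstance(key, str):
        raise TypeError(f"key must be a str, got {type(key).__name__}")
    if not key.isascii():
        raise ValueError(f"key must be ASCII, got {key!r}")
    # Values: an arbitrary sequence of bytes.
    if not isinstance(value, (bytes, bytearray, memoryview)):
        raise TypeError(f"value must be bytes-like, got {type(value).__name__}")
```

A store's write operation could call this before storing anything, which would reject Python objects placed in a store unless they were first encoded to bytes.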

In terms of operations, “read”, “write”, and “delete” don’t seem like a complete enumeration of the operations a store must support. When implementing a store, you also need at least some form of “list” operation; otherwise zarr can’t discover what is in the store. (The exception is consolidated metadata stores.) In fact, you have to implement a MutableMapping https://docs.python.org/3/library/collections.abc.html, which has five abstract methods: __getitem__, __setitem__, __delitem__, __iter__, and __len__.
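To make the MutableMapping point concrete, here is an illustrative in-memory store (the class name InMemoryStore is hypothetical). Only the five abstract methods are written by hand; the MutableMapping mixin then supplies __contains__, keys(), get(), pop() and the rest, and __iter__ serves as the "list" operation. Converting values with bytes(memoryview(...)) is an assumed policy so that only bytes-like values are accepted.

```python
from collections.abc import MutableMapping


class InMemoryStore(MutableMapping):
    """Hypothetical dict-backed store implementing only the five abstract methods."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):  # read
        return self._data[key]

    def __setitem__(self, key, value):  # write
        # bytes(memoryview(...)) copies any buffer-protocol object and raises
        # TypeError otherwise (plain bytes(5) would silently give 5 zero bytes).
        self._data[key] = bytes(memoryview(value))

    def __delitem__(self, key):  # delete
        del self._data[key]

    def __iter__(self):  # list: lets callers discover what is in the store
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = InMemoryStore()
store[".zgroup"] = b'{"zarr_format": 2}'
store["foo/0"] = bytearray(b"\x01\x02")
print(sorted(store))  # ['.zgroup', 'foo/0']
print("foo/0" in store, store.get("missing"))  # True None
```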

More generally, how do we ensure that DirectoryStore, ZipStore, or any of the myriad cloud stores that have been developed can truly be read from different implementations of zarr? I wonder if it would be worth explicitly defining a spec for certain commonly used stores that gives more detail about the implementation choices that have already been made in the zarr python code.
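A store-specific spec of the kind suggested here would pin down details such as how keys map onto storage locations. As a sketch, assuming the convention (used, as far as we understand, by zarr-python's DirectoryStore) that each "/"-separated segment of a key becomes a directory level under the store's root, with the last segment as the file name; rejecting empty, "." and ".." segments is an added assumption, and key_to_path is a hypothetical name.

```python
from pathlib import PurePath, PurePosixPath


def key_to_path(root: PurePath, key: str) -> PurePath:
    """Hypothetical mapping of a store key such as "foo/bar/0.0" to root/foo/bar/0.0."""
    parts = key.split("/")
    # Refuse keys that would escape the root or produce ambiguous paths.
    if any(part in ("", ".", "..") for part in parts):
        raise ValueError(f"invalid store key: {key!r}")
    return root.joinpath(*parts)


# Example on a POSIX-style path (PurePosixPath does no file-system access).
print(key_to_path(PurePosixPath("/data/example.zarr"), "foo/bar/0.0"))
# prints /data/example.zarr/foo/bar/0.0
```

Writing rules like these down per store is what would let a second implementation read a DirectoryStore written by zarr-python without reverse-engineering its layout.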

