Store global attributes as HDF5 Datasets
Global (file-level) attributes are currently stored as HDF5 Attributes. However, such attributes are required to be small (there is no hard limit, but the spec says 16 kB) and cannot be sliced.
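For illustration, here's a minimal h5py sketch (file name and paths are hypothetical) of the difference: an HDF5 Attribute is always read in full, while a Dataset can be sliced:

```python
import h5py
import numpy as np

with h5py.File("example.loom", "w") as f:
    f.attrs["title"] = "my dataset"        # Attribute: small, read as a whole
    g = f.create_group("global")
    g["big_table"] = np.arange(1_000_000)  # Dataset: can be arbitrarily large

with h5py.File("example.loom", "r") as f:
    title = f.attrs["title"]                # no way to read a partial Attribute
    head = f["/global/big_table"][:100]     # Datasets support slicing
```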
However, it would be useful to be able to store arbitrarily large amounts of data on the global level, such as pickled objects, images, or other supporting data.
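For example, a pickled Python object could be stored as a byte Dataset, sidestepping the Attribute size limit entirely. This is only a sketch of the idea (not loompy's API), again using h5py directly with hypothetical names:

```python
import pickle
import h5py
import numpy as np

payload = {"weights": list(range(10))}  # any picklable object, possibly large

with h5py.File("example.loom", "a") as f:
    g = f.require_group("global")
    # Serialize to bytes and store as a uint8 Dataset; Datasets have no
    # practical size limit.
    g["payload"] = np.frombuffer(pickle.dumps(payload), dtype=np.uint8)

with h5py.File("example.loom", "r") as f:
    restored = pickle.loads(f["/global/payload"][:].tobytes())
```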
Two options:
- Add a new API for large global objects (say, `LoomConnection.blobs`) and store them as Datasets (e.g. under `/global`) in the file. This would retain backwards compatibility but would require maintaining two different APIs that do almost the same thing. New files would use a mixture of old-style and new-style attributes indefinitely. Only new-style global attributes in new files would be invisible when opened using an older library implementation.
- Keep the current API but change the Loom file format spec to store global attributes as Datasets (e.g. under `/global`). Implementors would still need to look for attributes both as HDF5 Attributes and as Datasets, to ensure old files remain readable. New files would use a consistent API and a consistent file format. For backwards compatibility, implementors should write global attributes as HDF5 Attributes (in addition to writing them as Datasets) if they are smaller than 16 kB. Larger global attributes in new files would be invisible when opened using an older library implementation.
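Here is a sketch of what option 2's dual read/write path could look like. The helper names and the threshold handling are assumptions for illustration, not loompy's actual implementation:

```python
import h5py
import numpy as np

ATTR_SIZE_LIMIT = 16 * 1024  # spec-suggested ceiling for HDF5 Attributes

def write_global_attr(f: h5py.File, name: str, value: np.ndarray) -> None:
    """Write a global attribute per option 2: always as a Dataset under
    /global, and additionally as an HDF5 Attribute when it is small enough
    for older readers to see."""
    g = f.require_group("global")
    if name in g:
        del g[name]
    g[name] = value
    if value.nbytes < ATTR_SIZE_LIMIT:
        f.attrs[name] = value  # backwards-compatible copy for old readers

def read_global_attr(f: h5py.File, name: str):
    """Read a global attribute, preferring the new Dataset location but
    falling back to legacy HDF5 Attributes for old files."""
    if "global" in f and name in f["global"]:
        return f["global"][name][()]
    return f.attrs[name]
```

Preferring the Dataset on read means new files behave consistently, while the fallback keeps pre-spec-change files readable without any format negotiation.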
I think option 2 is nicer and should be compatible enough.
Top GitHub Comments
Fixed in the `loompy3.0` branch
Not yet, but I’ll work on it. I think it’s soon time for a loompy 3 release, which will make it possible to make changes to the file spec.