
Tree option to omit array metadata (shape, dtype)


When using the tree() function/method, currently arrays are printed with shape and dtype. This is useful diagnostic information but requires that the .zarray resource is retrieved and read for every array in the tree. This is not an issue for data stored locally, but can be an issue for remote storage as retrieving each .zarray resource will require a network round-trip.

The proposal is to add an option meta to the tree() function/method, defaulting to meta=True. Setting meta=False would omit the array metadata from the output, so building the tree representation would require only retrieving the list of keys from the store.
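To make the cost difference concrete, here is a minimal sketch that uses only the standard library. It is not zarr's implementation: CountingStore and tree_lines are hypothetical names, and the in-memory dict stands in for a remote store where every value read would be a network round-trip. The sketch assumes the zarr v2 key layout, in which each array has a ".zarray" key and each group a ".zgroup" key.

```python
import json
from collections.abc import Mapping


class CountingStore(Mapping):
    """Hypothetical dict-backed stand-in for a remote store; counts value reads."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1  # each value read models one network round-trip
        return self._data[key]

    def __iter__(self):
        return iter(self._data)  # listing keys reads no values

    def __len__(self):
        return len(self._data)


def tree_lines(store, meta=True):
    """Return an indented listing of the groups and arrays in `store`.

    With meta=False only the key listing is used. With meta=True each
    array's .zarray document is also read to get its shape and dtype.
    """
    entries = []
    for key in store:
        path, _, marker = key.rpartition("/")
        if path and marker in (".zarray", ".zgroup"):
            entries.append((path.split("/"), key, marker))
    entries.sort()  # parents sort before their children

    lines = ["/"]
    for parts, key, marker in entries:
        line = "    " * len(parts) + parts[-1]
        if meta and marker == ".zarray":
            info = json.loads(store[key])  # one read per array
            line += f" {tuple(info['shape'])} {info['dtype']}"
        lines.append(line)
    return lines


if __name__ == "__main__":
    zarray = json.dumps({"shape": [47116, 512, 512], "dtype": "<f4"}).encode()
    store = CountingStore({
        ".zgroup": b'{"zarr_format": 2}',
        "2010/.zgroup": b'{"zarr_format": 2}',
        "2010/131A/.zarray": zarray,
        "2010/94A/.zarray": zarray,
    })
    print("\n".join(tree_lines(store, meta=False)))
    print("value reads without metadata:", store.reads)
    tree_lines(store, meta=True)
    print("value reads with metadata:", store.reads)
```

In this sketch the number of value reads grows with the number of arrays when meta=True and stays at zero when meta=False, which is the saving the proposal is after.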

Issue Analytics

Created 6 years ago
Comments: 11 (10 by maintainers)

Top GitHub Comments

1 reaction

jakirkham commented, Apr 12, 2021

This may be a different issue. Would suggest looking into consolidated metadata.
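For context, consolidated metadata in zarr v2 means copying every metadata document in a hierarchy into a single ".zmetadata" key, so that one read replaces one read per array. In zarr itself this is done with zarr.consolidate_metadata(store) and read back with zarr.open_consolidated(store). The sketch below only illustrates the idea with the standard library: consolidate and array_summaries are hypothetical helpers, the plain dict stands in for a store, and the document layout shown is an assumption based on the zarr v2 consolidated format.

```python
import json

META_KEYS = (".zarray", ".zgroup", ".zattrs")


def consolidate(store):
    """Gather every metadata document in `store` under one '.zmetadata' key."""
    merged = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in META_KEYS
    }
    doc = {"zarr_consolidated_format": 1, "metadata": merged}
    store[".zmetadata"] = json.dumps(doc).encode()


def array_summaries(store):
    """Return {array path: (shape, dtype)} using a single value read.

    Arrays stored at the root of the hierarchy are ignored to keep the
    sketch short.
    """
    doc = json.loads(store[".zmetadata"])  # the only value read
    return {
        key.rsplit("/", 1)[0]: (tuple(meta["shape"]), meta["dtype"])
        for key, meta in doc["metadata"].items()
        if key.endswith("/.zarray")
    }


if __name__ == "__main__":
    store = {
        ".zgroup": b'{"zarr_format": 2}',
        "2010/.zgroup": b'{"zarr_format": 2}',
        "2010/131A/.zarray": b'{"shape": [47116, 512, 512], "dtype": "<f4"}',
    }
    consolidate(store)  # done once, by whoever can write to the store
    print(array_summaries(store))
```

Consolidation has to be run by someone with write access to the store, and it needs to be rerun whenever arrays or groups are added or removed.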

0 reactions

PaulJWright commented, Apr 12, 2021

Has there been any progress on this? I am noticing very large wall times (currently at ~6 min) with data stored on GCP. I am new to zarr in general, so any advice to reduce this would be great too!

import gcsfs
import zarr

gcs = gcsfs.GCSFileSystem(access='read_only')
store = gcsfs.GCSMap('file.zarr', gcs=gcs, check=False)
root = zarr.group(store)

%time print(root.tree())
/
 ├── 2010
 │   ├── 131A (47116, 512, 512) float32
 │   ├── 1600A (47972, 512, 512) float32
 │   ├── 1700A (46858, 512, 512) float32
 │   ├── 171A (47186, 512, 512) float32
 │   ├── 193A (47134, 512, 512) float32
 │   ├── 211A (47186, 512, 512) float32
 │   ├── 304A (47131, 512, 512) float32
 │   ├── 335A (47187, 512, 512) float32
 │   └── 94A (46930, 512, 512) float32
 ├── 2011
 │   ├── 131A (75200, 512, 512) float32
 │   ├── 1600A (75814, 512, 512) float32
 │   ├── 1700A (74839, 512, 512) float32
 │   ├── 171A (75660, 512, 512) float32
 │   ├── 193A (75664, 512, 512) float32
 │   ├── 211A (75678, 512, 512) float32
 │   ├── 304A (74199, 512, 512) float32
 │   ├── 335A (75624, 512, 512) float32
 │   └── 94A (75138, 512, 512) float32
 ├── 2012
 │   ├── 131A (76849, 512, 512) float32
 │   ├── 1600A (76630, 512, 512) float32
 │   ├── 1700A (69091, 512, 512) float32
 │   ├── 171A (76750, 512, 512) float32
 │   ├── 193A (76852, 512, 512) float32
 │   ├── 211A (76870, 512, 512) float32
 │   ├── 304A (76851, 512, 512) float32
 │   ├── 335A (76855, 512, 512) float32
 │   └── 94A (76878, 512, 512) float32
 ├── 2013
 │   ├── 131A (82719, 512, 512) float32
 │   ├── 1600A (83001, 512, 512) float32
 │   ├── 1700A (74989, 512, 512) float32
 │   ├── 171A (82633, 512, 512) float32
 │   ├── 193A (82716, 512, 512) float32
 │   ├── 211A (82746, 512, 512) float32
 │   ├── 304A (82715, 512, 512) float32
 │   ├── 335A (82723, 512, 512) float32
 │   └── 94A (82746, 512, 512) float32
 └── 2014
     ├── 131A (73605, 512, 512) float32
     ├── 1600A (73390, 512, 512) float32
     ├── 1700A (66326, 512, 512) float32
     ├── 171A (73487, 512, 512) float32
     ├── 193A (73603, 512, 512) float32
     ├── 211A (73617, 512, 512) float32
     ├── 304A (73602, 512, 512) float32
     ├── 335A (73604, 512, 512) float32
     └── 94A (73618, 512, 512) float32
CPU times: user 1min 11s, sys: 1.9 s, total: 1min 13s
Wall time: 6min 14s
