can't have numpy datatypes in attributes


We are working on the zarr backend for xarray (pydata/xarray#1528). xarray likes to put all kinds of weird stuff into attributes, including numpy datatypes and even numpy arrays. This is because the netCDF data model allows attributes to have all of the same types as variables.

In zarr, by contrast, attributes have to be JSON-serializable, so this doesn’t work:

import numpy as np
import zarr
za = zarr.create(shape=(1,), store='tmp_file')
za.attrs['foo'] = np.float32(0)

It raises TypeError: Object of type 'float32' is not JSON serializable.

We will need some sort of workaround for this in order to make zarr work as a store for xarray.
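One possible workaround, sketched here rather than taken from the issue thread, is to convert NumPy values to built-in Python types before assigning them as attributes. The helper name `to_jsonable` is hypothetical, and the conversion drops the original dtype (a float32 comes back as a plain Python float).

```python
import json

import numpy as np


def to_jsonable(value):
    """Recursively convert NumPy scalars and arrays to built-in Python types.

    Hypothetical helper for illustration; it is not part of zarr or xarray.
    """
    if isinstance(value, (np.generic, np.ndarray)):
        # .tolist() yields a Python scalar for NumPy scalars and a
        # (nested) list for arrays.
        return to_jsonable(value.tolist())
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    return value


attrs = {"foo": np.float32(0), "channels": np.array([1, 2, 3, 4])}
encoded = json.dumps(to_jsonable(attrs))
print(encoded)
```

With such a helper, the failing assignment above could be written as `za.attrs['foo'] = to_jsonable(np.float32(0))`, at the cost of losing the float32 type information.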

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

chairmank commented, Jun 21, 2018 (1 reaction)

I would like to store as attributes any of the data types described in the “Data Type Encoding” section of the Zarr specification.

Specifically, in my real-world usage, I have encountered inconvenience with attribute values that are

  • datetime64 and timedelta
  • Floating-point numbers that I need to represent with exact precision (e.g. f8 versus f4), which JSON doesn’t distinguish
    • A special problem is NaN, which has an exact representation as a Zarr/NumPy floating-point value but cannot be represented by JSON
  • Structs like [('R','u1'), ('G','u1'), ('B','u1'), ('A','u1')]
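The precision and NaN points can be illustrated with a small round-trip sketch. The tagged layout below (a dict with `dtype` and `value` keys) and the function names `encode_float` and `decode_float` are assumptions made for illustration, not an existing Zarr convention for attributes. Non-finite values are written as strings, because strict JSON has no literal for them.

```python
import json
import math

import numpy as np

# String stand-ins for values that strict JSON cannot express.
_SPECIAL = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}


def encode_float(value):
    """Encode a NumPy float scalar as a JSON-safe dict (hypothetical layout)."""
    as_float = float(value)
    if math.isnan(as_float):
        payload = "NaN"
    elif math.isinf(as_float):
        payload = "Infinity" if as_float > 0 else "-Infinity"
    else:
        payload = as_float
    # value.dtype.str is the NumPy type string, e.g. "<f4" on little-endian.
    return {"dtype": value.dtype.str, "value": payload}


def decode_float(doc):
    """Rebuild the NumPy scalar, restoring both its dtype and its value."""
    payload = doc["value"]
    if isinstance(payload, str):
        payload = _SPECIAL[payload]
    return np.dtype(doc["dtype"]).type(payload)


# allow_nan=False makes json.dumps reject bare NaN, so this only succeeds
# because NaN has been replaced by a string.
text = json.dumps(encode_float(np.float32(np.nan)), allow_nan=False)
restored = decode_float(json.loads(text))
print(text, restored.dtype)
```

Keeping the dtype next to the value is one way to preserve the f4 versus f8 distinction; the trade-off is that the attribute is no longer a bare JSON number for other readers.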

I am also excited by the possibility of storing attributes that are arbitrary objects, such as JSON documents, although I haven’t expressly encountered this requirement yet.

It is worth noting that, in NetCDF, attribute values are really 1-dimensional arrays:

An attribute has an associated variable (the null “global variable” for a global or group-level attribute), a name, a data type, a length, and a value. The current version treats all attributes as vectors; scalar values are treated as single-element vectors.

miccoli commented, Aug 6, 2022 (0 reactions)

I was recently hit by this very same problem, with reference to HDF5 files, which also allow for array attributes.

For example from h5dump I have

         ATTRIBUTE "data channels" {
            DATATYPE  H5T_STD_I64LE
            DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
            DATA {
            (0): 1, 2, 3, 4
            }
         }
         ATTRIBUTE "data units" {
            DATATYPE  H5T_STRING {
               STRSIZE 8;
               STRPAD H5T_STR_NULLPAD;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            }
            DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
            DATA {
            (0): "N       ", "m/s^2   ", "m/s^2   ", "m/s^2   "
            }
         }

which are rendered by h5py as

>>> data.attrs['data channels']
array([1, 2, 3, 4])
>>> data.attrs['data units']
array([b'N       ', b'm/s^2   ', b'm/s^2   ', b'm/s^2   '], dtype='|S8')

When converting from HDF5 to Zarr, zarr.copy_all fails with

TypeError: Object of type ndarray is not JSON serializable

Since I have a bunch of files to convert I implemented a quick fix in miccoli/zarr-python@380ee7c07

I’m not sure if this is of general interest, but if there is enough interest I can open a PR.

Open question:

  • just hardcode the np.ndarray -> list mapping, or maybe better, allow the user to override the default JSONEncoder?

See also #933 and #533
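To make the second option in the open question concrete, here is a minimal sketch of a `json.JSONEncoder` subclass that maps NumPy arrays and scalars to lists and Python scalars. The class name `NumpyJSONEncoder` is hypothetical, decoding bytes as ASCII is an assumption based on the attributes shown above, and this is not the code from the linked commit. How such an encoder would be passed to zarr is not shown, since that hook is exactly what the question is asking about.

```python
import json

import numpy as np


class NumpyJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder that falls back to .tolist() for NumPy objects."""

    def default(self, o):
        if isinstance(o, (np.generic, np.ndarray)):
            # Arrays become (nested) lists, NumPy scalars become Python scalars.
            return o.tolist()
        if isinstance(o, bytes):
            # Assumption: byte strings hold ASCII text, as in the
            # "data units" attribute above.
            return o.decode("ascii")
        return super().default(o)


attrs = {
    "data channels": np.array([1, 2, 3, 4]),
    "data units": np.array([b"N", b"m/s^2"], dtype="S8"),
}
text = json.dumps(attrs, cls=NumpyJSONEncoder)
print(text)
```

An overridable encoder leaves the policy (lists, tagged dicts, or something else) to the user, whereas a hardcoded mapping is simpler but fixes one lossy representation for everyone.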

