Support for nullable bool, int in dataframes

What needs to happen

Support nullable dtypes during IO: allow writing pandas string, integer, and boolean arrays (which can hold null values) by saving a “null” mask alongside the values.
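
To make the "null mask" idea concrete, here is a minimal sketch that splits a pandas nullable boolean array into two plain NumPy arrays and rebuilds it afterwards. The helper names `split_nullable_bool` and `join_nullable_bool` are hypothetical, and this is not anndata's on-disk format, only an illustration of the values-plus-mask approach.

```python
import numpy as np
import pandas as pd


def split_nullable_bool(arr):
    """Split a pandas BooleanArray into (values, mask) NumPy arrays (hypothetical helper)."""
    mask = np.asarray(arr.isna(), dtype=bool)
    # Missing slots get a placeholder (False); the mask records which slots those are.
    values = arr.to_numpy(dtype=bool, na_value=False)
    return values, mask


def join_nullable_bool(values, mask):
    """Rebuild the BooleanArray from the two plain arrays (hypothetical helper)."""
    return pd.arrays.BooleanArray(values.copy(), mask.copy())


original = pd.array([True, None, False], dtype="boolean")
values, mask = split_nullable_bool(original)
restored = join_nullable_bool(values, mask)
```

Both `values` and `mask` have a native NumPy `bool` dtype, so each could be stored as an ordinary dataset; the same pattern should carry over to nullable integers via `pd.arrays.IntegerArray`.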

Example

import anndata as ad
import numpy as np
import pandas as pd

a = ad.AnnData(np.ones((3, 3)))

# Works fine
a.obs["np_bool"] = np.zeros(3, dtype=bool)
a.write("tmp.h5ad")

# Errors at write
a.obs["pd_bool"] = a.obs["np_bool"].astype(pd.BooleanDtype())
a.write("tmp.h5ad")

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'pd_bool' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.

Full traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/github/anndata/anndata/_io/h5ad.py in write_series(group, key, series, dataset_kwargs)
    290     else:
--> 291         group[key] = series.values
    292 

/usr/local/lib/python3.8/site-packages/h5py/_hl/group.py in __setitem__(self, name, obj)
    410             else:
--> 411                 ds = self.create_dataset(None, data=obj)
    412                 h5o.link(ds.id, self.id, name, lcpl=lcpl)

/usr/local/lib/python3.8/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    147 
--> 148             dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    149             dset = dataset.Dataset(dsid)

/usr/local/lib/python3.8/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
     88             dtype = numpy.dtype(dtype)
---> 89         tid = h5t.py_create(dtype, logical=1)
     90 

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/github/anndata/anndata/_io/h5ad.py in write_dataframe(f, key, df, dataset_kwargs)
    264     for col_name, (_, series) in zip(col_names, df.items()):
--> 265         write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
    266 

~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'pd_bool' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-13-32812d0f937a> in <module>
      1 a.obs["pd_bool"] = a.obs["np_bool"].astype(pd.BooleanDtype())
----> 2 a.write("tmp.h5ad")

~/github/anndata/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
   1877             filename = self.filename
   1878 
-> 1879         _write_h5ad(
   1880             Path(filename),
   1881             self,

~/github/anndata/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
    109         else:
    110             write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
--> 111         write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
    112         write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
    113         write_attribute(f, "obsm", adata.obsm, dataset_kwargs=dataset_kwargs)

/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/functools.py in wrapper(*args, **kw)
    873                             '1 positional argument')
    874 
--> 875         return dispatch(args[0].__class__)(*args, **kw)
    876 
    877     funcname = getattr(func, '__name__', 'singledispatch function')

~/github/anndata/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
    130     if key in f:
    131         del f[key]
--> 132     _write_method(type(value))(f, key, value, *args, **kwargs)
    133 
    134 

~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    210         except Exception as e:
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"
    214                 f"Above error raised while writing key {key!r} of {type(elem)}"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'pd_bool' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.

I have also had a report from the wild where writing worked, but reading the resulting file (by cellxgene) failed.

Issue Analytics

Created 3 years ago
Comments: 10 (5 by maintainers)

Top GitHub Comments

5 reactions

vitkl commented, Dec 14, 2021

I generally convert all problematic variables to strings: obs['x'].astype(str)

3 reactions

vitkl commented, Nov 17, 2021

I am wondering what the progress on this issue is. It is very annoying when analysis results don’t get saved after several hours of work on HPC because a new column popped up with an unsaveable object type (in a script that worked just fine the other day, i.e. there was no reason to test for saveability). So I would really appreciate it if this were addressed.

Maybe you can do a temporary workaround that converts such objects to strings with a warning?
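
A rough sketch of what such a stopgap could look like on the user side, assuming the affected columns are pandas nullable boolean or integer columns. The function name `stringify_nullable_columns` and the dtype-name set are hypothetical and not part of anndata.

```python
import warnings

import numpy as np
import pandas as pd

# Names of the pandas nullable bool/int dtypes this sketch looks for (an assumption).
NULLABLE_DTYPE_NAMES = {
    "boolean",
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
}


def stringify_nullable_columns(df):
    """Return a copy of df with nullable bool/int columns cast to str, warning per column."""
    out = df.copy()
    for name in out.columns:
        dtype_name = str(out[name].dtype)
        if dtype_name in NULLABLE_DTYPE_NAMES:
            warnings.warn(
                f"Column {name!r} has nullable dtype {dtype_name}; "
                "converting it to strings before writing.",
                UserWarning,
                stacklevel=2,
            )
            out[name] = out[name].astype(str)
    return out


# Same columns as in the example from the issue, built without AnnData.
obs = pd.DataFrame({"np_bool": np.zeros(3, dtype=bool)})
obs["pd_bool"] = obs["np_bool"].astype(pd.BooleanDtype())
safe_obs = stringify_nullable_columns(obs)
```

With an AnnData object, the equivalent step would presumably be `a.obs = stringify_nullable_columns(a.obs)` before `a.write(...)`. The conversion loses the original dtype, and how missing values are rendered as strings may depend on the pandas version, so it is only a stopgap.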