Support for nullable bool, int in dataframes


What needs to happen

Support for nullable dtypes during IO. Allow for writing pandas string, integer, and boolean arrays (which can have null values) by saving a “null” mask along with them.
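
One way to picture the "null mask" idea is to split a nullable column into two plain NumPy arrays, a values array and a boolean mask, both of which have ordinary dtypes. The following is a minimal sketch of that idea only, not anndata's actual on-disk format. `split_nullable` and `join_nullable_bool` are hypothetical helper names, and the fill value stored under the mask is an arbitrary placeholder.

```python
import pandas as pd


def split_nullable(series, numpy_dtype, fill_value):
    """Split a nullable Series into (values, mask) NumPy arrays.

    Hypothetical helper: masked slots are filled with `fill_value`
    so that `values` fits a plain NumPy dtype.
    """
    mask = series.isna().to_numpy()
    values = series.to_numpy(dtype=numpy_dtype, na_value=fill_value)
    return values, mask


def join_nullable_bool(values, mask):
    """Rebuild a nullable boolean Series from the two arrays."""
    return pd.Series(pd.arrays.BooleanArray(values, mask))


original = pd.Series([True, None, False], dtype="boolean")
values, mask = split_nullable(original, numpy_dtype=bool, fill_value=False)
restored = join_nullable_bool(values, mask)
```

Here `values` and `mask` are both plain boolean arrays, so neither has the object dtype that triggers the error reported below, and `restored` has the nullable boolean dtype again.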

Example

import anndata as ad, pandas as pd, numpy as np

a = ad.AnnData(np.ones((3, 3)))

# Works fine
a.obs["np_bool"] = np.zeros(3, dtype=bool)
a.write("tmp.h5ad")

# Errors at write
a.obs["pd_bool"] = a.obs["np_bool"].astype(pd.BooleanDtype())
a.write("tmp.h5ad")
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'pd_bool' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.
Full traceback
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/github/anndata/anndata/_io/h5ad.py in write_series(group, key, series, dataset_kwargs)
    290     else:
--> 291         group[key] = series.values
    292 

/usr/local/lib/python3.8/site-packages/h5py/_hl/group.py in __setitem__(self, name, obj)
    410             else:
--> 411                 ds = self.create_dataset(None, data=obj)
    412                 h5o.link(ds.id, self.id, name, lcpl=lcpl)

/usr/local/lib/python3.8/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    147 
--> 148             dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    149             dset = dataset.Dataset(dsid)

/usr/local/lib/python3.8/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
     88             dtype = numpy.dtype(dtype)
---> 89         tid = h5t.py_create(dtype, logical=1)
     90 

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

h5py/h5t.pyx in h5py.h5t.py_create()

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    208         try:
--> 209             return func(elem, key, val, *args, **kwargs)
    210         except Exception as e:

~/github/anndata/anndata/_io/h5ad.py in write_dataframe(f, key, df, dataset_kwargs)
    264     for col_name, (_, series) in zip(col_names, df.items()):
--> 265         write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
    266 

~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'pd_bool' of <class 'h5py._hl.group.Group'> from /.

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-13-32812d0f937a> in <module>
      1 a.obs["pd_bool"] = a.obs["np_bool"].astype(pd.BooleanDtype())
----> 2 a.write("tmp.h5ad")

~/github/anndata/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
   1877             filename = self.filename
   1878 
-> 1879         _write_h5ad(
   1880             Path(filename),
   1881             self,

~/github/anndata/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
    109         else:
    110             write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
--> 111         write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
    112         write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
    113         write_attribute(f, "obsm", adata.obsm, dataset_kwargs=dataset_kwargs)

/usr/local/Cellar/python@3.8/3.8.6_2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/functools.py in wrapper(*args, **kw)
    873                             '1 positional argument')
    874 
--> 875         return dispatch(args[0].__class__)(*args, **kw)
    876 
    877     funcname = getattr(func, '__name__', 'singledispatch function')

~/github/anndata/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
    130     if key in f:
    131         del f[key]
--> 132     _write_method(type(value))(f, key, value, *args, **kwargs)
    133 
    134 

~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    210         except Exception as e:
    211             parent = _get_parent(elem)
--> 212             raise type(e)(
    213                 f"{e}\n\n"
    214                 f"Above error raised while writing key {key!r} of {type(elem)}"

TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Above error raised while writing key 'pd_bool' of <class 'h5py._hl.group.Group'> from /.

Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.

I have a report from the wild where writing worked in this case, but reading (by cellxgene) failed.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:10 (5 by maintainers)

Top GitHub Comments

5 reactions
vitkl commented, Dec 14, 2021

I generally convert all problematic variables to strings: obs['x'].astype(str)

3 reactions
vitkl commented, Nov 17, 2021

I am wondering what’s the progress on this issue. It is very annoying when analysis results don’t get saved after several hours of work on HPC because a new column popped up with an unsave-able object type (in a script that worked just fine the other day, i.e. there seemed to be no need to test for save-ability). So I would really appreciate it if this is addressed.

Maybe you can do a temporary workaround that converts such objects to strings with a warning?
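
The workaround suggested in these comments could look roughly like the sketch below, applied by the user before calling write. It is an illustration under assumptions, not anndata behaviour: `stringify_nullable_columns` is a hypothetical helper, and it only targets pandas nullable boolean and integer columns. Converting to strings discards the original dtype, so this is lossy.

```python
import warnings

import pandas as pd
from pandas.api import types as pdt


def _is_nullable_bool_or_int(dtype):
    """True for pandas nullable boolean/integer dtypes (not categoricals)."""
    if isinstance(dtype, pd.CategoricalDtype):
        return False
    if not pdt.is_extension_array_dtype(dtype):
        return False
    return pdt.is_bool_dtype(dtype) or pdt.is_integer_dtype(dtype)


def stringify_nullable_columns(df):
    """Return a copy of `df` with nullable bool/int columns cast to str.

    Hypothetical helper: emits a warning for every converted column.
    """
    out = df.copy()
    for name, column in df.items():
        if _is_nullable_bool_or_int(column.dtype):
            warnings.warn(
                f"Column {name!r} has nullable dtype {column.dtype}; "
                "converting to str.",
                UserWarning,
                stacklevel=2,
            )
            out[name] = column.astype(str)
    return out


obs = pd.DataFrame(
    {
        "np_bool": [False, False, False],
        "pd_bool": pd.array([True, None, False], dtype="boolean"),
    }
)
safe_obs = stringify_nullable_columns(obs)
```

Only `pd_bool` is converted here; `np_bool` keeps its NumPy boolean dtype and the input frame is left unchanged. How missing entries are represented after `astype(str)` depends on the pandas version, so it is worth checking before relying on this.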


