Error saving results in run_regression
Hi,
Thanks very much for developing this nice tool.
I am following the tutorial to estimate cell type signatures from my own scRNA-seq dataset, but I am running into an error when calling the run_regression
function. Everything looks fine: the epoch vs. ELBO loss plots are generated, as are the UMI count plots. However, I get this error when the results are being saved:
### Saving results ###
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
187 try:
--> 188 return func(elem, key, val, *args, **kwargs)
189 except Exception as e:
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_dataframe(f, key, df, dataset_kwargs)
257 group.attrs["encoding-version"] = EncodingVersions.dataframe.value
--> 258 group.attrs["column-order"] = list(df.columns)
259
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
~/.conda/envs/cellpymc/lib/python3.7/site-packages/h5py/_hl/attrs.py in __setitem__(self, name, value)
102 """
--> 103 self.create(name, data=value)
104
~/.conda/envs/cellpymc/lib/python3.7/site-packages/h5py/_hl/attrs.py in create(self, name, data, shape, dtype)
196 try:
--> 197 attr = h5a.create(self._id, self._e(tempname), htype, space)
198 except:
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5a.pyx in h5py.h5a.create()
RuntimeError: Unable to create attribute (object header message is too large)
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-601-cf8487f6980c> in <module>
30 export_args={'path': results_folder + 'regression_model/', # where to save results
31 'save_model': True, #save pytorch model?
---> 32 'run_name_suffix': ''})
33
34 reg_mod = r['mod']
~/.conda/envs/cellpymc/lib/python3.7/site-packages/cell2location/run_regression.py in run_regression(sc_data, model_name, verbose, return_all, train_args, model_kwargs, posterior_args, export_args)
325
326 # save anndata with exported posterior
--> 327 sc_data.write(filename=path + 'sc.h5ad', compression='gzip')
328
329 # save model object and related annotations
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
1848 compression_opts=compression_opts,
1849 force_dense=force_dense,
-> 1850 as_dense=as_dense,
1851 )
1852
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
115 )
116 else:
--> 117 write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
118 write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
119 write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
~/.conda/envs/cellpymc/lib/python3.7/functools.py in wrapper(*args, **kw)
838 '1 positional argument')
839
--> 840 return dispatch(args[0].__class__)(*args, **kw)
841
842 funcname = getattr(func, '__name__', 'singledispatch function')
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
137 if key in f:
138 del f[key]
--> 139 _write_method(type(value))(f, key, value, *args, **kwargs)
140
141
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_raw(f, key, value, dataset_kwargs)
146 group.attrs["shape"] = value.shape
147 write_attribute(f, "raw/X", value.X, dataset_kwargs=dataset_kwargs)
--> 148 write_attribute(f, "raw/var", value.var, dataset_kwargs=dataset_kwargs)
149 write_attribute(f, "raw/varm", value.varm, dataset_kwargs=dataset_kwargs)
150
~/.conda/envs/cellpymc/lib/python3.7/functools.py in wrapper(*args, **kw)
838 '1 positional argument')
839
--> 840 return dispatch(args[0].__class__)(*args, **kw)
841
842 funcname = getattr(func, '__name__', 'singledispatch function')
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
137 if key in f:
138 del f[key]
--> 139 _write_method(type(value))(f, key, value, *args, **kwargs)
140
141
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
193 f"Above error raised while writing key {key!r} of {type(elem)}"
194 f" from {parent}."
--> 195 ) from e
196
197 return func_wrapper
RuntimeError: Unable to create attribute (object header message is too large)
Above error raised while writing key 'raw/var' of <class 'h5py._hl.files.File'> from /.
So far, I have found something like this:
HDF5 has a 64 KB header limit for all column metadata (names, types, etc.). Above roughly 2,000 columns you run out of space to store the metadata. This is a fundamental limitation of pytables; I don't think they will work around it on their side any time soon. You will either have to split the table up or choose another storage format.
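For context, this limit can be reproduced directly with h5py (a minimal sketch; the file name and column-name pattern are illustrative):

```python
import h5py
import numpy as np

# HDF5 keeps attributes in the object header, which is capped at ~64 KB.
# anndata writes a DataFrame's column names as a single "column-order"
# attribute, so thousands of columns overflow the header.
with h5py.File("demo.h5", "w") as f:
    names = np.array([f"column_{i:06d}".encode() for i in range(10_000)])
    f.attrs["column-order"] = names
    # -> RuntimeError: Unable to create attribute (object header message is too large)
```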
Do you have any idea why the object header could become so long?
Best regards and thank you very much, Alberto.
Top GitHub Comments
Hi
Not a c2l dev but had similar issues. The way it's designed, each cell type in the reference gets multiple accompanying columns, so the number of columns can grow very rapidly with the number of cell types. I don't think the designers of h5ad anticipated that 😃
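You can check how wide the tables have grown before writing (a quick sketch; assumes your AnnData object is called sc_data and has .raw set, as in the traceback above):

```python
# Count the columns whose names must all fit into the single HDF5
# "column-order" attribute, and estimate the header space they need.
print("var columns:    ", sc_data.var.shape[1])
print("raw.var columns:", sc_data.raw.var.shape[1])
print(sum(len(c) for c in sc_data.raw.var.columns), "bytes of raw.var column names")
```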
I also found the h5ad export slow and sometimes prone to crashing for large files, so I ended up using
joblib.dump
(a variant of pickle.dump with the added benefit of compression, which helps a lot with file size). I can share the code, but in run_c2l.py, wherever you see a write call you can use joblib.dump for the export (see the sketch below).
Similarly, some of the .csv files it exports can be useful but also slow things down a lot for large spatial data / many cell types (and they're not compressed), so if you're running large numbers of cell types, it might be worth removing that part of the code and regenerating those files as needed.
For what it’s worth, I’ve found it very handy to run a small 5-iteration model all the way through for new data first, before doing the full model, so it doesn’t train the whole thing and then quit before saving
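A sketch of such a smoke test, assuming train_args accepts 'n_iter' as in the cell2location tutorials (the import path mirrors the traceback; argument names and paths here are illustrative):

```python
from cell2location.run_regression import run_regression  # module path as in the traceback

# Tiny end-to-end run: a handful of iterations with the same export settings,
# so any save-time failure (like the HDF5 one above) surfaces in minutes.
r = run_regression(
    sc_data,
    train_args={'n_iter': 5},  # assumed key, per the tutorials
    export_args={'path': results_folder + 'regression_model_test/',
                 'save_model': False,
                 'run_name_suffix': ''},
)
```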
Oh, just to add: if you go with joblib exporting, you'll also have to change the importing steps from reading the h5ad to joblib.load.
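A minimal sketch of that swap (assuming sc_data is the AnnData object and path is the export folder, as in the traceback above):

```python
import joblib

# Export: replaces sc_data.write(filename=path + 'sc.h5ad', compression='gzip').
# A compressed pickle involves no per-column HDF5 attributes, so the header
# limit never comes into play.
joblib.dump(sc_data, path + 'sc.joblib', compress=3)

# Import: replaces anndata.read_h5ad(path + 'sc.h5ad').
sc_data = joblib.load(path + 'sc.joblib')
```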
Added informative errors in https://github.com/BayraktarLab/cell2location/commit/d4c4e7f0407af6c9c19a64334e094dafcfe3114e and https://github.com/BayraktarLab/cell2location/commit/8eec6da0f7c6c4ef2025c8c0682af6df90e45312