Error saving results in run_regression

Hi,

Thanks very much for developing this nice tool.

I am following this tutorial to estimate the cell type signatures from my own scRNA-seq dataset. I get an error when calling the run_regression function. Everything looks fine: the epoch vs. ELBO loss plots are generated, as are the plots for the UMI counts. However, I get this error when saving the results:

### Saving results ###
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    187         try:
--> 188             return func(elem, key, val, *args, **kwargs)
    189         except Exception as e:

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_dataframe(f, key, df, dataset_kwargs)
    257     group.attrs["encoding-version"] = EncodingVersions.dataframe.value
--> 258     group.attrs["column-order"] = list(df.columns)
    259 

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/.conda/envs/cellpymc/lib/python3.7/site-packages/h5py/_hl/attrs.py in __setitem__(self, name, value)
    102         """
--> 103         self.create(name, data=value)
    104 

~/.conda/envs/cellpymc/lib/python3.7/site-packages/h5py/_hl/attrs.py in create(self, name, data, shape, dtype)
    196             try:
--> 197                 attr = h5a.create(self._id, self._e(tempname), htype, space)
    198             except:

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5a.pyx in h5py.h5a.create()

RuntimeError: Unable to create attribute (object header message is too large)

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-601-cf8487f6980c> in <module>
     30                    export_args={'path': results_folder + 'regression_model/', # where to save results
     31                                 'save_model': True, #save pytorch model?
---> 32                                 'run_name_suffix': ''})
     33 
     34 reg_mod = r['mod']

~/.conda/envs/cellpymc/lib/python3.7/site-packages/cell2location/run_regression.py in run_regression(sc_data, model_name, verbose, return_all, train_args, model_kwargs, posterior_args, export_args)
    325 
    326     # save anndata with exported posterior
--> 327     sc_data.write(filename=path + 'sc.h5ad', compression='gzip')
    328 
    329     # save model object and related annotations

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
   1848             compression_opts=compression_opts,
   1849             force_dense=force_dense,
-> 1850             as_dense=as_dense,
   1851         )
   1852 

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
    115             )
    116         else:
--> 117             write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
    118         write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
    119         write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)

~/.conda/envs/cellpymc/lib/python3.7/functools.py in wrapper(*args, **kw)
    838                             '1 positional argument')
    839 
--> 840         return dispatch(args[0].__class__)(*args, **kw)
    841 
    842     funcname = getattr(func, '__name__', 'singledispatch function')

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
    137     if key in f:
    138         del f[key]
--> 139     _write_method(type(value))(f, key, value, *args, **kwargs)
    140 
    141 

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_raw(f, key, value, dataset_kwargs)
    146     group.attrs["shape"] = value.shape
    147     write_attribute(f, "raw/X", value.X, dataset_kwargs=dataset_kwargs)
--> 148     write_attribute(f, "raw/var", value.var, dataset_kwargs=dataset_kwargs)
    149     write_attribute(f, "raw/varm", value.varm, dataset_kwargs=dataset_kwargs)
    150 

~/.conda/envs/cellpymc/lib/python3.7/functools.py in wrapper(*args, **kw)
    838                             '1 positional argument')
    839 
--> 840         return dispatch(args[0].__class__)(*args, **kw)
    841 
    842     funcname = getattr(func, '__name__', 'singledispatch function')

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
    137     if key in f:
    138         del f[key]
--> 139     _write_method(type(value))(f, key, value, *args, **kwargs)
    140 
    141 

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    193                 f"Above error raised while writing key {key!r} of {type(elem)}"
    194                 f" from {parent}."
--> 195             ) from e
    196 
    197     return func_wrapper

RuntimeError: Unable to create attribute (object header message is too large)

Above error raised while writing key 'raw/var' of <class 'h5py._hl.files.File'> from /.

So far, I have found this explanation:

HDF5 has a header limit of 64 KB for all metadata of the columns. This includes names, types, etc. When you go above roughly 2000 columns, you will run out of space to store all the metadata. This is a fundamental limitation of pytables. I don’t think they will make workarounds on their side any time soon. You will either have to split the table up or choose another storage format.

from: https://stackoverflow.com/questions/16639503/unable-to-save-dataframe-to-hdf5-object-header-message-is-too-large

Do you have any idea why the object header could become so long?

Best regards and thank you very much, Alberto.
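
The "split the table up" advice in the quoted answer can be sketched in plain Python. The sketch groups column names into chunks whose estimated encoded size stays under a byte budget. This is a rough heuristic, not HDF5's exact object-header accounting. The 64 KiB figure comes from the quoted answer. The half-limit safety margin and the 16-byte per-name `overhead` are assumptions chosen for illustration.

```python
from typing import Iterable, List

# Assumed budget: the ~64 KiB object-header limit quoted above.
HEADER_LIMIT_BYTES = 64 * 1024


def chunk_columns(
    columns: Iterable[str],
    budget: int = HEADER_LIMIT_BYTES // 2,
    overhead: int = 16,
) -> List[List[str]]:
    """Group column names so each group's estimated metadata size fits `budget`.

    Estimated cost per name = UTF-8 length + `overhead` bytes. `overhead` is a
    hypothetical per-entry cost, not an HDF5 constant. The default budget keeps
    a safety margin (half the limit) for other attributes on the same object.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name in columns:
        cost = len(name.encode("utf-8")) + overhead
        if cost > budget:
            raise ValueError(f"column name {name!r} alone exceeds the budget")
        # Start a new chunk when adding this name would overflow the current one.
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    cols = [f"col_{i:04d}" for i in range(5000)]
    groups = chunk_columns(cols)
    print(len(groups), [len(g) for g in groups])
```

Each group could then be written as its own table or key. Whether that fits your downstream tooling is a separate question; the other route the answer mentions is a different storage format.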

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

JZL commented, Feb 11, 2021

Hi

Not a c2l dev, but I had similar issues. The way it’s designed, each cell type in the reference gets several accompanying columns, so the number of columns grows very quickly with the number of cell types. I don’t think the designers of h5ad anticipated that 😃
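
To see why the column count climbs so fast, here is a small sketch that builds one column per (statistic, cell type) pair. The statistic prefixes are hypothetical placeholders, not cell2location's actual column names. The point is only that the count is n_statistics × n_cell_types, so the column metadata grows linearly with the size of the reference.

```python
from typing import List, Sequence


def per_cell_type_columns(stats: Sequence[str], cell_types: Sequence[str]) -> List[str]:
    # One column per (statistic, cell type) pair, e.g. "mean_B cell".
    return [f"{stat}_{ct}" for stat in stats for ct in cell_types]


def name_bytes(columns: Sequence[str]) -> int:
    # Total UTF-8 size of the column names: a rough proxy for metadata size,
    # not HDF5's exact object-header accounting.
    return sum(len(c.encode("utf-8")) for c in columns)


if __name__ == "__main__":
    stats = ["mean", "sd", "q05", "q95"]  # hypothetical per-cell-type summaries
    for n_types in (10, 100, 500):
        cell_types = [f"celltype_{i}" for i in range(n_types)]
        cols = per_cell_type_columns(stats, cell_types)
        # Column count is len(stats) * n_types, so it grows linearly with the reference.
        print(n_types, len(cols), name_bytes(cols))
```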

I also found the h5ad export slow and sometimes prone to crashing for large files, so I ended up using joblib.dump (here) instead: a variant of pickle.dump with the added benefit of compression, which helps a lot with file size. I can share the code, but in run_c2l.py, wherever you see write, you can use joblib.dump for the export.
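
A minimal sketch of the joblib route, covering both the export and the matching load step mentioned at the end of the comment. It assumes the object you export is picklable. The file name `sc.joblib` and compression level 3 are arbitrary choices, and the dictionary stands in for the real results. As with pickle, only load files you trust, and expect to need compatible library versions when loading.

```python
import os
import tempfile

import joblib  # third-party: pip install joblib


def save_compressed(obj, path: str, level: int = 3) -> None:
    # An integer compress level (0-9) enables zlib compression;
    # higher levels give smaller files but are slower.
    joblib.dump(obj, path, compress=level)


def load_compressed(path: str):
    # joblib.load detects the compression method automatically.
    return joblib.load(path)


if __name__ == "__main__":
    # Stand-in for the real results object; any picklable object works the same way.
    results = {"cell_types": ["B cell", "T cell"], "means": [[0.1, 0.2], [0.3, 0.4]]}
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "sc.joblib")
        save_compressed(results, out)
        print(load_compressed(out) == results)
```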

Similarly, some of the .csv files it exports can be useful, but they also slow things down a lot for large spatial data or many cell types (and they’re not compressed). If you’re running large numbers of cell types, it might be worth removing that part of the code and regenerating the files as needed.
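
If you would rather keep the .csv exports than remove them, one alternative (not something the comment suggests) is to write them gzip-compressed. Below is a stdlib-only sketch with made-up table contents. With pandas, `DataFrame.to_csv(path, compression="gzip")` does the same job.

```python
import csv
import gzip
import os
import tempfile
from typing import Iterable, List, Sequence


def write_csv_gz(path: str, header: Sequence[str], rows: Iterable[Sequence[object]]) -> None:
    # "wt" opens the gzip stream in text mode; newline="" is what the csv module expects.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv_gz(path: str) -> List[List[str]]:
    # csv.reader returns every field as a string.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "signatures.csv.gz")
        # Made-up rows: a gene name plus one value per cell type.
        write_csv_gz(out, ["gene", "B cell", "T cell"], [["CD19", 1.5, 0.0], ["CD3E", 0.0, 2.25]])
        print(read_csv_gz(out))
```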

For what it’s worth, I’ve found it very handy to run a small 5-iteration model all the way through on new data first, before fitting the full model, so it doesn’t train the whole thing and then crash before saving.
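
The "tiny run first" habit can be sketched as a wrapper. It runs the whole train-then-save pipeline with a handful of iterations before starting the expensive run. `train_model` and `save_results` are hypothetical stand-ins for your own training and export code. In practice that might be a run_regression call with a reduced iteration count in `train_args`, but the exact argument name depends on your cell2location version.

```python
import json
import os
import tempfile
from typing import Callable, Dict


def train_model(n_iter: int) -> Dict[str, float]:
    # Hypothetical stand-in for training: returns a fake loss summary.
    loss = 100.0
    for _ in range(n_iter):
        loss *= 0.99
    return {"n_iter": n_iter, "final_loss": loss}


def save_results(results: Dict[str, float], out_dir: str) -> str:
    # Hypothetical stand-in for the export step (the part that failed in the issue).
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "results.json")
    with open(path, "w") as fh:
        json.dump(results, fh)
    return path


def run_pipeline(
    n_iter: int,
    out_dir: str,
    train: Callable[[int], Dict[str, float]] = train_model,
    save: Callable[[Dict[str, float], str], str] = save_results,
) -> str:
    return save(train(n_iter), out_dir)


def smoke_then_full(out_dir: str, full_iter: int, smoke_iter: int = 5) -> str:
    # 1) Cheap end-to-end run: an export error surfaces in seconds, not hours.
    run_pipeline(smoke_iter, os.path.join(out_dir, "smoke"))
    # 2) Only reached if the smoke run both trained and saved successfully.
    return run_pipeline(full_iter, os.path.join(out_dir, "full"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(smoke_then_full(tmp, full_iter=1000))
```

A smoke run would likely catch the failure in this issue. Per the comment above, the number of exported columns depends on the number of cell types, not on how many iterations were trained.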

Oh, just to add: if you go with joblib exporting, you’ll also have to change the import steps from reading the h5ad to joblib.load.
