sc.pp.highly_variable_genes: overflow encountered

Env:

Ubuntu 16.04
python 3.7
pandas 0.25.0
scanpy 1.4.4.post1

I have an AnnData object called adata. The maximum value in the count matrix adata.X is 3701.

When I run sc.pp.highly_variable_genes(adata), I get the following error:

/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scipy/sparse/data.py:132: RuntimeWarning: overflow encountered in expm1
  result = op(self._deduped_data())
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scipy/sparse/data.py:132: RuntimeWarning: invalid value encountered in expm1
  result = op(self._deduped_data())
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_utils.py:18: RuntimeWarning: overflow encountered in square
  var = (mean_sq - mean**2) * (X.shape[0]/(X.shape[0]-1))
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_utils.py:18: RuntimeWarning: invalid value encountered in subtract
  var = (mean_sq - mean**2) * (X.shape[0]/(X.shape[0]-1))
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py:86: RuntimeWarning: overflow encountered in log1p
  mean = np.log1p(mean)
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py:86: RuntimeWarning: invalid value encountered in log1p
  mean = np.log1p(mean)
Traceback (most recent call last):
  File "../../scvi/scvi_adata.py", line 75, in <module>
    sc.pp.highly_variable_genes(adata)
  File "/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 257, in highly_variable_genes
    flavor=flavor)
  File "/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 92, in _highly_variable_genes_single_batch
    df['mean_bin'] = pd.cut(df['means'], bins=n_bins)
  File "/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 233, in cut
    "cannot specify integer `bins` when input data " "contains infinity"
ValueError: cannot specify integer `bins` when input data contains infinity

Indeed, if I do np.expm1(3701) I get an overflow.
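
To make the overflow concrete, here is a minimal sketch using only the Python standard library. It assumes 64-bit floats; the helper name expm1_or_inf is made up for this illustration. Note that math.expm1 raises OverflowError where np.expm1 returns inf with a RuntimeWarning, so the helper converts the exception to inf to mimic what the warnings above report.

```python
import math
import sys

# Largest x for which exp(x) still fits in a float64 (about 709.78).
limit = math.log(sys.float_info.max)


def expm1_or_inf(x):
    """Return expm1(x), or inf where the standard library raises OverflowError.

    Hypothetical helper for this illustration only.
    """
    try:
        return math.expm1(x)
    except OverflowError:
        return math.inf


print(limit)                  # overflow threshold for float64
print(expm1_or_inf(700.0))    # still finite
print(expm1_or_inf(3701.0))   # overflows, so the helper returns inf
```

Any value above that threshold turns into infinity, which is what pd.cut later rejects. If the matrix is stored as float32 (an assumption, the issue does not say), the threshold is much lower, roughly 88.7.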

I think it will be necessary to come up with a way to calculate highly variable genes without doing expm1 on the raw counts, due to this overflow issue.

Top GitHub Comments

sjfleming commented, Aug 2, 2019

Sorry, just realizing that this function expects logarithmized data. My fault.
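
A small sketch of why logarithmized input avoids the problem, again with the standard library only. It takes the maximum count of 3701 from the report and assumes the log transform is log1p, which is what the expm1 calls in the warnings suggest the function is undoing. In scanpy the transform itself is normally applied with sc.pp.log1p(adata) before calling sc.pp.highly_variable_genes.

```python
import math

raw_max = 3701.0                # maximum raw count reported in the issue
logged = math.log1p(raw_max)    # what a log1p transform would store instead
restored = math.expm1(logged)   # undoing the transform recovers the count

print(logged)     # a small number, far below the expm1 overflow threshold
print(restored)   # approximately 3701.0 again
```

On logarithmized data expm1 simply returns the original counts, so nothing overflows.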

maximilianhcommented, Aug 3, 2019

Still, the error message could be a lot better. I’ve made the same mistake; it’s easy to forget to log the data.
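
One way a clearer failure could look is sketched below. This is not scanpy code: ensure_expm1_safe and EXPM1_LIMIT are hypothetical names, and the check assumes float64 values. The idea is to fail before expm1 runs, with a message that points at the likely cause.

```python
import math
import sys

# Hypothetical guard, not part of scanpy. Assumes float64 values.
EXPM1_LIMIT = math.log(sys.float_info.max)  # about 709.78


def ensure_expm1_safe(values):
    """Raise ValueError if expm1 would overflow on any of the values."""
    largest = max(values)
    if largest > EXPM1_LIMIT:
        raise ValueError(
            f"maximum value {largest:g} is too large for expm1; "
            "the data may not be logarithmized"
        )
    return largest


# Logarithmized values pass; the raw maximum from the report would raise.
print(ensure_expm1_safe([0.0, math.log1p(3701.0)]))
```

A check like this only catches values large enough to overflow. Raw counts with a small maximum would still pass silently, so it is a safety net rather than a real test for logarithmized data.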
