
sc.pp.highly_variable_genes: overflow encountered


Env:

  • Ubuntu 16.04
  • python 3.7
  • pandas 0.25.0
  • scanpy 1.4.4.post1

I have an AnnData object called adata. The maximum value in the count matrix adata.X is 3701.

When I run sc.pp.highly_variable_genes(adata), I get the following warnings and error:

/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scipy/sparse/data.py:132: RuntimeWarning: overflow encountered in expm1
  result = op(self._deduped_data())
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scipy/sparse/data.py:132: RuntimeWarning: invalid value encountered in expm1
  result = op(self._deduped_data())
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_utils.py:18: RuntimeWarning: overflow encountered in square
  var = (mean_sq - mean**2) * (X.shape[0]/(X.shape[0]-1))
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_utils.py:18: RuntimeWarning: invalid value encountered in subtract
  var = (mean_sq - mean**2) * (X.shape[0]/(X.shape[0]-1))
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py:86: RuntimeWarning: overflow encountered in log1p
  mean = np.log1p(mean)
/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py:86: RuntimeWarning: invalid value encountered in log1p
  mean = np.log1p(mean)
Traceback (most recent call last):
  File "../../scvi/scvi_adata.py", line 75, in <module>
    sc.pp.highly_variable_genes(adata)
  File "/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 257, in highly_variable_genes
    flavor=flavor)
  File "/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 92, in _highly_variable_genes_single_batch
    df['mean_bin'] = pd.cut(df['means'], bins=n_bins)
  File "/home/sfleming/anaconda3/envs/scvi/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 233, in cut
    "cannot specify integer `bins` when input data " "contains infinity"
ValueError: cannot specify integer `bins` when input data contains infinity

Indeed, if I do np.expm1(3701) I get an overflow.

I think it will be necessary to come up with a way to calculate highly variable genes without doing expm1 on the raw counts, due to this overflow issue.
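The overflow can be reproduced without scanpy. The following is a minimal sketch using only the Python standard library; the helper name `expm1_overflows` is made up for illustration. Note one difference from the report: `np.expm1` emits a RuntimeWarning and returns `inf`, whereas `math.expm1` raises `OverflowError`. The underlying float64 limit is the same in both cases.

```python
import math
import sys

# Largest x for which exp(x) still fits in a float64 (roughly 709.78).
EXP_LIMIT = math.log(sys.float_info.max)


def expm1_overflows(x):
    """Return True if exp(x) - 1 cannot be represented as a float64."""
    try:
        math.expm1(x)
    except OverflowError:
        return True
    return False


max_count = 3701  # maximum value of adata.X reported in the issue

# Raw counts are far above the limit, so expm1 overflows.
print(max_count > EXP_LIMIT, expm1_overflows(max_count))

# After log1p the same value is small, and expm1 simply undoes the transform.
logged = math.log1p(max_count)
print(logged < EXP_LIMIT, expm1_overflows(logged))
```

This shows why the function behaves on logarithmized input: any realistic count, once passed through log1p, stays far below the threshold at which expm1 overflows.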

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:5 (3 by maintainers)

Top GitHub Comments

sjfleming commented, Aug 2, 2019 (8 reactions)

Sorry, just realizing that this function expects logarithmized data. My fault.

maximilianh commented, Aug 3, 2019 (6 reactions)

Still, the error message could be a lot better. I’ve made the same mistake; it’s easy to forget to log-transform the data.
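One way to get a clearer message today is to check the matrix yourself before calling sc.pp.highly_variable_genes. The sketch below is an assumption-laden illustration, not scanpy code: `assert_expm1_safe` is a hypothetical helper, and it only checks whether expm1 would overflow for the matrix dtype, which is a rough proxy for "this looks like raw counts".

```python
import numpy as np


def assert_expm1_safe(X):
    """Hypothetical guard (not part of scanpy): fail early if expm1(X) would overflow."""
    # Integer count matrices have no finfo; fall back to float64 limits for them.
    dtype = X.dtype if np.issubdtype(X.dtype, np.floating) else np.float64
    # Largest input for which exp(x) is still finite in this dtype
    # (about 88.7 for float32, about 709.8 for float64).
    limit = float(np.log(np.finfo(dtype).max))
    max_val = float(X.max())
    if max_val > limit:
        raise ValueError(
            f"max value {max_val:g} exceeds {limit:.2f}, so expm1 would overflow; "
            "the data look like raw counts. Log-transform them first "
            "(for example with sc.pp.log1p)."
        )


# Toy matrix whose maximum matches the value reported in the issue.
raw = np.array([[0.0, 12.0], [3701.0, 5.0]], dtype=np.float32)

assert_expm1_safe(np.log1p(raw))  # logarithmized data passes silently

try:
    assert_expm1_safe(raw)  # raw counts are rejected with a readable message
except ValueError as err:
    print(err)
```

In a real workflow the check would be applied to adata.X; passing it does not prove the data were normalized and logarithmized correctly, only that this particular overflow will not occur.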

