Combat populates adata.X with NANs so sc.pp.highly_variable_genes function outputs error


After running sc.pp.combat(adata, key='sample'), adata.X is full of NaNs and sc.pp.highly_variable_genes(adata) fails. There are no issues (and no NaNs) when the combat correction is not run.

sc.pp.combat(adata, key='sample')
sc.pp.highly_variable_genes(adata)
In [1]: sc.pp.combat(adata, key='sample')
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/sklearn/externals/six.py:31: FutureWarning: The module is deprecated in version 0.21 and will be removed in version 0.23 since we've dropped support for Python 2.7. Please rely on the official version of six (https://pypi.org/project/six/).
  "(https://pypi.org/project/six/).", FutureWarning)
scanpy==1.4.6 anndata==0.7.1 umap==0.4.1 numpy==1.18.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0
Standardizing Data across genes.

Found 11 batches

Found 0 numerical variables:
	

Found 3 genes with zero variance.
Fitting L/S model and finding priors

Finding parametric adjustments

Adjusting data

/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:338: RuntimeWarning: invalid value encountered in true_divide
  change = max((abs(g_new - g_old) / g_old).max(), (abs(d_new - d_old) / d_old).max())
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:338: RuntimeWarning: divide by zero encountered in true_divide
  change = max((abs(g_new - g_old) / g_old).max(), (abs(d_new - d_old) / d_old).max())

In [2]: sc.pp.highly_variable_genes(adata)
extracting highly variable genes
Traceback (most recent call last):

  File "<ipython-input-2-7727f5f928cd>", line 1, in <module>
    sc.pp.highly_variable_genes(adata)

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 235, in highly_variable_genes
    flavor=flavor,

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 65, in _highly_variable_genes_single_batch
    df['mean_bin'] = pd.cut(df['means'], bins=n_bins)

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 265, in cut
    duplicates=duplicates,

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 381, in _bins_to_cuts
    f"Bin edges must be unique: {repr(bins)}.\n"

ValueError: Bin edges must be unique: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan]).
You can drop duplicate edges by setting the 'duplicates' kwarg

Versions:

scanpy==1.4.6 anndata==0.7.1 umap==0.4.1 numpy==1.18.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

gokceneraslan commented, Apr 20, 2020 (2 reactions)

I think it’s a nice corner case we should handle. Can you file another bug about having 1-cell batches in combat or highly_variable_genes with the batch_key option?

auesro commented, Apr 21, 2020 (1 reaction)

@LuckyMD of course you are right… @gokceneraslan sure, I can. It would be nice just to get a warning, or to have Combat halt processing after first checking whether there is a batch with just 1 cell.
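
Until such a check exists in the library, a pre-flight test along these lines might catch the problem before combat is called. This is only a minimal sketch: find_small_batches and the example labels are hypothetical names made up for illustration, not part of scanpy, and it assumes the per-cell batch labels can be read as a plain sequence (for the report above, that would presumably be adata.obs['sample']).

```python
from collections import Counter


def find_small_batches(batch_labels, min_cells=2):
    """Return {batch: n_cells} for every batch with fewer than `min_cells` cells.

    Hypothetical helper for illustration; not a scanpy function.
    """
    counts = Counter(batch_labels)
    return {batch: n for batch, n in counts.items() if n < min_cells}


# Made-up per-cell batch labels. With an AnnData object these would come
# from the obs column passed as `key` (assumed here: adata.obs['sample']).
labels = ["s1", "s1", "s2", "s3", "s3", "s3"]

small = find_small_batches(labels)
if small:
    # Warn (or raise) before calling sc.pp.combat on such data.
    print(f"Batches with too few cells for batch correction: {small}")
```

If the check reports anything, the affected cells could be dropped or their batches merged before running the correction; which option is appropriate depends on the dataset.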
