Combat populates adata.X with NaNs, so sc.pp.highly_variable_genes raises an error

After running sc.pp.combat(adata, key='sample'), adata.X is full of NaNs and sc.pp.highly_variable_genes(adata) fails with a ValueError. Without the combat correction there are no NaNs and no error.

sc.pp.combat(adata, key='sample')
sc.pp.highly_variable_genes(adata)

In [1]: sc.pp.combat(adata, key='sample')
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/sklearn/externals/six.py:31: FutureWarning: The module is deprecated in version 0.21 and will be removed in version 0.23 since we've dropped support for Python 2.7. Please rely on the official version of six (https://pypi.org/project/six/).
  "(https://pypi.org/project/six/).", FutureWarning)
scanpy==1.4.6 anndata==0.7.1 umap==0.4.1 numpy==1.18.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0
Standardizing Data across genes.

Found 11 batches

Found 0 numerical variables:
	

Found 3 genes with zero variance.
Fitting L/S model and finding priors

Finding parametric adjustments

Adjusting data

/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:338: RuntimeWarning: invalid value encountered in true_divide
  change = max((abs(g_new - g_old) / g_old).max(), (abs(d_new - d_old) / d_old).max())
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:338: RuntimeWarning: divide by zero encountered in true_divide
  change = max((abs(g_new - g_old) / g_old).max(), (abs(d_new - d_old) / d_old).max())

In [2]: sc.pp.highly_variable_genes(adata)
extracting highly variable genes
Traceback (most recent call last):

  File "<ipython-input-2-7727f5f928cd>", line 1, in <module>
    sc.pp.highly_variable_genes(adata)

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 235, in highly_variable_genes
    flavor=flavor,

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 65, in _highly_variable_genes_single_batch
    df['mean_bin'] = pd.cut(df['means'], bins=n_bins)

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 265, in cut
    duplicates=duplicates,

  File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 381, in _bins_to_cuts
    f"Bin edges must be unique: {repr(bins)}.\n"

ValueError: Bin edges must be unique: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan]).
You can drop duplicate edges by setting the 'duplicates' kwarg

Versions:

scanpy==1.4.6 anndata==0.7.1 umap==0.4.1 numpy==1.18.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0
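
To confirm that the NaNs appear during the ComBat step rather than being present in the input, it can help to count them before and after the call. The following is a minimal sketch, assuming a dense matrix (a sparse adata.X would first need converting, for example with .toarray()). The helper name nan_report is hypothetical and not part of scanpy.

```python
import numpy as np

def nan_report(X):
    """Return (number of NaN entries, number of columns containing any NaN)."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    return int(mask.sum()), int(mask.any(axis=0).sum())

# Toy stand-in for adata.X: 3 cells x 4 genes, where the third gene is all NaN.
X = np.array([
    [1.0, 0.5, np.nan, 2.0],
    [0.0, 1.5, np.nan, 1.0],
    [2.0, 0.0, np.nan, 0.0],
])

n_nan, n_bad_genes = nan_report(X)
print(f"{n_nan} NaN entries across {n_bad_genes} gene(s)")
```

With a real object, calling nan_report(adata.X) once before and once after sc.pp.combat would show whether the correction step introduced the NaNs.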

Top GitHub Comments

2 reactions

gokceneraslan commented, Apr 20, 2020

I think it’s a nice corner case we should handle. Can you file another bug about having 1-cell batches in combat or in highly_variable_genes with the batch_key option?

1 reaction

auesro commented, Apr 21, 2020

@LuckyMD of course you are right… @gokceneraslan sure, I can. It would be nice just to get a warning, or to have Combat halt processing after first checking whether there is a batch with just 1 cell.
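
Until such a warning exists, the check suggested in the comments can be done by hand before calling combat. The sketch below is a minimal version that uses only the standard library. The helper name small_batches and the min_cells threshold are assumptions made for illustration, not scanpy API.

```python
from collections import Counter

def small_batches(batch_labels, min_cells=2):
    """Return {batch: cell_count} for batches with fewer than min_cells cells."""
    counts = Counter(batch_labels)
    return {batch: n for batch, n in counts.items() if n < min_cells}

# Toy stand-in for adata.obs['sample']: batch "s2" contains a single cell.
labels = ["s1", "s1", "s2", "s3", "s3", "s3"]

offenders = small_batches(labels)
if offenders:
    print(f"Batches too small for batch correction: {offenders}")
```

On real data the call would be small_batches(adata.obs['sample']). Any batches it reports could then be dropped or merged before running sc.pp.combat(adata, key='sample').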
