Combat populates adata.X with NaNs, so sc.pp.highly_variable_genes raises an error
After running sc.pp.combat(adata, key='sample'), adata.X is full of NaNs and sc.pp.highly_variable_genes(adata) fails. There are no issues (and no NaNs) when the combat correction is NOT run.
sc.pp.combat(adata, key='sample')
sc.pp.highly_variable_genes(adata)
In [1]: sc.pp.combat(adata, key='sample')
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
from pandas.core.index import RangeIndex
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/sklearn/externals/six.py:31: FutureWarning: The module is deprecated in version 0.21 and will be removed in version 0.23 since we've dropped support for Python 2.7. Please rely on the official version of six (https://pypi.org/project/six/).
"(https://pypi.org/project/six/).", FutureWarning)
scanpy==1.4.6 anndata==0.7.1 umap==0.4.1 numpy==1.18.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0
Standardizing Data across genes.
Found 11 batches
Found 0 numerical variables:
Found 3 genes with zero variance.
Fitting L/S model and finding priors
Finding parametric adjustments
Adjusting data
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:338: RuntimeWarning: invalid value encountered in true_divide
change = max((abs(g_new - g_old) / g_old).max(), (abs(d_new - d_old) / d_old).max())
/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_combat.py:338: RuntimeWarning: divide by zero encountered in true_divide
change = max((abs(g_new - g_old) / g_old).max(), (abs(d_new - d_old) / d_old).max())
In [2]: sc.pp.highly_variable_genes(adata)
extracting highly variable genes
Traceback (most recent call last):
File "<ipython-input-2-7727f5f928cd>", line 1, in <module>
sc.pp.highly_variable_genes(adata)
File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 235, in highly_variable_genes
flavor=flavor,
File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/scanpy/preprocessing/_highly_variable_genes.py", line 65, in _highly_variable_genes_single_batch
df['mean_bin'] = pd.cut(df['means'], bins=n_bins)
File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 265, in cut
duplicates=duplicates,
File "/home/auesro/anaconda3/envs/Scanpy/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 381, in _bins_to_cuts
f"Bin edges must be unique: {repr(bins)}.\n"
ValueError: Bin edges must be unique: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan]).
You can drop duplicate edges by setting the 'duplicates' kwarg
Versions:
scanpy==1.4.6 anndata==0.7.1 umap==0.4.1 numpy==1.18.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0
I think it’s a nice corner case we should handle. Can you file another bug about having 1-cell batches in combat, or in highly_variable_genes with the batch_key option?
@LuckyMD of course you are right… @gokceneraslan sure, I can. It would be nice just to get a warning, or to have Combat halt processing when an initial check finds a batch with just 1 cell.
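Until such a check exists in scanpy itself, the warning asked for above can be approximated on the user side before calling sc.pp.combat. The snippet below is only a sketch: the helper names find_small_batches and warn_on_small_batches and the min_cells threshold are hypothetical, not part of scanpy, and it relies on the thread's diagnosis that a batch with a single cell is what triggers the NaNs.

```python
from collections import Counter
import warnings


def find_small_batches(batch_labels, min_cells=2):
    """Return {batch: n_cells} for every batch with fewer than min_cells cells.

    batch_labels is any iterable of per-cell batch labels (hypothetical helper,
    not a scanpy function).
    """
    counts = Counter(batch_labels)
    return {batch: n for batch, n in counts.items() if n < min_cells}


def warn_on_small_batches(batch_labels, min_cells=2):
    """Emit a warning listing undersized batches and return them as a dict."""
    small = find_small_batches(batch_labels, min_cells)
    if small:
        warnings.warn(
            f"{len(small)} batch(es) have fewer than {min_cells} cells: {small}"
        )
    return small
```

With an AnnData object this would be called as warn_on_small_batches(adata.obs['sample']) right before sc.pp.combat(adata, key='sample'). If it returns anything, one option is to drop those cells first, for example adata = adata[~adata.obs['sample'].isin(list(small))].copy(), where small is the returned dict. Whether dropping or merging the undersized batches is appropriate depends on the dataset.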