scanpy.pp.pca fails on small datasets (Issue #1051)
scanpy.pp.pca fails if n_samples < 50 < n_features:
import numpy as np
import scanpy as sc
import anndata
adata = anndata.AnnData(np.random.normal(0, 1, (40, 100)))
sc.pp.pca(adata)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scottgigante/.local/lib/python3.8/site-packages/scanpy/preprocessing/_simple.py", line 531, in pca
    X_pca = pca_.fit_transform(X)
  File "/usr/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 369, in fit_transform
    U, S, V = self._fit(X)
  File "/usr/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 418, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)
  File "/usr/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 497, in _fit_truncated
    raise ValueError("n_components=%r must be between 1 and "
ValueError: n_components=50 must be between 1 and min(n_samples, n_features)=40 with svd_solver='arpack'
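For anyone wondering where the bound in that message comes from, here is a rough NumPy-only sketch (not scanpy or scikit-learn code). It assumes the same 40 x 100 shape as the report and only illustrates that a matrix of this shape has at most min(n_samples, n_features) singular values, so 50 components cannot exist. The seed and variable names are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen only for this illustration
X = rng.normal(0, 1, (40, 100))      # same shape as in the report
Xc = X - X.mean(axis=0)              # PCA centers each feature before the SVD

# A 40 x 100 matrix has at most min(40, 100) = 40 singular values.
s = np.linalg.svd(Xc, compute_uv=False)
n_singular_values = len(s)

# Centering removes one more degree of freedom, so the last value is numerically zero.
n_informative = int(np.sum(s > 1e-10 * s[0]))

print(n_singular_values)             # 40
print(n_informative)                 # 39
```

So the default of 50 components asks for more directions than the data can provide, which is what the ValueError is reporting.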
Versions:
scanpy==1.2.3.dev1409+g7ca201d.d20200112 anndata==0.6.22.post1 umap==0.3.10 numpy==1.18.0 scipy==1.4.1 pandas==0.25.3 scikit-learn==0.22 statsmodels==0.11.0rc1 python-igraph==0.7.1 louvain==0.6.1
Issue Analytics
- Created 4 years ago
- Comments: 8 (7 by maintainers)
Ha, actually we implement filtering for highly_variable_genes as taking a subset of the anndata object, so it is min(adata.n_vars, adata.n_obs).
Thanks for the catch! It should be min_dim = min(*X.shape), where X is the selected representation.
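A minimal sketch of how that suggestion could be applied, assuming the goal is to cap the default number of components before calling the solver. The helper name clamp_n_comps and its default of 50 are hypothetical and not taken from scanpy's source. Subtracting one is also an assumption, made because the arpack solver needs n_components strictly below min(n_samples, n_features).

```python
import numpy as np


def clamp_n_comps(X, n_comps=50):
    """Hypothetical helper: cap n_comps so a truncated SVD of X can be computed."""
    min_dim = min(*X.shape)  # the expression suggested in the thread
    if n_comps >= min_dim:
        # Assumption: stay strictly below min_dim, as the arpack solver requires.
        return min_dim - 1
    return n_comps


X = np.random.normal(0, 1, (40, 100))  # same shape as in the report
print(clamp_n_comps(X))  # 39
```

On a dataset with more than 50 cells and genes the helper returns the requested value unchanged, so it would only affect small inputs like the one in the report.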