scanpy.pp.pca fails on small datasets

Original GitHub issue: #1051

scanpy.pp.pca fails if n_samples < 50 < n_features, because the 50 components it requests by default exceed min(n_samples, n_features):

import numpy as np
import scanpy as sc
import anndata

adata = anndata.AnnData(np.random.normal(0, 1, (40, 100)))
sc.pp.pca(adata)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scottgigante/.local/lib/python3.8/site-packages/scanpy/preprocessing/_simple.py", line 531, in pca
    X_pca = pca_.fit_transform(X)
  File "/usr/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 369, in fit_transform
    U, S, V = self._fit(X)
  File "/usr/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 418, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)
  File "/usr/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 497, in _fit_truncated
    raise ValueError("n_components=%r must be between 1 and "
ValueError: n_components=50 must be between 1 and min(n_samples, n_features)=40 with svd_solver='arpack'
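
The limit in the error message follows from linear algebra rather than from anything specific to scanpy: a 40 x 100 matrix has at most min(40, 100) = 40 singular values, so 50 principal components cannot exist. The NumPy-only sketch below illustrates this on data of the same shape as the report; the seed and variable names are illustrative assumptions, and it does not reproduce scanpy's or scikit-learn's internals.

```python
import numpy as np

# Random data with the same shape as in the report (40 samples, 100 features).
rng = np.random.default_rng(0)
X = rng.normal(0, 1, (40, 100))

# PCA operates on the column-centered matrix.
X_centered = X - X.mean(axis=0)

# Only the singular values are needed here; their count is
# min(n_samples, n_features).
singular_values = np.linalg.svd(X_centered, compute_uv=False)

n_available = singular_values.shape[0]
print(n_available, "components available, 50 requested")
```

Any request for more components than n_available has to be rejected or clamped by the caller.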

Versions:

scanpy==1.2.3.dev1409+g7ca201d.d20200112 anndata==0.6.22.post1 umap==0.3.10 numpy==1.18.0 scipy==1.4.1 pandas==0.25.3 scikit-learn==0.22 statsmodels==0.11.0rc1 python-igraph==0.7.1 louvain==0.6.1

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

2 reactions
ivirshup commented, Feb 16, 2020

Ha, actually we implement filtering for highly_variable_genes as taking a subset of the anndata object, so it is min(adata.n_vars, adata.n_obs)

1 reaction
ivirshup commented, Feb 16, 2020

Thanks for the catch! It should be min_dim = min(*X.shape) where X is the selected representation.
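
Until a fix along those lines is available in the installed version, a caller can apply the same idea by hand. The sketch below is a hypothetical helper, not part of scanpy: the name resolve_n_comps and its signature are assumptions made for illustration. It subtracts one from the smaller dimension because scikit-learn's 'arpack' solver requires n_components to be strictly less than min(n_samples, n_features).

```python
def resolve_n_comps(shape, requested=50):
    """Clamp the requested number of components to what the data allows.

    Hypothetical helper for illustration only; not a scanpy function.
    `shape` is (n_samples, n_features), e.g. X.shape or adata.shape.
    """
    # Same quantity as min(*X.shape) in the comment above, for a 2-D X.
    min_dim = min(shape)
    if min_dim < 2:
        raise ValueError("need at least 2 samples and 2 features")
    # 'arpack' needs strictly fewer components than min_dim.
    return min(requested, min_dim - 1)


print(resolve_n_comps((40, 100)))     # small dataset from the report
print(resolve_n_comps((1000, 2000)))  # large dataset keeps the default
```

With scanpy installed, the failing call from the report could then be written as sc.pp.pca(adata, n_comps=resolve_n_comps(adata.shape)), using the existing n_comps parameter.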
