scanpy.pp.pca fails on small datasets

scanpy.pp.pca fails if n_samples < 50 < n_features

import numpy as np
import scanpy as sc
import anndata

adata = anndata.AnnData(np.random.normal(0, 1, (40, 100)))
sc.pp.pca(adata)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scottgigante/.local/lib/python3.8/site-packages/scanpy/preprocessing/_simple.py", line 531, in pca
    X_pca = pca_.fit_transform(X)
  File "/usr/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 369, in fit_transform
    U, S, V = self._fit(X)
  File "/usr/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 418, in _fit
    return self._fit_truncated(X, n_components, self._fit_svd_solver)
  File "/usr/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 497, in _fit_truncated
    raise ValueError("n_components=%r must be between 1 and "
ValueError: n_components=50 must be between 1 and min(n_samples, n_features)=40 with svd_solver='arpack'
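The traceback shows the default of 50 components being passed straight to scikit-learn, which rejects it because the matrix has only 40 rows. Until the default is clamped inside scanpy, one caller-side workaround is to pick the component count explicitly. The sketch below is a minimal illustration: `clamp_n_comps` is a hypothetical helper, not part of scanpy, and it assumes the `arpack` solver needs strictly fewer components than the smaller matrix dimension.

```python
def clamp_n_comps(shape, requested=50):
    """Hypothetical helper (not a scanpy function).

    Returns a component count that stays strictly below
    min(n_obs, n_vars), which is assumed here to be what the
    'arpack' solver accepts.
    """
    n_obs, n_vars = shape
    min_dim = min(n_obs, n_vars)
    if min_dim < 2:
        raise ValueError("need at least 2 observations and 2 variables")
    # Keep the requested value when it fits, otherwise fall back to min_dim - 1.
    return min(requested, min_dim - 1)


# Shape from the report above: 40 observations, 100 variables.
n_comps = clamp_n_comps((40, 100))
```

With the reproduction above, the result could then be passed through the existing `n_comps` argument, as in `sc.pp.pca(adata, n_comps=clamp_n_comps(adata.shape))`.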

Versions:

scanpy==1.2.3.dev1409+g7ca201d.d20200112 anndata==0.6.22.post1 umap==0.3.10 numpy==1.18.0 scipy==1.4.1 pandas==0.25.3 scikit-learn==0.22 statsmodels==0.11.0rc1 python-igraph==0.7.1 louvain==0.6.1

Issue Analytics

Created 4 years ago
Comments: 8 (7 by maintainers)

Top GitHub Comments

2 reactions

ivirshup commented, Feb 16, 2020

Ha, actually we implement filtering for highly_variable_genes as taking a subset of the anndata object, so it is min(adata.n_vars, adata.n_obs)

1 reaction

ivirshup commented, Feb 16, 2020

Thanks for the catch! It should be min_dim = min(*X.shape) where X is the selected representation.
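The point of both comments is that the bound has to come from the matrix that actually reaches the PCA step, not from the full dataset. A small NumPy-only sketch of the difference follows; the boolean mask stands in for a highly-variable-genes selection and is made up for illustration, and nothing here calls scanpy itself.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (40, 100))  # same shape as the reproduction above

# Hypothetical gene mask: pretend only the first 30 variables were selected.
highly_variable = np.zeros(X.shape[1], dtype=bool)
highly_variable[:30] = True

# The matrix that would actually be decomposed after subsetting.
X_selected = X[:, highly_variable]

min_dim_full = min(*X.shape)               # bound from the full matrix
min_dim_selected = min(*X_selected.shape)  # bound from the selected representation
```

Here the full matrix gives a bound of 40 while the subset gives 30, so a default derived from the full shape could still exceed what the solver accepts once the subset is taken.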

scanpy.pp.pca fails on small datasets · Issue #1051 - GitHub