Add n_components_ to SparsePCA

PCA exposes the number of components through the n_components_ attribute; this is not possible with SparsePCA, however, even though both PCA and SparsePCA accept an n_components argument.

Would it make sense to enable accessing n_components_ on SparsePCA too? Please note that this would be different from n_components, which is already available but holds the unprocessed input argument (i.e. None if nothing was passed).

The current PCA behaviour:

from sklearn.decomposition import PCA, SparsePCA
from sklearn import datasets

iris = datasets.load_iris()
pca = PCA()
pca.fit(iris.data)
assert pca.n_components_ == 4
assert pca.n_components is None
assert len(pca.components_) == 4

pca_3 = PCA(n_components=3)
pca_3.fit(iris.data)
assert pca_3.n_components_ == 3
assert pca_3.n_components == 3
assert len(pca_3.components_) == 3

Existing SparsePCA behaviour:

spca = SparsePCA()
spca.fit(iris.data)
assert spca.n_components is None
assert len(spca.components_) == 4

spca_3 = SparsePCA(n_components=3)
spca_3.fit(iris.data)
assert spca_3.n_components == 3
assert len(spca_3.components_) == 3

Proposed SparsePCA behaviour:

assert spca.n_components_ == 4
assert spca_3.n_components_ == 3

This could also be added to KernelPCA and other PCA methods. Implementation-wise, the code that calculates the number of components for PCA could be generalised (that is, replacing None with the actual number and/or trimming by the number of features or samples). I think it might be placed in _BasePCA, but neither SparsePCA nor KernelPCA actually descends from it. Is this the right direction?
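A minimal sketch of the generalisation described above, assuming a hypothetical helper named resolve_n_components (it is not part of scikit-learn). The cap_by_samples flag is also an assumption, there to illustrate that estimators may differ in whether the sample count bounds the default. Trimming an over-large value is a choice made for this sketch only; an estimator may validate and reject such a value instead.

```python
from typing import Optional


def resolve_n_components(
    n_components: Optional[int],
    n_samples: int,
    n_features: int,
    cap_by_samples: bool = True,
) -> int:
    """Hypothetical helper: turn the raw ``n_components`` argument into a count.

    None is replaced by the upper bound; larger values are trimmed to it.
    """
    # Upper bound on the number of components this sketch allows.
    upper = min(n_samples, n_features) if cap_by_samples else n_features
    if n_components is None:
        return upper
    return min(n_components, upper)


# The iris data used in this issue has 150 samples and 4 features.
default_count = resolve_n_components(None, n_samples=150, n_features=4)
explicit_count = resolve_n_components(3, n_samples=150, n_features=4)
```

With the iris shapes, default_count mirrors the n_components_ == 4 case and explicit_count the n_components_ == 3 case shown earlier.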

On a related note, would it make sense to have a computed property named n_non_trivial_components_ that gives the number of components with at least one non-zero loading?
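As a rough illustration of what such a property could compute, here is a sketch with a hypothetical function name, count_non_trivial_components. It assumes the components are laid out like components_, with one row per component and one column per feature, and it treats a component as trivial only when every loading is exactly zero.

```python
from typing import Iterable, Sequence


def count_non_trivial_components(components: Iterable[Sequence[float]]) -> int:
    """Count rows (components) that contain at least one non-zero loading."""
    return sum(1 for row in components if any(value != 0 for value in row))


# Hand-written loadings standing in for a fitted components_ array.
loadings = [
    [0.0, 0.7, 0.0, -0.7],
    [0.0, 0.0, 0.0, 0.0],  # all-zero row: a trivial component
    [1.0, 0.0, 0.0, 0.0],
]
n_non_trivial = count_non_trivial_components(loadings)  # 2
```

The function only iterates over rows, so it should accept a 2D NumPy array as well as nested lists.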

Edit: a simple workaround is to use len(spca.components_), which works equally well for sparse and dense PCA. I am not sure whether adding n_components_ is needed, but the point is that it would be great to have a consistent interface for all PCA methods!
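The workaround can be wrapped so that calling code does not need to know which estimator it was given. This is only a sketch: fitted_n_components is a hypothetical name, and the SimpleNamespace objects are stand-ins for fitted estimators so the example runs without scikit-learn. It assumes the estimator exposes components_, so it would not cover an estimator that stores its result under a different attribute.

```python
from types import SimpleNamespace


def fitted_n_components(estimator) -> int:
    """Return n_components_ when available, else fall back to len(components_)."""
    n = getattr(estimator, "n_components_", None)
    if n is not None:
        return int(n)
    return len(estimator.components_)


# Stand-in for an estimator that already exposes n_components_.
with_attribute = SimpleNamespace(n_components_=3, components_=[[0.0] * 4] * 3)
# Stand-in for an estimator that only exposes components_.
without_attribute = SimpleNamespace(components_=[[0.0] * 4] * 4)

counts = (fitted_n_components(with_attribute), fitted_n_components(without_attribute))
```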

Issue Analytics

State:
Created 3 years ago
Comments: 5 (4 by maintainers)

Top GitHub Comments

TomDLT commented, Mar 25, 2020

Thanks for the detailed proposal. Adding an n_components_ attribute seems reasonable, and implementing it with len(spca.components_) seems straightforward. An ideal pull request would have a small test, an entry in doc/whats_new/v0.23.rst, and an updated docstring.

Do you want to open a pull request?

TomDLT commented, Apr 27, 2020

fixed in #16981
