Add n_components_ to SparsePCA
PCA allows retrieving the number of components through the n_components_ attribute; this is, however, not possible with SparsePCA (both PCA and SparsePCA accept an n_components argument).
Would it make sense to enable accessing n_components_ on SparsePCA too? Please note that this would be different from n_components, which is already available but represents the unprocessed input argument (i.e. None if nothing was passed).
The current PCA behaviour:
from sklearn.decomposition import PCA, SparsePCA
from sklearn import datasets
iris = datasets.load_iris()
pca = PCA()
pca.fit(iris.data)
assert pca.n_components_ == 4
assert pca.n_components is None
assert len(pca.components_) == 4
pca_3 = PCA(n_components=3)
pca_3.fit(iris.data)
assert pca_3.n_components_ == 3
assert pca_3.n_components == 3
assert len(pca_3.components_) == 3
Existing SparsePCA behaviour:
spca = SparsePCA()
spca.fit(iris.data)
assert spca.n_components is None
assert len(spca.components_) == 4
spca_3 = SparsePCA(n_components=3)
spca_3.fit(iris.data)
assert spca_3.n_components == 3
assert len(spca_3.components_) == 3
Proposed SparsePCA behaviour:
assert spca.n_components_ == 4
assert spca_3.n_components_ == 3
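Until the attribute exists, the proposed behaviour can be approximated by subclassing. This is a minimal sketch, not necessarily how it was implemented upstream: the class name SparsePCAWithNComponents is hypothetical, and it simply records len(self.components_) after the parent fit, which is the same len(spca.components_) workaround mentioned in the edit further down. On releases where SparsePCA already sets n_components_ itself, the override is redundant but harmless.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import SparsePCA


class SparsePCAWithNComponents(SparsePCA):
    """Hypothetical SparsePCA subclass that exposes n_components_ after fit."""

    def fit(self, X, y=None):
        super().fit(X, y)
        # components_ has shape (n_components, n_features), so its length is
        # the resolved number of components (n_features when n_components=None).
        self.n_components_ = len(self.components_)
        return self


X = load_iris().data  # 150 samples, 4 features

spca = SparsePCAWithNComponents().fit(X)
spca_3 = SparsePCAWithNComponents(n_components=3).fit(X)

print(spca.n_components, spca.n_components_)      # None 4
print(spca_3.n_components, spca_3.n_components_)  # 3 3
```

The raw constructor argument stays untouched, so get_params() and clone() keep working as usual; only the fitted attribute carries the resolved count.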
This could also be added to KernelPCA and other PCA methods. Implementation-wise, the code PCA uses to calculate the number of components could be generalised (that is, replacing None with the actual number and/or trimming it by the number of features or samples). I think it might be placed in _BasePCA, but neither SparsePCA nor KernelPCA actually descends from it. Is this the right direction?
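The generalisation described above can be sketched as a small shared helper. The name resolve_n_components and its max_components parameter are hypothetical. The bound has to be supplied per estimator because the rules differ (PCA with a full SVD replaces None with min(n_samples, n_features), while SparsePCA replaces it with n_features). The sketch only handles None and integers (PCA also accepts floats and 'mle'), and it trims an over-large integer, whereas PCA itself rejects an out-of-range integer with an error, so trimming versus raising would be a design decision.

```python
def resolve_n_components(n_components, max_components):
    """Hypothetical shared helper: map the raw n_components argument to the
    resolved value an estimator could store as n_components_.

    max_components is the largest number of components the estimator can
    produce for the data, e.g. min(n_samples, n_features) for PCA with a
    full SVD.
    """
    if n_components is None:
        # Nothing passed: keep every component the estimator can produce.
        return max_components
    # An explicit integer is trimmed to what the data supports.
    return min(n_components, max_components)


# Iris has 150 samples and 4 features.
n_samples, n_features = 150, 4
print(resolve_n_components(None, min(n_samples, n_features)))  # 4
print(resolve_n_components(3, min(n_samples, n_features)))     # 3
print(resolve_n_components(10, n_features))                    # 4
```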
On a related note, would it make sense to have a computed property named n_non_trivial_components_ giving the number of components that have non-zero loadings?
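A minimal sketch of what n_non_trivial_components_ could compute, assuming "trivial" means a component whose loadings are all exactly zero. The helper name is hypothetical and not part of scikit-learn. It is plain Python, so it also accepts a NumPy array such as spca.components_, since iterating a 2-D array yields its rows.

```python
def count_non_trivial_components(components):
    """Hypothetical helper behind an n_non_trivial_components_ attribute.

    components is a 2-D sequence shaped (n_components, n_features), such as
    a fitted estimator's components_ array. A component counts as
    non-trivial if at least one of its loadings is non-zero.
    """
    return sum(1 for row in components if any(value != 0 for value in row))


# Three sparse components; the middle one has only zero loadings.
components = [
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.4, 0.0, 0.0, -0.9],
]
print(count_non_trivial_components(components))  # 2
```

The comparison is against exact zero; for dense methods a small tolerance might be more appropriate.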
Edit: a simple workaround is to use len(spca.components_), which works equally well for sparse and dense PCA. I am not sure the addition of n_components_ is needed, but the point is that it would be great to have a consistent interface for all PCA methods!
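Code that has to run against estimators with and without the attribute could wrap the workaround in a small accessor. This is a sketch under the assumption that the estimator exposes components_ with one row per component; the function name get_n_components is hypothetical, and the SimpleNamespace objects merely stand in for fitted estimators so the example runs without scikit-learn.

```python
from types import SimpleNamespace


def get_n_components(estimator):
    """Return the number of fitted components of a PCA-like estimator.

    Uses the n_components_ attribute when the estimator has it and falls
    back to len(estimator.components_) otherwise.
    """
    n_components_ = getattr(estimator, "n_components_", None)
    if n_components_ is not None:
        return n_components_
    # Each row of components_ is one component, for sparse and dense PCA alike.
    return len(estimator.components_)


# Hypothetical stand-ins for fitted estimators.
with_attribute = SimpleNamespace(n_components_=3, components_=[[1, 0], [0, 1], [1, 1]])
without_attribute = SimpleNamespace(components_=[[1, 0], [0, 1]])

print(get_n_components(with_attribute))     # 3
print(get_n_components(without_attribute))  # 2
```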
Issue Analytics
- Created 3 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Thanks for the detailed proposal. Adding an attribute n_components_ seems reasonable, and implementing it with len(spca.components_) seems straightforward. An ideal pull request would have a small test, an entry in doc/whats_new/v0.23.rst, and an update of the docstring. Do you want to open a pull request?
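The small test suggested in the comment might look roughly like the sketch below. It assumes a scikit-learn release that includes the fix from #16981, so that SparsePCA exposes n_components_; the test name is hypothetical, and the two cases are looped over by hand instead of with pytest.mark.parametrize so the block has no pytest dependency.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import SparsePCA


def test_sparse_pca_n_components_():
    """Sketch of a regression test for SparsePCA.n_components_."""
    X = load_iris().data
    # None should resolve to n_features; an explicit integer is kept as-is.
    for n_components, expected in [(None, X.shape[1]), (3, 3)]:
        spca = SparsePCA(n_components=n_components, random_state=0).fit(X)
        # The raw constructor argument is left untouched...
        assert spca.n_components == n_components
        # ...while the fitted attribute holds the resolved count.
        assert spca.n_components_ == expected
        assert spca.n_components_ == len(spca.components_)
```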
fixed in #16981