
Add `u_based_decision` optionality to `sklearn.decomposition.PCA`


scikit-learn’s PCA uses scipy.linalg.svd to compute the singular value decomposition of the centered data X.

Because the decomposition is unique only up to a simultaneous sign change of each pair of left and right singular vectors, the .fit method then calls svd_flip to make the output deterministic.

Per the svd_flip docstring, this consists of

Adjust[ing] the columns of u and the rows of v such that the loadings in the columns in u that are largest in absolute value are always positive.
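The sign ambiguity that makes this step necessary is easy to check. The sketch below is illustrative only: it uses numpy.linalg.svd in place of scipy.linalg.svd (both return U, the singular values, and V transposed), and the random matrix is arbitrary example data. Flipping column k of U together with row k of Vt leaves the reconstruction unchanged, so a solver is free to return either sign.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # arbitrary example data

# Thin SVD: X = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Flip the sign of the second pair of singular vectors together.
k = 1
U_flipped, Vt_flipped = U.copy(), Vt.copy()
U_flipped[:, k] *= -1
Vt_flipped[k, :] *= -1

# Both factorizations reconstruct X equally well.
print(np.allclose(U @ np.diag(S) @ Vt, X))
print(np.allclose(U_flipped @ np.diag(S) @ Vt_flipped, X))
```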

svd_flip also takes a parameter, u_based_decision (default True). When it is False, the rows of v rather than the columns of u decide the signs: the entry of each row of v that is largest in absolute value is made positive. However, this parameter is not exposed by PCA.fit() or any of its related methods.

Why not add this parameter to PCA’s methods and pass it through to svd_flip? It directly affects a critical attribute, components_, which is simply V (the right singular vectors, one per row). Otherwise, the only real way to get u_based_decision=False behaviour from PCA itself is to rewrite it from the ground up, with a lot of copy-and-pasted code.
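A minimal NumPy sketch of what the proposal amounts to, assuming a simplified full-SVD PCA (center X, decompose, flip, keep the leading rows of Vt as components_). svd_flip_sketch is a hypothetical re-implementation of the rule described in the svd_flip docstring, not scikit-learn's source, and numpy.linalg.svd stands in for scipy.linalg.svd.

```python
import numpy as np


def svd_flip_sketch(u, v, u_based_decision=True):
    """Hypothetical re-implementation of the svd_flip sign rule.

    u: (n_samples, k) array, left singular vectors as columns.
    v: (k, n_features) array, right singular vectors as rows.
    """
    if u_based_decision:
        # Row index of the largest-|value| entry in each column of u.
        rows = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[rows, np.arange(u.shape[1])])
    else:
        # Column index of the largest-|value| entry in each row of v.
        cols = np.argmax(np.abs(v), axis=1)
        signs = np.sign(v[np.arange(v.shape[0]), cols])
    # Flip column j of u and row j of v together, so u @ diag(s) @ v is unchanged.
    return u * signs, v * signs[:, np.newaxis]


# Simplified full-SVD PCA on arbitrary example data.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Xc = X - X.mean(axis=0)  # PCA centers the data before the SVD
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

n_components = 2
_, Vt_u = svd_flip_sketch(U, Vt, u_based_decision=True)
_, Vt_v = svd_flip_sketch(U, Vt, u_based_decision=False)
components_u = Vt_u[:n_components]  # u-based convention described in this issue
components_v = Vt_v[:n_components]  # v-based convention, as proposed

# Each pair of rows is the same unit vector up to sign: dot product is +1 or -1.
print(np.allclose(np.abs(np.sum(components_u * components_v, axis=1)), 1.0))
# With u_based_decision=False, the largest-|loading| in each row is positive.
largest = components_v[np.arange(n_components), np.argmax(np.abs(components_v), axis=1)]
print(np.all(largest > 0))
```

Because the flip is applied to matching pairs, both settings describe the same principal directions; only the sign convention of each component changes.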

Issue Analytics

Created 6 years ago
Comments: 8 (5 by maintainers)

Top GitHub Comments

3 reactions

lesteve commented, Sep 14, 2017

And by the way, @bsolomon1124, thanks for the good-quality issue! It is always nice when we can see that the OP has made an effort to research the issue and explain where he was coming from. Let’s say this does not always happen …

0 reactions

lesteve commented, Sep 14, 2017

I am going to try to sum up the discussion here:

- the sign of the coefficients in a given eigenvector does not mean anything at all, and svd_flip with its u_based_decision parameter is there for reproducibility of what the solver does.
- nonetheless, you are trying to interpret the sign of the loadings in some fashion, and you are complaining that the result does not match your intuition.
- in your particular example, being able to pass u_based_decision=False to PCA would match your expectations.

The thing is that you seem to be in a very particular case where all of the elements of your loading vector have the same sign. Imagine a financial instrument that is negatively correlated with your “market-like” component and has a high absolute loading: I would bet that neither u_based_decision=True nor u_based_decision=False would match your expectations at all.
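To make this concrete with a hypothetical loading vector (the numbers are invented for illustration, and largest_abs_positive is a hypothetical helper): if one instrument has the largest absolute loading and the opposite sign to the rest, the rule that u_based_decision=False applies to each row of V flips the whole component, so the co-moving "market" instruments come out negative. Under u_based_decision=True the sign is decided by the scores in U, so it cannot be read off the loadings at all.

```python
import numpy as np


def largest_abs_positive(row):
    """Flip a loading vector so that its largest-|value| entry is positive,
    i.e. the per-row rule applied to v when u_based_decision=False."""
    return row * np.sign(row[np.argmax(np.abs(row))])


# Hypothetical "market-like" component: three instruments move together,
# a fourth is strongly negatively correlated with them.
loadings = np.array([0.45, 0.40, 0.35, -0.70])

flipped = largest_abs_positive(loadings)
# The three co-moving instruments now have negative loadings.
print(flipped)
```

The result is deterministic (passing -loadings gives the same output), but it is not the "market goes up, loadings are positive" convention the issue was after.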

If you have a good reason to interpret the sign of the loadings in your particular use case, I would suggest doing some post-processing of pca.components_ that implements the rule you have in mind. Maybe you can subclass PCA without rewriting much code; maybe that is not so convenient, I am not sure.
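One way to follow this suggestion, sketched in NumPy under stated assumptions: the rule (make each component's loading sum non-negative) is only a hypothetical example of a domain-specific convention, flip_by_sum is a hypothetical helper, and the components and scores come from a plain SVD rather than from scikit-learn. The detail that matters is flipping each component and its matching score column together.

```python
import numpy as np


def flip_by_sum(components, scores):
    """Hypothetical post-processing rule: flip each component so the sum of
    its loadings is non-negative, flipping the matching score column too."""
    signs = np.where(components.sum(axis=1) < 0, -1.0, 1.0)
    return components * signs[:, np.newaxis], scores * signs


rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))  # arbitrary example data
Xc = X - X.mean(axis=0)       # PCA centers the data first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components, scores = Vt[:2], U[:, :2] * S[:2]  # keep 2 components

new_components, new_scores = flip_by_sum(components, scores)

# Every component now has a non-negative loading sum ...
print(np.all(new_components.sum(axis=1) >= 0))
# ... and scores @ components is unchanged, so the reconstruction is unaffected.
print(np.allclose(new_scores @ new_components, scores @ components))
```

With scikit-learn, the same idea would be applied to a fitted pca.components_. Since PCA.transform projects the centered data onto components_, later transform calls should follow the flipped signs, but an array already returned by fit_transform would need the same signs applied by hand.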

I am going to close this one, @bsolomon1124 feel free to shout if you strongly disagree.
