Add `u_based_decision` optionality to `sklearn.decomposition.PCA`
scikit-learn’s `PCA` uses `scipy.linalg.svd` to compute the singular value decomposition of X.
Because the decomposition is unique only up to a simultaneous change of sign in each pair of left and right singular vectors, the `.fit` method then uses `svd_flip` to enforce deterministic output. This consists of "adjust[ing] the columns of u and the rows of v such that the loadings in the columns in u that are largest in absolute value are always positive."
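The sign ambiguity is easy to check numerically. Below is a minimal sketch that uses NumPy's `numpy.linalg.svd` (which, like `scipy.linalg.svd`, returns `U, S, Vt`) in place of the SciPy call that `PCA` makes. The random matrix and the choice of `k` are arbitrary.

```python
import numpy as np

# Arbitrary example data; PCA itself runs the SVD on centered data.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))

# Thin SVD: X == U @ np.diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Negate the k-th left singular vector and the k-th right singular vector together.
k = 1
U2, Vt2 = U.copy(), Vt.copy()
U2[:, k] *= -1
Vt2[k, :] *= -1

# Both factorizations reconstruct X, so X alone does not fix the sign of each pair.
print(np.allclose(U @ np.diag(S) @ Vt, X))    # True
print(np.allclose(U2 @ np.diag(S) @ Vt2, X))  # True
```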
`svd_flip` also has a parameter, `u_based_decision` (default `True`). Setting it to `False` bases the sign decision on the rows of v instead of the columns of u. However, this parameter is not exposed in `PCA.fit()` or in any of the related methods.
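Here is a minimal NumPy sketch of the two sign rules. `flip_signs` is a hypothetical stand-in written for illustration, not scikit-learn's `svd_flip` itself. It assumes only that `u` holds the left singular vectors as columns and `v` holds the right singular vectors as rows.

```python
import numpy as np


def flip_signs(u, v, u_based_decision=True):
    """Hypothetical re-implementation of the sign rule described for svd_flip.

    u: (n_samples, r) array, left singular vectors as columns.
    v: (r, n_features) array, right singular vectors as rows.
    Returns sign-adjusted copies; u @ diag(s) @ v is unchanged.
    """
    u, v = u.copy(), v.copy()
    k = np.arange(u.shape[1])
    if u_based_decision:
        # The largest-magnitude entry of each column of u decides that pair's sign.
        signs = np.sign(u[np.argmax(np.abs(u), axis=0), k])
    else:
        # The largest-magnitude entry of each row of v decides that pair's sign.
        signs = np.sign(v[k, np.argmax(np.abs(v), axis=1)])
    # Note: an all-zero vector gives a sign of 0; this sketch does not handle it.
    u *= signs                 # flip columns of u
    v *= signs[:, np.newaxis]  # flip the matching rows of v
    return u, v


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
U, S, Vt = np.linalg.svd(X, full_matrices=False)

for flag in (True, False):
    u, v = flip_signs(U, Vt, u_based_decision=flag)
    # Either rule still yields a valid SVD of X; only the signs may differ.
    assert np.allclose(u @ np.diag(S) @ v, X)
```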
Why not add this parameter to `PCA`’s methods and pass it through to `svd_flip`? It would affect a critical attribute, `components_`, which is simply equal to V. Otherwise, the only real way to use `u_based_decision=False` within `PCA` itself is to rewrite it from the ground up, with a lot of copy/pasting of code.
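The proposal can be pictured with a stripped-down, hypothetical PCA fit. `pca_fit` is an illustrative name, not scikit-learn code. Solver selection, whitening, and the other fitted attributes are omitted, and NumPy's SVD stands in for SciPy's. The function centers the data, takes the thin SVD, and applies one of the two sign rules. It then returns the leading rows of the sign-adjusted `Vt`, which play the role of `components_`.

```python
import numpy as np


def pca_fit(X, n_components, u_based_decision=True):
    """Hypothetical, stripped-down PCA fit: center, SVD, fix signs, keep top rows of Vt.

    Returns (mean, components) with components of shape (n_components, n_features).
    """
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    k = np.arange(S.size)
    if u_based_decision:
        signs = np.sign(U[np.argmax(np.abs(U), axis=0), k])
    else:
        signs = np.sign(Vt[k, np.argmax(np.abs(Vt), axis=1)])
    # Only the right singular vectors are kept, so only Vt needs flipping here.
    Vt *= signs[:, np.newaxis]
    # The equivalent of components_: the leading rows of the sign-adjusted Vt.
    return mean, Vt[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
_, comp_u = pca_fit(X, 2, u_based_decision=True)
_, comp_v = pca_fit(X, 2, u_based_decision=False)

# The flag changes only the signs of the components, never their directions up to sign.
print(np.allclose(np.abs(comp_u), np.abs(comp_v)))  # True
```

Both settings span the same subspace. The flag only decides which of the two opposite directions each component points in.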
Issue Analytics
- Created 6 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
And by the way, @bsolomon1124, thanks for the good-quality issue! It is always nice to see that the OP has made an effort to research the issue and explain where he was coming from. Let’s say this does not always happen …
I am going to try to sum up the discussion here:
- … `u_based` parameter in `svd_flip`.
- … `u_base=False` to PCA would match your expectations.

The thing is that you seem to be in a very particular case where all of the elements of your loading have the same sign. Imagine a financial instrument that is negatively correlated with your “market-like” component and whose loading has a high absolute value: I would bet that neither `u_based=True` nor `u_based=False` would match your expectations at all.

If you have a good reason to interpret the sign of the loadings in your particular use case, I would suggest doing some post-processing of `pca.components_` to implement the rule you have in mind. Maybe you can inherit from `PCA` and not have to rewrite that much code; maybe it’s not that convenient, I am not sure.

I am going to close this one. @bsolomon1124, feel free to shout if you strongly disagree.
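The point about a negatively correlated instrument can be made concrete. The sketch below uses a single, made-up loading vector and compares two rules. The first is a max-absolute-value rule, the same idea as `u_based_decision=False`. The second is one possible custom post-processing rule. Both `max_abs_rule` and `sum_rule` are hypothetical helpers. "Make the loadings sum to a non-negative number" is only one example of a rule a user might choose.

```python
import numpy as np


def max_abs_rule(components):
    """Like the u_based_decision=False rule: make each row's largest-magnitude entry positive."""
    idx = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[np.arange(components.shape[0]), idx])
    return components * signs[:, np.newaxis]


def sum_rule(components):
    """Hypothetical custom rule: make the sum of each row's loadings non-negative."""
    signs = np.sign(components.sum(axis=1))
    signs[signs == 0] = 1  # leave rows whose loadings sum to zero unchanged
    return components * signs[:, np.newaxis]


# One unit-norm "market-like" component over five instruments. The first
# instrument is negatively correlated with the rest and has the largest |loading|.
loadings = np.array([[-0.6, 0.4, 0.4, 0.4, 0.4]])

print(max_abs_rule(loadings))  # first loading becomes +0.6, the other four -0.4
print(sum_rule(loadings))      # unchanged: the loadings already sum to roughly +1
```

On a fitted estimator, this would amount to reassigning `pca.components_ = sum_rule(pca.components_)`. This assumes that `transform` projects onto `components_`, so scores computed after the reassignment follow the new signs. Any scores computed earlier would need the same per-component sign change.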