Add `u_based_decision` optionality to `sklearn.decomposition.PCA`

scikit-learn’s PCA uses scipy.linalg.svd to compute the singular value decomposition of X.

Because the decomposition is unique only up to a simultaneous sign change of each pair of left and right singular vectors, the .fit method then uses svd_flip to enforce deterministic output.
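A quick NumPy/SciPy check of that ambiguity (the random matrix is an arbitrary example, not data from the issue): flipping the sign of a matching column of U and row of Vt leaves the reconstruction of X unchanged, so the SVD by itself cannot say which sign is "right".

```python
import numpy as np
from scipy.linalg import svd

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Thin SVD: X = U @ diag(s) @ Vt (SciPy calls the last factor Vh).
U, s, Vt = svd(X, full_matrices=False)

# Flip one pair together: column k of U and row k of Vt.
k = 1
U_flipped, Vt_flipped = U.copy(), Vt.copy()
U_flipped[:, k] *= -1
Vt_flipped[k, :] *= -1

original = U @ np.diag(s) @ Vt
flipped = U_flipped @ np.diag(s) @ Vt_flipped
print(np.allclose(original, X), np.allclose(flipped, X))  # True True
```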

This consists of “adjust[ing] the columns of u and the rows of v such that the loadings in the columns in u that are largest in absolute value are always positive.”

svd_flip also takes a parameter, u_based_decision (default True); when it is False, the sign of each pair is chosen from the rows of v instead, so that the entry largest in absolute value in each row of v is positive. However, this parameter is not exposed in PCA.fit() or any of its sibling methods.
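The two conventions can be sketched in a few lines of NumPy. `flip_signs` below is a hypothetical re-implementation of the rule described in the svd_flip docstring, not scikit-learn's own code, and the small `u`/`v` arrays are toy values (not an actual SVD) chosen so the two rules disagree on the first pair.

```python
import numpy as np


def flip_signs(u, v, u_based_decision=True):
    """Hypothetical sketch of the svd_flip sign convention.

    u: (n_samples, k) array, left singular vectors as columns.
    v: (k, n_features) array, right singular vectors as rows.
    """
    if u_based_decision:
        # Sign of the largest-magnitude entry in each column of u.
        idx = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[idx, np.arange(u.shape[1])])
    else:
        # Sign of the largest-magnitude entry in each row of v.
        idx = np.argmax(np.abs(v), axis=1)
        signs = np.sign(v[np.arange(v.shape[0]), idx])
    # Same sign for each u column and its paired v row,
    # so u @ diag(s) @ v is unchanged.
    return u * signs, v * signs[:, np.newaxis]


# Toy sign patterns, not a real decomposition.
u = np.array([[0.9, 0.1], [-0.2, -0.8]])
v = np.array([[-0.7, 0.3], [0.6, 0.5]])

u_a, v_a = flip_signs(u, v, u_based_decision=True)
u_b, v_b = flip_signs(u, v, u_based_decision=False)
# Row 0 of v: the u-based rule keeps [-0.7, 0.3],
# the v-based rule flips it to [0.7, -0.3].
```

Because the first column of u and the first row of v have dominant entries of opposite sign, the two settings produce opposite signs for that component, which is exactly the difference the issue asks PCA to expose.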

Why not add this parameter to PCA’s methods and pass it through to svd_flip? It would affect a critical attribute, components_, which is taken directly from the rows of V. Otherwise, the only real way to use u_based_decision=False within PCA itself is to rewrite it from the ground up, with a lot of copy/pasted code.
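A small check of the relationship between components_ and V, assuming the installed scikit-learn's `svd_solver="full"` still runs an exact SVD of the centered data (random data, chosen only for illustration). Since the sign convention may differ between versions, the comparison allows each row to match either way.

```python
import numpy as np
from scipy.linalg import svd
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))

# PCA centers X before decomposing it; repeat that step by hand.
Xc = X - X.mean(axis=0)
_, s, Vt = svd(Xc, full_matrices=False)

pca = PCA(n_components=3, svd_solver="full").fit(X)

# Each row of components_ should equal the matching row of Vt or its negation.
matches = [
    np.allclose(comp, row) or np.allclose(comp, -row)
    for comp, row in zip(pca.components_, Vt[:3])
]
print(all(matches))  # True
```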

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

3 reactions
lesteve commented, Sep 14, 2017

And by the way, @bsolomon1124, thanks for the good-quality issue! It is always nice when we can see that the OP has made an effort to research the issue and explain where he was coming from. Let’s say this does not always happen …

0 reactions
lesteve commented, Sep 14, 2017

I am going to try to sum up the discussion here:

  • the sign of the coefficients in a given eigenvector does not mean anything at all; the u_based_decision parameter in svd_flip exists only to make what the solver does reproducible.
  • nonetheless, you are trying to interpret the sign of the loadings in some fashion, and you are complaining that the result does not match your intuition.
  • in your particular example, being able to pass u_based_decision=False to PCA would match your expectations.

The thing is that you seem to be in a very particular case where all of the elements of your loading have the same sign. Imagine a financial instrument that is negatively correlated with your “market-like” component and has a high absolute loading: I would bet that neither u_based_decision=True nor u_based_decision=False would match your expectation.
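To make this concrete, here is a made-up loading vector (not from the issue) for one component over four instruments: three move with the “market” and one is strongly anti-correlated. Applying the largest-absolute-value rule that svd_flip uses on the rows of v when u_based_decision=False turns the three market-like loadings negative.

```python
import numpy as np

# Hypothetical loadings: three "market" instruments, one strong hedge.
loading = np.array([0.40, 0.35, 0.30, -0.79])

# Make the entry largest in absolute value positive.
largest = np.argmax(np.abs(loading))
by_largest = loading * np.sign(loading[largest])

# The hedge now has the positive loading and the market-like
# instruments are negative, the opposite of the "market is positive" intuition.
print(by_largest)
```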

If you have a good reason to interpret the sign of the loadings in your particular use case, I would suggest post-processing pca.components_ to implement the rule you have in mind. Maybe you can inherit from PCA without having to rewrite much code; maybe it’s not that convenient, I am not sure.
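One way to follow that suggestion, sketched under assumptions: `SignedPCA` is a hypothetical subclass, and its sign rule (make each component's largest-magnitude entry positive, i.e. the row-of-V convention) is only a placeholder for whatever rule fits your use case. It relies on `transform` and `inverse_transform` reading `components_`, so flipping that attribute after fitting keeps them consistent.

```python
import numpy as np
from sklearn.decomposition import PCA


class SignedPCA(PCA):
    """Hypothetical PCA subclass that post-processes component signs.

    Placeholder rule: flip each component so its entry largest in
    absolute value is positive. Replace with your own rule.
    """

    def fit(self, X, y=None):
        super().fit(X, y)
        comps = self.components_
        largest = np.argmax(np.abs(comps), axis=1)
        signs = np.sign(comps[np.arange(comps.shape[0]), largest])
        # transform() and inverse_transform() use components_,
        # so they pick up the flipped signs automatically.
        self.components_ = comps * signs[:, np.newaxis]
        return self

    def fit_transform(self, X, y=None):
        # PCA.fit_transform may compute scores without reading
        # components_, so route through fit + transform instead.
        return self.fit(X, y).transform(X)


# Usage on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
spca = SignedPCA(n_components=2, svd_solver="full")
scores = spca.fit_transform(X)
```

Overriding only `fit` and `fit_transform` keeps the rest of PCA untouched; the trade-off is that the sign rule runs after fitting rather than inside the SVD step, which is what the issue originally asked for.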

I am going to close this one; @bsolomon1124, feel free to shout if you strongly disagree.

Read more comments on GitHub >

github_iconTop Results From Across the Web

sklearn.decomposition.PCA — scikit-learn 1.2.0 documentation
Principal component analysis (PCA). Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space.
Read more >
2.5. Decomposing signals in components (matrix factorization ...
PCA is used to decompose a multivariate dataset in a set of successive orthogonal components that explain a maximum amount of the variance....
Read more >
sklearn.decomposition.SparsePCA
Sparse Principal Components Analysis (SparsePCA). Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is ...
Read more >
sklearn.decomposition.KernelPCA
sklearn.decomposition .KernelPCA¶ ... Kernel Principal component analysis (KPCA) [1]. Non-linear dimensionality reduction through the use of kernels (see Pairwise ...
Read more >
sklearn.decomposition.RandomizedPCA
Principal component analysis (PCA) using randomized SVD. Linear dimensionality reduction using approximated Singular Value Decomposition of the data and ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found