
PCA, LDA, unexpected explained_variance_ratio

See original GitHub issue
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()

X = iris.data
y = iris.target
target_names = iris.target_names

# Dimensionality reduction using PCA
pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

# Percentage of variance explained for each component
print('PCA: explained variance ratio (first two components): %s'
      % str(pca.explained_variance_ratio_))

# Dimensionality reduction using LDA
lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)

print('LDA: explained variance ratio (first two components): %s'
      % str(lda.explained_variance_ratio_))

Expected Results

The first component of PCA has a larger explained variance ratio than the first component of LDA.

Actual Results

PCA: explained variance ratio (first two components): [ 0.925  0.053]
LDA: explained variance ratio (first two components): [ 0.991  0.009]

Versions

Darwin-14.5.0-x86_64-i386-64bit
Python 3.5.3 |Anaconda custom (x86_64)| (default, Mar 6 2017, 12:15:08) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.12.1
SciPy 0.19.1
Scikit-Learn 0.18.2

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
JPFrancoia commented, May 22, 2018

Euh, I’m really not sure explained_variance_ratio should be the same for PCA and LDA.

PCA is unsupervised, LDA is supervised. The components are calculated differently, since LDA needs a label (y) for each point — that’s why it’s lda.fit(X, y).transform(X) but pca.fit(X).transform(X).

Since LDA will find different principal components, I see no reason why explained_variance_ratio should be the same in both cases.

If I’m wrong please let me know, otherwise this issue should be closed because it’s not a bug.
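
A quick way to check this in code — a minimal sketch on iris, where the label permutation is an illustration of the point above, not something from the issue. PCA’s ratios depend only on X, while LDA’s change as soon as y does:

import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = datasets.load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y_shuffled = rng.permutation(y)  # same X, randomly reassigned labels

# PCA never sees y, so its ratio depends on X only
print(PCA(n_components=2).fit(X).explained_variance_ratio_)

# LDA's ratio changes with the labels, even though X is fixed
print(LinearDiscriminantAnalysis(n_components=2).fit(X, y).explained_variance_ratio_)
print(LinearDiscriminantAnalysis(n_components=2).fit(X, y_shuffled).explained_variance_ratio_)

With the true labels, LDA should report roughly [0.991, 0.009] as in the issue; with shuffled labels the ratios come out different, while the PCA line is unchanged.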

0 reactions
YangXiaozhou commented, Mar 7, 2021


Adding on to @JPFrancoia’s comment: there is indeed no reason why the two ratios should be the same, since they are computed from different quantities. Therefore it might not be a bug.

In particular, while both are ratios of descending eigenvalues against the sum of the eigenvalues, PCA computes its eigenvalues from the covariance matrix of X alone, whereas LDA computes them from W^{-1}B, the product of the inverse within-class scatter matrix W and the between-class scatter matrix B, both calculated from X and y. For example, when X is fixed, PCA produces the same eigenvectors and eigenvalues every time, and thus the same ratio of variance explained. LDA, however, gives different ratios (because it finds different projection directions) for different sets of y, even when X is the same.
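
To make that concrete, here is a minimal hand-rolled sketch, assuming the usual scatter-matrix construction of W and B described above (variable names are illustrative):

import numpy as np
from sklearn import datasets

X, y = datasets.load_iris(return_X_y=True)

# PCA side: eigenvalues of the covariance matrix of X, largest first
pca_eigvals = np.sort(np.linalg.eigvalsh(np.cov(X.T)))[::-1]
print(pca_eigvals / pca_eigvals.sum())

# LDA side: eigenvalues of W^{-1} B, built from within-class scatter W
# and between-class scatter B
overall_mean = X.mean(axis=0)
n_features = X.shape[1]
W = np.zeros((n_features, n_features))
B = np.zeros((n_features, n_features))
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    W += (Xc - mc).T @ (Xc - mc)                          # scatter within class c
    B += len(Xc) * np.outer(mc - overall_mean, mc - overall_mean)

lda_eigvals = np.sort(np.real(np.linalg.eigvals(np.linalg.solve(W, B))))[::-1]
print(lda_eigvals / lda_eigvals.sum())  # only C-1 = 2 eigenvalues are nonzero

Run on iris, the two printouts should reproduce the ratios reported in the issue: roughly [0.925, 0.053] for the first two PCA components and [0.991, 0.009] for LDA.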
