PCA, LDA, unexpected explained_variance_ratio
See original GitHub issue

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

# dimensionality reduction using PCA
pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)

# percentage of variance explained by each component
print('PCA: explained variance ratio (first two components): %s'
      % str(pca.explained_variance_ratio_))

# dimensionality reduction using LDA
lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)
print('LDA: explained variance ratio (first two components): %s'
      % str(lda.explained_variance_ratio_))
```
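As a sanity check on what PCA's `explained_variance_ratio_` actually reports, here is a minimal sketch (not part of the original report) that recomputes the ratios by hand: the eigenvalues of the sample covariance matrix of `X`, each divided by their sum. The variable names (`eigvals`, `manual_ratio`) are illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA

X = datasets.load_iris().data
pca = PCA(n_components=2).fit(X)

# eigenvalues of the sample covariance matrix, sorted descending;
# np.cov uses ddof=1, which matches how PCA normalizes its variances
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
manual_ratio = eigvals / eigvals.sum()

print(pca.explained_variance_ratio_)  # sklearn's ratios for the top 2 components
print(manual_ratio[:2])               # hand-computed ratios; should agree
```

For the `svd` solver this should match `explained_variance_ratio_` to numerical precision, since both are the top covariance eigenvalues normalized by the total variance.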
Expected Results
The first component from PCA has a larger explained variance ratio than the first component from LDA.
Actual Results
PCA: explained variance ratio (first two components): [ 0.925  0.053]
LDA: explained variance ratio (first two components): [ 0.991  0.009]
Versions
Darwin-14.5.0-x86_64-i386-64bit
Python 3.5.3 |Anaconda custom (x86_64)| (default, Mar 6 2017, 12:15:08) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.12.1
SciPy 0.19.1
Scikit-Learn 0.18.2
Issue Analytics
- Created 6 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
Euh, I’m really not sure `explained_variance_ratio_` should be the same for PCA and LDA. PCA is unsupervised, LDA is supervised. The principal components are calculated differently since LDA needs a label (`y`) for each point (that’s why it is `lda.fit(X, y).transform(X)` but `pca.fit(X).transform(X)`). Since LDA will find different principal components, I see no reason why `explained_variance_ratio_` should be the same in both cases. If I’m wrong please let me know; otherwise this issue should be closed because it’s not a bug.
Adding on to @JPFrancoia’s comment, there is indeed no reason why the two ratios should be the same, since they are computed from different matrices. Therefore it is likely not a bug.
In particular, while both are ratios of descending eigenvalues against the sum of the eigenvalues, PCA computes its eigenvalues from the covariance of the entire data set X, while LDA computes them from W^{-1}B, the product of the inverse within-class scatter matrix and the between-class scatter matrix, calculated from X and y. For example, when X is fixed, PCA produces the same eigenvectors and eigenvalues every time, and thus the same ratio of variance explained. LDA, however, gives different ratios (because different projection directions are found) for different sets of y, even when X is the same.
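The dependence on y described above can be demonstrated directly. The sketch below (an illustration, not from the original thread) fits LDA twice on the same X, once with the true iris labels and once with randomly permuted labels, while PCA sees only X. The seed and variable names are arbitrary choices.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
X, y = iris.data, iris.target

rng = np.random.RandomState(0)
y_shuffled = rng.permutation(y)  # same X, scrambled labels (still 3 classes)

pca_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_
lda_ratio = LinearDiscriminantAnalysis(n_components=2).fit(X, y).explained_variance_ratio_
lda_shuf = LinearDiscriminantAnalysis(n_components=2).fit(X, y_shuffled).explained_variance_ratio_

print(pca_ratio)  # depends only on X
print(lda_ratio)  # depends on X and y
print(lda_shuf)   # changes when only y changes, even though X is identical
```

That the PCA ratios stay fixed while the LDA ratios move with the labels is exactly why comparing the two numbers, as the original report does, is not meaningful.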