
Spectral Clustering Algorithm documentation clarification


Description

The current implementation of Spectral Clustering does not correctly implement the algorithms presented in the referenced papers (references repeated below for convenience). The implementation is supposed to produce an approximation to the normalized cut (Ncut) using the algorithm presented in Shi and Malik (2000).

The current implementation takes the first k eigenvectors of the symmetric normalized Laplacian D^-1/2 L D^-1/2, as returned by spectral_embedding (https://github.com/scikit-learn/scikit-learn/blob/a24c8b464d094d2c468a16ea9f8bf8d42d949f84/sklearn/cluster/spectral.py#L259), and then runs k-means on the embedding to obtain a clustering. However, as can be seen, for example, in Section 5.3 of von Luxburg (2007), this is not the solution to the relaxed Ncut problem but the matrix that von Luxburg denotes by T. The actual solution to the relaxed Ncut problem is H = D^-1/2 T, i.e. every row of the embedding should additionally be scaled by the inverse square root of the corresponding vertex degree.
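To make the difference concrete, here is a minimal dense-matrix sketch of the corrected procedure. It assumes a symmetric affinity matrix with no isolated vertices, and the function name ncut_spectral_clustering is a hypothetical placeholder, not scikit-learn API:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian as csgraph_laplacian
from sklearn.cluster import KMeans

def ncut_spectral_clustering(affinity, n_clusters, random_state=0):
    """Shi-Malik Ncut: spectral embedding rescaled by D^{-1/2}, then k-means."""
    degrees = affinity.sum(axis=1)                    # diagonal of the degree matrix D
    L_sym = csgraph_laplacian(affinity, normed=True)  # D^{-1/2} L D^{-1/2}

    # First k eigenvectors of L_sym: the matrix von Luxburg denotes by T.
    _, T = eigh(L_sym, subset_by_index=[0, n_clusters - 1])

    # The relaxed Ncut solution is H = D^{-1/2} T, i.e. each row of T is
    # rescaled by the inverse square root of the corresponding degree.
    H = T / np.sqrt(degrees)[:, np.newaxis]

    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=random_state).fit_predict(H)
```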

Possible ways to make the algorithm true to the referenced papers:

  • replace the current matrix maps with D^-1/2 maps before calling k-means, where D is the degree matrix (https://github.com/scikit-learn/scikit-learn/blob/a24c8b464d094d2c468a16ea9f8bf8d42d949f84/sklearn/cluster/spectral.py#L259); this is what the sketch above does, but it is inefficient
  • instead of obtaining the spectral embedding from the symmetric normalized Laplacian, use the random-walk normalized Laplacian
  • obtain the generalized eigenvectors u with eigenvalues lambda by solving Au = lambda Bu, with A = L the unnormalized Laplacian matrix and B = D the degree matrix (see the sketch after this list)
  • normalize the rows of maps to unit length, which would turn this into the spectral clustering algorithm of Ng, Jordan, and Weiss (2002)
  • acknowledge the difference between the implementation and the papers in the comments, if someone has conducted experiments showing that this difference does not affect the quality of the obtained solutions
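A minimal sketch of the third option, assuming a dense symmetric affinity matrix W with no isolated vertices (so D is positive definite) and a cluster count k; both names are placeholders. scipy.linalg.eigh solves the generalized problem directly:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian as csgraph_laplacian

def ncut_embedding_generalized(W, k):
    """Embedding from the generalized problem L u = lambda D u."""
    L = csgraph_laplacian(W, normed=False)  # unnormalized Laplacian L = D - W
    D = np.diag(W.sum(axis=1))              # degree matrix

    # Eigenvectors for the k smallest generalized eigenvalues are the
    # relaxed Ncut solution H directly; no D^{-1/2} rescaling is needed.
    _, H = eigh(L, D, subset_by_index=[0, k - 1])
    return H
```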

References

  • Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation, 2000. http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.2324
  • Ulrike von Luxburg. A Tutorial on Spectral Clustering, 2007. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.9323
  • Stella X. Yu and Jianbo Shi. Multiclass spectral clustering, 2003. http://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 18 (16 by maintainers)

Top GitHub Comments

1 reaction
lobpcg commented, Jul 24, 2019

Please see my answers below:

According to the source code, csgraph_laplacian with normed=True returns the symmetric normalized Laplacian L_sym = D^-1/2 L D^-1/2. In spectral_embedding, do we then solve L_sym u = lambda D u, or L_sym u = lambda u?

It’s my case (II.2.b), so L_sym u = lambda u.

It looks to me like we solve a generalized eigenvalue problem, since we call lobpcg(laplacian, X, ...) where X contains D.

No, X does not contain D. For a generalized solve with lobpcg you need lobpcg(laplacian, X, D, ...); see https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lobpcg.html
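A minimal sketch of the distinction, assuming a sparse symmetric affinity matrix W and a cluster count k are already defined (placeholder names). X is only the random initial block of vectors; the matrix B of the generalized problem must be passed separately:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.csgraph import laplacian as csgraph_laplacian
from scipy.sparse.linalg import lobpcg

L = csgraph_laplacian(W, normed=False)        # unnormalized Laplacian L = D - W
D = diags(np.asarray(W.sum(axis=1)).ravel())  # degree matrix as a sparse diagonal

rng = np.random.default_rng(0)
X = rng.standard_normal((L.shape[0], k))      # initial guess, NOT the matrix D

# Standard problem    L u = lambda u   : lobpcg(L, X, largest=False)
# Generalized problem L u = lambda D u : D must be passed explicitly as B.
eigvals, H = lobpcg(L, X, B=D, largest=False)
```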

Edit: by the way, I understand we can use the unnormalized L and Lx = lambda x to obtain an approximation to RatioCut, and we can use a normalized Laplacian to approximate Ncut (in different ways, depending on how we normalize and how we set up the eigenvalue problem).

Yes. In fact, one can choose D independently of L, and D does not even have to be diagonal; solving Lu = lambda Du for such a D will still produce a good clustering. See https://sigport.org/documents/models-spectral-clustering-and-their-applications, which tries to analyze how changing D affects the spectral clustering.
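For completeness, a short derivation (standard; see, e.g., Section 5.3 of von Luxburg 2007) of how the generalized problem relates to the L_sym problem discussed above: substituting u = D^-1/2 v turns one into the other, which is exactly why the eigenvectors T of L_sym must be rescaled by D^-1/2 to recover the Ncut solution H.

```latex
Lu = \lambda D u
\;\overset{u \,=\, D^{-1/2} v}{\Longleftrightarrow}\;
L D^{-1/2} v = \lambda D^{1/2} v
\;\Longleftrightarrow\;
\underbrace{D^{-1/2} L D^{-1/2}}_{L_{\mathrm{sym}}}\, v = \lambda v,
\qquad\text{so } H = D^{-1/2} T .
```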

0 reactions
SonicStark commented, Aug 26, 2022

@DanBenHa

From an algebraic point of view, the actual implementation computes the first k eigenvectors of L_sym = D^-1/2 L D^-1/2 (the matrix von Luxburg denotes by T) and clusters the rows of T directly with k-means, without the D^-1/2 rescaling that would yield H.

Reference: Ulrike von Luxburg. A Tutorial on Spectral Clustering, 2007. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.9323
