
AMG spectral clustering fails just after a few iterations of LOBPCG with " leading minor of the array is not positive definite"


Description

AMG spectral clustering fails just after a few iterations of LOBPCG; see also #10715 and #11965.

I have suggested the fix is in https://github.com/scikit-learn/scikit-learn/issues/6489#issuecomment-468942449 which is tested by @dfilan in https://github.com/scikit-learn/scikit-learn/issues/6489#issuecomment-469470735

Steps/Code to Reproduce

import sklearn.cluster
import scipy.sparse as sparse
import random

random.seed(1337)

rand_int = random.randrange(1000)
num_nodes = 1000
X = sparse.rand(num_nodes, num_nodes, density=0.1, random_state=rand_int)
upper = sparse.triu(X) - sparse.diags(X.diagonal())
sym_matrix = upper + upper.T
clustering = sklearn.cluster.SpectralClustering(n_clusters=3,
                                                eigen_solver='amg',
                                                random_state=rand_int,
                                                affinity='precomputed')
labels = clustering.fit(sym_matrix)

Expected Results

No error is thrown. The code works fine.

Actual Results

Error like: numpy.linalg.LinAlgError: 3-th leading minor of the array is not positive definite

The problem

sklearn\manifold\spectral_embedding_.py sets up the AMG preconditioner for LOBPCG in the line

ml = smoothed_aggregation_solver(check_array(laplacian, 'csr'))

The sklearn AMG implementation follows, without citing it, the AMG preconditioning for the graph Laplacian first proposed and tested in

Andrew Knyazev, Multiscale Spectral Graph Partitioning and Image Segmentation. Workshop on Algorithms for Modern Massive Data Sets; Stanford University and Yahoo! Research June 21–24, 2006 http://math.ucdenver.edu/~aknyazev/research/conf/image_segment_talk_UCDavis06/image_segment_talk_Stanford06.pdf (slide 13)

But the Laplacian matrix is always singular: it has at least one zero eigenvalue, corresponding to the trivial eigenvector, which is constant. Using a singular matrix as a preconditioner can be expected to cause seemingly random failures in LOBPCG and is not supported by the existing theory; see https://doi.org/10.1007/s10208-015-9297-1
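The singularity is easy to verify directly. A minimal sketch (my own, not from the issue), using a small hypothetical path graph: the Laplacian maps the constant vector to zero, so its smallest eigenvalue is zero.

```python
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.csgraph import laplacian

# Adjacency matrix of a small path graph 0-1-2-3 (hypothetical example data)
A = sparse.csr_matrix(np.array([
    [0., 1., 0., 0.],
    [1., 0., 1., 0.],
    [0., 1., 0., 1.],
    [0., 0., 1., 0.],
]))
L = laplacian(A)

# The constant vector is in the null space: L @ 1 = 0
print(L @ np.ones(4))                      # ~[0. 0. 0. 0.]
# Hence the smallest eigenvalue is (numerically) zero
print(np.linalg.eigvalsh(L.toarray())[0])  # ~0.0
```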

The fix

Although undocumented in the original reference above, we used a simple fix in https://bitbucket.org/joseroman/blopex/wiki/HypreforImageSegmentation.md, which is just to shift the Laplacian's diagonal by a positive scalar, alpha, before solving for its eigenvalues. The scalar used was alpha=1e-5, and the matrix was shifted with the MATLAB command Mat=Mat+alpha*speye(n);
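In Python, the same diagonal shift reads as follows. This is a minimal sketch (mine, not from the issue), assuming a scipy sparse Laplacian built from a hypothetical random symmetric affinity matrix and the alpha=1e-5 suggested above:

```python
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.csgraph import laplacian

# Hypothetical random symmetric affinity matrix, just to obtain a Laplacian
A = sparse.random(50, 50, density=0.1, random_state=0)
A = A + A.T
L = laplacian(A)

# Python equivalent of the MATLAB shift: Mat = Mat + alpha*speye(n)
alpha = 1e-5
L_shifted = L + alpha * sparse.eye(L.shape[0])

# The shift moves every eigenvalue up by exactly alpha, so the zero
# eigenvalue becomes alpha and the shifted matrix is positive definite
print(np.linalg.eigvalsh(L_shifted.toarray())[0])  # ~alpha
```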

A similar approach, with alpha=1, is used in line 323 of https://github.com/mmp2/megaman/blob/master/megaman/utils/eigendecomp.py

In https://github.com/scikit-learn/scikit-learn/issues/6489#issuecomment-469470735 @dfilan successfully tested the following in double precision: changing line 293 of sklearn\manifold\spectral_embedding_.py from

ml = smoothed_aggregation_solver(check_array(laplacian, 'csr'))

to

        laplacian4AMG = laplacian + 1e-10 * sparse.eye(laplacian.shape[0])
        ml = smoothed_aggregation_solver(check_array(laplacian4AMG, 'csr'))

but this creates a second matrix, to be used just for AMG preconditioning. It may be possible to just shift the whole Laplacian:

        laplacian = laplacian + 1e-10 * sparse.eye(laplacian.shape[0])
        ml = smoothed_aggregation_solver(check_array(laplacian, 'csr'))

The choice alpha=1e-10 here should be OK in double precision, but I would advocate alpha=1e-5 to also be safe in single precision. An increasingly large alpha can be expected to slow down LOBPCG convergence; e.g., alpha=1 is probably excessively large, while the change from alpha=1e-10 to alpha=1e-5 would probably be unnoticeable.
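The precision argument can be checked directly: float32 machine epsilon is about 1.2e-7, so a 1e-10 shift added to diagonal entries of order one is rounded away entirely, while 1e-5 survives. A quick sketch (mine, not from the issue):

```python
import numpy as np

# In float32 (eps ~ 1.19e-7), a 1e-10 shift on O(1) diagonal entries vanishes
print(np.float32(1.0) + np.float32(1e-10) == np.float32(1.0))  # True
# ...while a 1e-5 shift is still visible in the sum
print(np.float32(1.0) + np.float32(1e-5) == np.float32(1.0))   # False
```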

If the Laplacian is not normalized, i.e. its diagonal is not all ones, shifting it changes the eigenpairs, so the results of the spectral embedding and clustering also depend on the value of the shift when the whole Laplacian is shifted. If the shift is small, the changes may be unnoticeable. The safe choice is to use the separate matrix laplacian4AMG, shifted only inside smoothed_aggregation_solver; then the shift value can only affect the convergence speed, not the results of the spectral embedding and clustering.

Versions

All, including scikit-learn 0.21.dev0

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 11 (10 by maintainers)

Top GitHub Comments

1 reaction
lobpcg commented, Aug 14, 2019

It’s not merged so I would need to build scipy, right? might try that later 😉

No build needed - the lobpcg code is pure Python, so you can just copy/paste it in place of the old version

0 reactions
ogrisel commented, Aug 29, 2019

I merged the AMG fix in master and as expected it fixed the original issue but not https://github.com/scikit-learn/scikit-learn/issues/13393#issuecomment-521293183 which will be tackled by scipy/scipy#10621 and maybe a backport in scikit-learn.
