
UserWarning: WARNING: spectral initialisation failed! The eigenvector solver failed. This is likely due to too small an eigengap. Consider adding some noise or jitter to your data. Falling back to random initialisation! warn(


While creating UMAP embeddings for HDBSCAN clustering, I get this user warning: `UserWarning: WARNING: spectral initialisation failed! The eigenvector solver failed. This is likely due to too small an eigengap. Consider adding some noise or jitter to your data. Falling back to random initialisation!`

The clusters created from these embeddings include multiple duplicate clusters. Why could this be happening? I need clarification.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
lmcinnes commented, Aug 3, 2022

UMAP generates a graph with weighted edges, where the edge weights relate to the relative distances of neighboring points. You can see the UMAP documentation for more detail on this process. The initialization that is failing is an attempt to find eigenvectors of the (symmetric) Laplacian of that graph. The usual technique for this is a power method approach, as used by SciPy. SciPy, in turn, relies on ARPACK for this. In practical terms, all UMAP sees is that ARPACK returns an error when attempting to calculate the eigenvectors of the Laplacian. Usually this is related to poor convergence due to a very small gap between eigenvalues, so the power method fails to separate out the eigenvectors easily. That may or may not be the actual cause in your case; you would have to interrogate the specifics of the ArpackError to be sure.
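To make the "very small gap between eigenvalues" point concrete, here is a minimal sketch that is not taken from UMAP itself. It builds a toy weighted graph of two tight clusters joined by a single bridge edge and measures the gap between the two smallest eigenvalues of its symmetric normalized Laplacian. The helper names `two_cluster_graph` and `eigengap` are hypothetical, and the dense `numpy.linalg.eigvalsh` solver is used only because the toy graph is tiny; it is not what UMAP or ARPACK does internally.

```python
import numpy as np


def two_cluster_graph(bridge_weight, cluster_size=5):
    """Toy graph: two fully connected clusters joined by one weighted edge."""
    n = 2 * cluster_size
    weights = np.zeros((n, n))
    weights[:cluster_size, :cluster_size] = 1.0
    weights[cluster_size:, cluster_size:] = 1.0
    np.fill_diagonal(weights, 0.0)  # no self-loops
    # Single bridge edge between the two clusters (kept symmetric).
    weights[cluster_size - 1, cluster_size] = bridge_weight
    weights[cluster_size, cluster_size - 1] = bridge_weight
    return weights


def eigengap(weights):
    """Gap between the two smallest eigenvalues of I - D^-1/2 W D^-1/2."""
    weights = np.asarray(weights, dtype=np.float64)
    inv_sqrt_deg = 1.0 / np.sqrt(weights.sum(axis=0))
    laplacian = np.eye(len(weights)) - (
        inv_sqrt_deg[:, None] * weights * inv_sqrt_deg[None, :]
    )
    eigenvalues = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    return eigenvalues[1] - eigenvalues[0]


# The weaker the bridge, the closer the graph is to being disconnected,
# and the smaller the gap an iterative solver has to resolve.
gaps = [eigengap(two_cluster_graph(w)) for w in (1.0, 1e-2, 1e-4)]
```

In this toy setting the gap shrinks as the bridge weight shrinks, which is the kind of near-degenerate spectrum the warning message alludes to.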

UMAP does save the graph in the graph_ attribute, so you can actually walk through the code of the spectral layout and catch the actual error and see if that provides more information if you wish.
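One way to follow that suggestion is sketched below. It is an approximation written for illustration, not UMAP's actual spectral layout code: it builds the symmetric normalized Laplacian of a sparse graph and asks SciPy's ARPACK wrapper (`scipy.sparse.linalg.eigsh`) for the smallest eigenvalues, returning the `ArpackError` instead of swallowing it. The function name `try_spectral_eigs` and the choice of `which="SM"` with default tolerances are assumptions.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg
from scipy.sparse.linalg import ArpackError


def try_spectral_eigs(graph, dim=2):
    """Try to compute the dim + 1 smallest Laplacian eigenvalues of `graph`.

    Returns (eigenvalues, None) on success, or (None, error) if ARPACK fails.
    Assumes every node has at least one edge (non-zero degree).
    """
    graph = scipy.sparse.csr_matrix(graph, dtype=np.float64)
    n = graph.shape[0]
    degrees = np.asarray(graph.sum(axis=0)).ravel()
    inv_sqrt_deg = scipy.sparse.diags(1.0 / np.sqrt(degrees))
    # Symmetric normalized Laplacian: I - D^-1/2 W D^-1/2
    laplacian = scipy.sparse.identity(n) - inv_sqrt_deg @ graph @ inv_sqrt_deg
    try:
        eigenvalues, _ = scipy.sparse.linalg.eigsh(laplacian, k=dim + 1, which="SM")
        return np.sort(eigenvalues), None
    except ArpackError as err:
        # ArpackNoConvergence is a subclass, so it is caught here as well.
        return None, err
```

With a fitted UMAP model named, say, `reducer`, calling `try_spectral_eigs(reducer.graph_)` and printing the returned error (if any) should show what ARPACK actually complained about, and the returned eigenvalues (if any) show how close together they are.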

1 reaction
lmcinnes commented, Aug 3, 2022

I think the unique=True option is the best approach; it will find duplicates, remove them for the purpose of learning the embedding, and then place them down exactly as duplicates of their corresponding point in the embedding. So you can run UMAP with duplicates, but not break things – that was the intended use case.
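The behaviour described above can be illustrated with plain NumPy. The sketch below is not UMAP's implementation; it only mimics the idea: deduplicate the rows, embed the unique rows with some embedding function, then copy each unique point's coordinates back to all of its duplicates. The function name `embed_with_duplicates` and the `embed_fn` callback are hypothetical.

```python
import numpy as np


def embed_with_duplicates(data, embed_fn):
    """Embed only the unique rows of `data`, then place duplicates on top of them.

    `embed_fn` is any callable mapping an (n_unique, n_features) array to an
    (n_unique, n_components) array.
    """
    data = np.asarray(data)
    unique_rows, inverse = np.unique(data, axis=0, return_inverse=True)
    # Flatten defensively: the shape of `inverse` has varied across NumPy versions.
    inverse = np.asarray(inverse).reshape(-1)
    unique_embedding = np.asarray(embed_fn(unique_rows))
    # inverse[i] is the index of the unique row that equals data[i].
    return unique_embedding[inverse]
```

With umap-learn itself, passing `unique=True` to the `umap.UMAP` constructor is the option the comment refers to, so a helper like this should not be needed in practice.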

Of course if you want the fact that you have a lot of duplicates to matter for the learned embedding the other simple approach is to simply add a small amount of noise to the whole dataset (such that the scale of the noise is smaller than most variation among (non-duplicated) samples).
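A minimal sketch of the noise approach follows, assuming Gaussian noise scaled per feature. The function name `add_jitter` and the default `relative_scale=1e-3` are illustrative assumptions, not recommendations from the thread; the right scale depends on how much your non-duplicated samples actually vary.

```python
import numpy as np


def add_jitter(data, relative_scale=1e-3, seed=0):
    """Return a copy of `data` with small Gaussian noise added to every entry.

    The noise for each feature is scaled by that feature's standard deviation,
    so it stays small relative to the real variation in the data.
    """
    data = np.asarray(data, dtype=np.float64)
    rng = np.random.default_rng(seed)
    feature_std = data.std(axis=0)
    noise = rng.normal(0.0, 1.0, size=data.shape) * feature_std * relative_scale
    return data + noise
```

The jittered array can then be passed to UMAP in place of the original data, so that exact duplicates become distinct but nearly coincident points.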


Top Results From Across the Web

NA returned with Warning: Embedding 8 connected ...
I suspect the spectral initialisation is failing for one reason or another. This can often happen for particularly oddly distributed data. As a...

collaborative filtering a joke 2.0
The eigenvector solver failed. This is likely due to too small an eigengap. Consider adding some noise or jitter to your data. Falling...

uwot source: R/init.R
R/init.R defines the following functions: agspectral_init ... FALSE) { if (nrow(A) < 3) { tsmessage("Graph too small, using random initialization instead") ...

Spectral Clustering - Eric Bunch
Spectral clustering should really be viewed as a graph clustering algorithm, in the sense that a data clustering problem is first translated ...

A Tutorial on Spectral Clustering
The unnormalized graph Laplacian and its eigenvalues and eigenvectors can be used to describe many properties of graphs, see Mohar (1991, 1997). One...
