UserWarning: WARNING: spectral initialisation failed! The eigenvector solver failed. This is likely due to too small an eigengap. Consider adding some noise or jitter to your data. Falling back to random initialisation!
While creating UMAP embeddings for HDBSCAN clustering, I am getting this user warning: `UserWarning: WARNING: spectral initialisation failed! The eigenvector solver failed. This is likely due to too small an eigengap. Consider adding some noise or jitter to your data. Falling back to random initialisation!`

The clusters created from these embeddings include multiple duplicate clusters. Why could this be happening? I need clarification.
Issue Analytics
- State:
- Created a year ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
UMAP generates a graph with weighted edges, where the edge weights relate to the relative distances between neighboring points. See the UMAP documentation for more detail on this process. The initialization that is failing is an attempt to find eigenvectors of the (symmetric) Laplacian of that graph. The usual technique for this is a power-method-style approach, as used by SciPy, which in turn relies on ARPACK. In practical terms, all UMAP sees is that ARPACK returns an error when attempting to calculate the eigenvectors of the Laplacian. Usually this is related to poor convergence due to a very small gap between eigenvalues, so the power method fails to separate the eigenvectors easily. That may or may not be the actual cause in your case; you would have to interrogate the specifics of the `ArpackError` to be sure.

UMAP does save the graph in the `graph_` attribute, so you can walk through the code of the spectral layout, catch the actual error, and see whether it provides more information.

I think the `unique=True` option is the best approach: it finds duplicates, removes them for the purpose of learning the embedding, and then places them exactly as duplicates of their corresponding point in the embedding. So you can run UMAP with duplicates without breaking things, which was the intended use case.

Of course, if you want the fact that you have a lot of duplicates to matter for the learned embedding, the other simple approach is to add a small amount of noise to the whole dataset (such that the scale of the noise is smaller than most variation among non-duplicated samples).
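To see what the solver actually reports, something like the following sketch may help. It builds a symmetric normalised Laplacian from a sparse graph and calls SciPy's `eigsh`, catching `ArpackError`. This is an approximation of what a spectral layout does, not UMAP's own code: the helper name `try_spectral_eigs`, the tolerance, and the toy ring graph are assumptions made for illustration, and the helper assumes the graph has no isolated vertices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import ArpackError, eigsh


def try_spectral_eigs(graph, n_components=2):
    """Attempt the eigen-decomposition a spectral layout needs (hypothetical helper).

    Returns ("ok", sorted eigenvalues) or ("failed", description of the ARPACK error).
    Assumes every vertex has non-zero degree.
    """
    graph = sp.csr_matrix(graph, dtype=float)
    n = graph.shape[0]
    degrees = np.asarray(graph.sum(axis=0)).ravel()
    # Symmetric normalised Laplacian: L = I - D^{-1/2} A D^{-1/2}
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(degrees))
    laplacian = sp.identity(n, format="csr") - d_inv_sqrt @ graph @ d_inv_sqrt
    try:
        # Ask ARPACK for the smallest-magnitude eigenvalues.
        eigenvalues, _ = eigsh(laplacian, k=n_components + 1, which="SM", tol=1e-4)
    except ArpackError as err:
        # ArpackNoConvergence is a subclass, so it is caught here as well.
        return "failed", repr(err)
    return "ok", np.sort(eigenvalues)


# Toy stand-in for a fitted model's graph: an undirected ring of 30 vertices.
n = 30
idx = np.arange(n)
ring = sp.coo_matrix((np.ones(n), (idx, (idx + 1) % n)), shape=(n, n))
ring = ring + ring.T

status, info = try_spectral_eigs(ring)
print(status, info)
```

With a fitted model (here called `reducer`, a hypothetical name), the same helper could be pointed at `reducer.graph_`. A very small gap between the returned eigenvalues, or a non-convergence message, would be consistent with the eigengap explanation above.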
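The two workarounds might look roughly like this. The jitter helper below is only a sketch: the name `add_jitter`, the `rel_scale` default, and the per-feature scaling are assumptions, chosen so the noise stays well below the variation between distinct samples. The UMAP calls are left as comments so the snippet runs without `umap-learn` installed.

```python
import numpy as np


def add_jitter(X, rel_scale=1e-3, seed=0):
    """Return a copy of X with small Gaussian noise added (hypothetical helper).

    The noise standard deviation is rel_scale times each feature's standard deviation.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    spread = X.std(axis=0)
    # Constant features have zero spread; fall back to 1.0 so they still get noise.
    spread[spread == 0] = 1.0
    return X + rng.normal(size=X.shape) * spread * rel_scale


# Toy data: three distinct points, each repeated four times.
X = np.repeat(np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]), 4, axis=0)
X_jittered = add_jitter(X)

n_unique_before = len(np.unique(X, axis=0))
n_unique_after = len(np.unique(X_jittered, axis=0))
print(n_unique_before, n_unique_after)

# Option 1: let UMAP handle duplicates itself.
# import umap
# embedding = umap.UMAP(unique=True).fit_transform(X)
#
# Option 2: keep duplicates influential by embedding the jittered data.
# embedding = umap.UMAP().fit_transform(X_jittered)
```

Which option fits depends on whether the number of duplicates should influence the embedding: `unique=True` ignores it while learning, whereas jitter preserves it.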