
UserWarning: WARNING: spectral initialisation failed! The eigenvector solver failed. This is likely due to too small an eigengap. Consider adding some noise or jitter to your data. Falling back to random initialisation!


While creating UMAP embeddings for HDBSCAN clustering, I get this warning:

`UserWarning: WARNING: spectral initialisation failed! The eigenvector solver failed. This is likely due to too small an eigengap. Consider adding some noise or jitter to your data. Falling back to random initialisation!`

The clusters created from these embeddings include multiple duplicate clusters. Why might this be happening? I need clarification.

Created a year ago
Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions

lmcinnes commented, Aug 3, 2022

UMAP generates a graph with weighted edges, where the edge weights relate to the relative distances between neighboring points. You can see the UMAP documentation for more detail on this process. The initialization that is failing is an attempt to find eigenvectors of the (symmetric) Laplacian of that graph. The usual technique for this is a power method approach, as used by SciPy. SciPy, in turn, relies on ARPACK for this. In practical terms, all UMAP sees is that ARPACK returns an error when attempting to calculate the eigenvectors of the Laplacian. Usually this is related to poor convergence due to a very small gap between eigenvalues, so the power method fails to separate out the eigenvectors easily. That may or may not be the actual cause in your case; you would have to interrogate the specifics of the `ArpackError` to be sure.

UMAP does save the graph in the `graph_` attribute, so if you wish you can walk through the code of the spectral layout, catch the actual error, and see whether it provides more information.
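As a rough illustration of that debugging step, the sketch below builds the symmetric normalized Laplacian of a sparse graph and asks SciPy's `eigsh` (which wraps ARPACK) for the smallest eigenpairs, returning any `ArpackError` instead of swallowing it. The function name `try_spectral_eigenvectors` is hypothetical, and the solver settings are assumptions chosen for illustration; this is not UMAP's actual spectral layout code.

```python
import numpy as np
import scipy.sparse
from scipy.sparse.linalg import eigsh, ArpackError


def try_spectral_eigenvectors(graph, n_components=2):
    """Try the eigen-decomposition behind a spectral layout.

    Returns (eigenvalues, eigenvectors, error); error is None on success.
    Hypothetical helper for illustration, not part of UMAP.
    """
    graph = scipy.sparse.csr_matrix(graph)
    n = graph.shape[0]

    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    # Assumes every vertex has nonzero degree.
    degrees = np.asarray(graph.sum(axis=0)).ravel()
    inv_sqrt_deg = scipy.sparse.diags(1.0 / np.sqrt(degrees))
    laplacian = scipy.sparse.identity(n) - inv_sqrt_deg @ graph @ inv_sqrt_deg

    k = n_components + 1  # one extra for the trivial eigenvector
    try:
        # Solver settings here are assumptions, not UMAP's exact values.
        eigenvalues, eigenvectors = eigsh(
            laplacian, k=k, which="SM", tol=1e-4, maxiter=n * 5
        )
    except ArpackError as err:
        # ArpackNoConvergence is a subclass, so it is caught here too.
        return None, None, err

    order = np.argsort(eigenvalues)
    return eigenvalues[order], eigenvectors[:, order], None
```

Assuming `reducer` is a fitted `umap.UMAP` instance, calling `try_spectral_eigenvectors(reducer.graph_, reducer.n_components)` and printing the returned error (or the gaps between the returned eigenvalues) should show whether a small eigengap is really the cause.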

1 reaction

lmcinnes commented, Aug 3, 2022

I think the `unique=True` option is the best approach: it finds duplicates, removes them for the purpose of learning the embedding, and then places them exactly on top of their corresponding point in the embedding. So you can run UMAP on data with duplicates without breaking things; that was the intended use case.
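The idea behind that option can be sketched in plain NumPy: deduplicate the rows, embed only the unique ones, then copy each unique point's coordinates back to all of its duplicates. The helper `embed_with_duplicates` and its `embed_fn` argument are hypothetical names used for illustration; this mirrors the behaviour described above rather than UMAP's internal implementation.

```python
import numpy as np


def embed_with_duplicates(data, embed_fn):
    """Embed only the unique rows, then map results back to every row.

    embed_fn is any callable taking an (n_unique, n_features) array and
    returning an (n_unique, n_dims) array. Hypothetical helper.
    """
    data = np.asarray(data)

    # unique_rows holds each distinct row once; inverse maps every
    # original row to its index in unique_rows.
    unique_rows, inverse = np.unique(data, axis=0, return_inverse=True)

    unique_embedding = np.asarray(embed_fn(unique_rows))

    # Duplicates receive exactly the same coordinates as their twin.
    return unique_embedding[inverse.ravel()]
```

With UMAP itself, passing `unique=True` when constructing the reducer should give this behaviour without any wrapper.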

Of course, if you want the fact that you have a lot of duplicates to matter for the learned embedding, the other simple approach is to add a small amount of noise to the whole dataset (with the scale of the noise smaller than most of the variation among non-duplicated samples).
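A minimal sketch of the jitter approach follows. The function name `add_jitter`, the default `scale`, and the choice to size the noise relative to each feature's standard deviation are all assumptions made for illustration; pick a scale that is small compared with the real variation in your own data.

```python
import numpy as np


def add_jitter(data, scale=1e-6, seed=0):
    """Return a copy of data with small Gaussian noise added to every value.

    The noise for each feature is scale times that feature's standard
    deviation (an assumed heuristic, not a UMAP recommendation).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    spread = data.std(axis=0)
    spread[spread == 0] = 1.0  # constant features still get a little noise

    noise = rng.normal(0.0, 1.0, size=data.shape) * scale * spread
    return data + noise
```

The jittered array can then be passed to UMAP in place of the original data, so duplicated samples become distinct but still nearly coincident points.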
