Warning message: failed creating initial embedding; using random embedding instead
Hello,
I am trying to run UMAP on a dataset of ~5000 observations and 20 features (selected by a prior PCA), using the R implementation provided by the umap package.
When computing UMAP I get this warning message: "failed creating initial embedding; using random embedding instead".
So the spectral initialization is not working.
How should I tackle this sort of instability?
Plots are quite different from each other in different runs, and I would like something reproducible and possibly robust. I imagine this could be due to my data, but I wasn’t able to find help about that in the UMAP documentation.
Feel free to close this issue if it’s not appropriate for this repo. Thank you for your kind attention.
Issue Analytics
- State:
- Created 4 years ago
- Comments:5 (1 by maintainers)
Top GitHub Comments
Essentially this would mean that the internal topological representation (a graph) does not have a good spectral gap in the Laplacian; or that the spectral embedding/eigenvector solver is not working correctly on your system. In the first case that means that potentially you may want to add a small amount of noise or jitter to your data to hopefully nudge it out of this odd situation. In the second case you probably want to look at how the ARPACK components are installed.
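The jitter idea can be sketched in R roughly as follows. This is only an illustration under assumptions: `add_jitter` and its `scale` default are hypothetical names and values, the random matrix `pcs` stands in for the real PCA scores, and the `umap` call assumes the package accepts `random_state` as an override of `umap.defaults`. Fixing `random_state` is included because it should make runs repeatable even when the random initialization is used.

```r
# Hypothetical helper: add Gaussian noise to each column, scaled to a small
# fraction of that column's standard deviation.
add_jitter <- function(x, scale = 1e-4, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  x <- as.matrix(x)
  sds <- apply(x, 2, sd)
  noise <- matrix(rnorm(length(x)), nrow = nrow(x))
  # Multiply column j of the noise by sds[j] * scale, then add it to the data.
  x + sweep(noise, 2, sds * scale, `*`)
}

# Stand-in for the real data: 500 observations, 20 PCA components.
set.seed(42)
pcs <- matrix(rnorm(500 * 20), nrow = 500)
pcs_jittered <- add_jitter(pcs)

# Only run the embedding if the umap package is installed.
if (requireNamespace("umap", quietly = TRUE)) {
  emb <- umap::umap(pcs_jittered, random_state = 123)
  print(head(emb$layout))
}
```

The noise scale is a judgment call: it should be small enough not to change neighborhood structure, so it is worth comparing the jittered and original embeddings before trusting either.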
I like the approach to experiment with the number of neighbors. Along similar lines, another experiment might be to change the input data… The 20 features in this dataset come from PCA preprocessing. Using a couple more or fewer features from that preprocessing stage should not affect the interpretation of the workflow, but might introduce that “noise or jitter” to avoid numerical problems. Not sure if this would work in practice, but it is quick to test.
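A quick way to run both experiments is to loop over a small grid of neighbor counts and PCA component counts and record which settings trigger the fallback. The sketch below is an assumption-laden illustration: `run_one` and the grid values are hypothetical, `pcs` is random stand-in data, and detecting the fallback relies on matching the warning text quoted in this issue.

```r
# Stand-in for PCA scores with a few spare components (25 here).
set.seed(42)
pcs <- matrix(rnorm(500 * 25), nrow = 500)

# Hypothetical grid: a couple more or fewer PCs, and several neighbor counts.
grid <- expand.grid(n_pcs = c(18, 20, 22), n_neighbors = c(10, 15, 30))

# Hypothetical helper: run UMAP once and report whether the spectral
# initialization fell back to a random embedding.
run_one <- function(n_pcs, n_neighbors) {
  fell_back <- FALSE
  layout <- withCallingHandlers(
    umap::umap(pcs[, seq_len(n_pcs)],
               n_neighbors = n_neighbors,
               random_state = 123)$layout,
    warning = function(w) {
      # Only intercept the warning discussed here; let others propagate.
      if (grepl("initial embedding", conditionMessage(w))) {
        fell_back <<- TRUE
        invokeRestart("muffleWarning")
      }
    }
  )
  list(layout = layout, fell_back = fell_back)
}

# Only run the experiment if the umap package is installed.
if (requireNamespace("umap", quietly = TRUE)) {
  grid$fell_back <- mapply(
    function(p, k) run_one(p, k)$fell_back,
    grid$n_pcs, grid$n_neighbors
  )
  print(grid)
}
```

Settings where `fell_back` is `FALSE` used the spectral initialization, which is the more stable starting point for comparing plots across runs.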