Are umap transformations non-deterministic?
I am trying to use UMAP to preprocess some data, and I’ve noticed that the same vector gives a different result depending on the number of rows passed to the transformation.
That is, the same row vector A yields a different output vector depending on the shape (number of rows) of the data being transformed.
import umap

# X: 2-D array of shape (n_samples, n_features)
# fit umap to data X
reducer = umap.UMAP().fit(X)
# transform X using reducer
embedding = reducer.transform(X)
# get subset of X to transform
embedding_sub = reducer.transform(X[:100, :])
# => I was assuming embedding_sub == embedding[:100, :]
# => but that wasn't the case
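One way to put a number on the mismatch described above is to compare, row by row, the result of transforming the full array with the result of transforming only the first rows. The sketch below is a minimal illustration, not part of the original report: `subset_discrepancy` is a hypothetical helper, and the two stand-in transforms are simple linear maps used only to show what a row-independent and a batch-dependent transform look like. It is not UMAP itself.

```python
import numpy as np


def subset_discrepancy(transform, X, n_rows):
    """Per-row Euclidean distance between transform(X)[:n_rows]
    and transform(X[:n_rows]). All zeros means the output for a row
    does not depend on which other rows were passed alongside it."""
    full = np.asarray(transform(X))[:n_rows]
    sub = np.asarray(transform(X[:n_rows]))
    return np.linalg.norm(full - sub, axis=1)


# Stand-in data and transforms (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
W = rng.normal(size=(5, 2))


def row_independent(batch):
    # Each output row depends only on its own input row.
    return batch @ W


def batch_dependent(batch):
    # Centering by the batch mean makes each output row depend on the batch.
    return (batch - batch.mean(axis=0)) @ W


d_indep = subset_discrepancy(row_independent, X, 100)
d_batch = subset_discrepancy(batch_dependent, X, 100)
print("row-independent max gap:", d_indep.max())
print("batch-dependent max gap:", d_batch.max())
```

With a fitted reducer, calling `subset_discrepancy(reducer.transform, X, 100)` would measure the gap reported here. Whether it comes out exactly zero may depend on the library version and settings, so treat the number as a diagnostic rather than a pass/fail check.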
Issue Analytics
- State:
- Created 5 years ago
- Reactions:2
- Comments:17 (5 by maintainers)
Top GitHub Comments
Hey all, I thought I’d add some analysis to this discussion via colab here.
There I do a quick quantification of the randomness introduced and explore its impact on some downstream tasks. You can find the conclusions below (and also in the notebook):
I would like to add to @dataist’s experiments that we also get a different result for the same point if: