[Feature Request] Allow support for "precomputed" distance matrix for umap.umap_.fuzzy_simplicial_set
I’ve been working a lot with precomputed distance matrices lately. The option to use these precomputed distances in umap.umap_.fuzzy_simplicial_set would be really helpful.
Is this possible with any current hacks? If not, could this be possible to implement in future versions?
Issue Analytics
- Created 2 years ago
- Comments: 11 (2 by maintainers)
Top GitHub Comments
@lmcinnes Thank you, this is an extremely useful explanation (I also didn’t know it was that easy to use @numba.njit()). I’m working on a wrapper around your fuzzy_simplicial_set to use with my code, and it’s helpful knowing how the X, knn_indices, knn_dists, and angular arguments are used. Looking forward to applying this to some microbiome and sequencing datasets.

As for the Aitchison distance: yes, doing a CLR transform followed by Euclidean distance is definitely the most computationally efficient way, AFAIK. However, things like variance log-ratio or rho proportionality are less straightforward, so the @numba.njit() support will be extremely useful.

If knn_indices and knn_dists are specified (and not None), then X will be ignored and knn_indices and knn_dists will be used directly. So you can either leave the indices and dists unspecified and provide an X (which can be a feature matrix or, if metric="precomputed", a distance matrix), or specify the indices and dists directly and use those.

The set_op_mix_ratio and local_connectivity parameters are relevant for symmetrization, so they will be used regardless of the choice of input. In contrast, angular is about what kinds of trees to use for nearest-neighbour approximation; it will only matter if you specify X as a feature matrix.

Lastly, looking through all of this now, it is worth noting that the metric parameter can also be a (numba-jitted) Python function specifying how to compute a distance between two vectors. Unless you have sparse data (and you can’t really have sparse data and use the Aitchison distance, due to zeros), this should be straightforward; distances on sparse data require more understanding of the sparse data formats to write. So, for example, you could have:

[code snippet not preserved in this copy]

In practice I think I would apply a little algebra and rewrite the distance computation for greater numerical stability (taking the log of a geometric mean, for example, could be computed better), but I wanted the computation to be relatively clear. Given the nature of the distance computation, however, I think you could just as well do:

[code snippet not preserved in this copy]
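The code snippets from the reply did not survive in this copy. As an illustration only (not the original code), here is a minimal sketch of an Aitchison distance written the way the reply describes: a plain function on two vectors that numba can compile and that could be passed as metric. It also includes the CLR-then-Euclidean alternative and a variance log-ratio function of the kind mentioned in the first comment. The names aitchison, variance_log_ratio, and clr are hypothetical. Inputs are assumed strictly positive (no zeros), and numba is treated as optional so the sketch still runs without it.

```python
import numpy as np

try:
    from numba import njit  # UMAP can call numba-jitted metric functions
except ImportError:  # fall back to plain Python so the sketch runs without numba
    def njit(func):
        return func


@njit
def aitchison(x, y):
    """Aitchison distance between two strictly positive compositions."""
    lx = np.log(x)
    ly = np.log(y)
    # clr(v) = log(v) - mean(log(v)), i.e. log(v / geometric_mean(v))
    diff = (lx - lx.mean()) - (ly - ly.mean())
    return np.sqrt(np.sum(diff * diff))


@njit
def variance_log_ratio(x, y):
    """Population variance of log(x / y); zero when x and y are exactly proportional.

    Usually computed between two features across samples, so it suits a
    transposed matrix (features as rows).
    """
    r = np.log(x) - np.log(y)
    return np.mean((r - r.mean()) ** 2)


def clr(X):
    """Row-wise centred log-ratio transform of a strictly positive matrix."""
    L = np.log(np.asarray(X, dtype=np.float64))
    return L - L.mean(axis=1, keepdims=True)
```

Either aitchison or variance_log_ratio could then be passed as metric= to fuzzy_simplicial_set or umap.UMAP. Alternatively, as the reply suggests, applying clr to the data and using metric="euclidean" gives the same distances as aitchison without a custom metric.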
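The knn_indices/knn_dists route described in the reply answers the original question directly. From a dense precomputed distance matrix, each row's nearest neighbours can be extracted with NumPy and passed to fuzzy_simplicial_set. A minimal sketch follows. It assumes a square, symmetric matrix with zeros on the diagonal and adopts the convention (as I understand it) that each point's neighbour list starts with the point itself. The helpers knn_from_precomputed and graph_from_precomputed are hypothetical. The umap import is optional, so the NumPy part runs on its own.

```python
import numpy as np

try:
    from umap.umap_ import fuzzy_simplicial_set
except ImportError:  # umap not installed: neighbour extraction below still works
    fuzzy_simplicial_set = None


def knn_from_precomputed(D, n_neighbors):
    """Return (knn_indices, knn_dists) for each row of a square distance matrix D."""
    D = np.asarray(D, dtype=np.float64)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        raise ValueError("D must be a square (n, n) distance matrix")
    key = D.copy()
    np.fill_diagonal(key, -np.inf)  # keep each point first, even with duplicate points
    knn_indices = np.argsort(key, axis=1, kind="stable")[:, :n_neighbors]
    knn_dists = np.take_along_axis(D, knn_indices, axis=1)
    return knn_indices.astype(np.int64), knn_dists.astype(np.float32)


def graph_from_precomputed(D, n_neighbors=15, seed=0):
    """Hypothetical wrapper: build the fuzzy graph from precomputed neighbours."""
    if fuzzy_simplicial_set is None:
        raise ImportError("umap-learn is required for this step")
    knn_indices, knn_dists = knn_from_precomputed(D, n_neighbors)
    out = fuzzy_simplicial_set(
        X=np.asarray(D),  # distances unused when knn arrays are given; sets graph size
        n_neighbors=n_neighbors,
        random_state=np.random.RandomState(seed),
        metric="precomputed",
        knn_indices=knn_indices,
        knn_dists=knn_dists,
    )
    # newer umap versions return a tuple (graph, sigmas, rhos, ...); older ones the graph
    return out[0] if isinstance(out, tuple) else out
```

The simpler alternative mentioned in the reply is to pass the distance matrix itself as X with metric="precomputed" and let UMAP find the neighbours. Precomputing the neighbours yourself is mainly useful when you want to reuse them or build them with your own code.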
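This sketch illustrates the reply's point that set_op_mix_ratio applies regardless of input. After the directed membership strengths A are computed, the matrix is combined with its transpose. My reading of umap_.py is that set_op_mix_ratio interpolates between the fuzzy union A + A.T - A*A.T and the fuzzy intersection A*A.T (element-wise products). The dense NumPy version below, with the hypothetical name fuzzy_symmetrize, reproduces that formula for illustration only; UMAP itself works on sparse matrices.

```python
import numpy as np


def fuzzy_symmetrize(A, set_op_mix_ratio=1.0):
    """Blend the fuzzy union and fuzzy intersection of A and its transpose.

    set_op_mix_ratio=1.0 gives the pure fuzzy union (probabilistic t-conorm);
    set_op_mix_ratio=0.0 gives the pure fuzzy intersection (product t-norm).
    """
    A = np.asarray(A, dtype=np.float64)
    if not 0.0 <= set_op_mix_ratio <= 1.0:
        raise ValueError("set_op_mix_ratio must lie in [0, 1]")
    At = A.T
    prod = A * At  # element-wise: membership in both directions
    union = A + At - prod
    return set_op_mix_ratio * union + (1.0 - set_op_mix_ratio) * prod
```

The result is symmetric for any ratio. The layout step therefore always sees an undirected graph, whether the neighbours came from X or from precomputed knn_indices/knn_dists.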