[Question] Is it safe to fit-transform once to get multiple embeddings?

Quick question, a bit on the mathsy side.

Suppose I want to test different embeddings of my data, say in 2, 5 and 10 dimensions. Can I fit UMAP only once with n_components=10 and then take the first 2 or 5 components, or would this be complete nonsense, meaning I should run separate fits?

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
adelejackson commented, May 8, 2019

(Caveat: I’m not Leland McInnes; I’m a mathematician and am comfortable with the maths behind UMAP but am not familiar with the code. Take this with a grain of salt.)

I wouldn’t expect this to work. Unlike in PCA, the coordinate system in the transformed data has no particular meaning – you get the same cost for any isometry of a given representation (in particular, for a rotation). (There’s also no reason the first two components would work any better than the last two.)
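The rotation argument can be illustrated with a small sketch. It does not run UMAP; it uses a hypothetical four-point toy "embedding" and assumes only that the cost depends on the pairwise Euclidean distances between embedded points. Rotating the points leaves every pairwise distance unchanged, yet the values along the first axis change, so "the first component" of such an embedding is not a well-defined quantity.

```python
import math

def rotate(points, theta):
    """Rotate 2-D points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def pairwise_distances(points):
    """Return the Euclidean distance for every pair i < j."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]

# Hypothetical toy embedding (an assumption for illustration, not UMAP output).
embedding = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
rotated = rotate(embedding, math.pi / 3)

# A rotation is an isometry, so all pairwise distances are preserved.
same_distances = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(embedding), pairwise_distances(rotated))
)

# The coordinates along the first axis are nevertheless different.
first_axis_original = [p[0] for p in embedding]
first_axis_rotated = [p[0] for p in rotated]

print(same_distances)
print(first_axis_original != first_axis_rotated)
```

Any distance-based cost scores both point sets identically, so nothing in the optimisation singles out one choice of axes over the other.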

EDIT: The following paragraph is wrong. For example, say you have a 10-dimensional dataset. You can “transform” this under UMAP and you should get basically the same dataset back (up to an isometry, I think). Taking two components of this “transformation”, in general we certainly do not get the embedding you would get with n_components=2 – at least, I would hope not, or you could just restrict to two components instead of running UMAP!

0 reactions
adelejackson commented, May 8, 2019

I don’t think, however, that you get the “same” dataset up to isometry when n_components = n_features; from what I understood from the paper, the UMAP algorithm works with local distances and attempts to build a reconstruction that preserves the simplicial structure, i.e. it should preserve the topology of the dataset, not its metric.

Yep, you’re completely correct; I forgot that we use the Euclidean metric for the low-dimensional representation, not the knn-weighted one.
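
For reference, the point about the Euclidean metric can be written out. This is a sketch of the low-dimensional similarity and cost as given in the UMAP paper, with the symbols chosen here for illustration: y_i are the embedded points, v_ij are the high-dimensional (knn-weighted) membership strengths, and a, b are the curve parameters derived from min_dist.

```latex
% Low-dimensional similarity: depends on the embedding only through Euclidean distance
w_{ij} = \left(1 + a \,\lVert y_i - y_j \rVert_2^{2b}\right)^{-1}

% Fuzzy-set cross entropy minimised over the embedding
C = \sum_{i \neq j} \left[ v_{ij} \log\frac{v_{ij}}{w_{ij}}
      + (1 - v_{ij}) \log\frac{1 - v_{ij}}{1 - w_{ij}} \right]
```

Since the embedding enters C only through the distances between y_i and y_j, any isometry of the embedding gives the same cost, while the high-dimensional side v_ij carries only local neighbourhood information rather than the full metric.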


