
Normalizing UMAP with a technical variable

See original GitHub issue

How to normalize UMAP with a technical variable causing bias?

Data: I am analyzing ~2000 variables extracted with convolutional neural networks (transfer learning).

Workflow: To keep the original aspect ratio unchanged, images of different sizes are placed in the foreground of a larger black image. The data is then analyzed with UMAP for dimension reduction.

Problem: However, I notice that one major driver of the dimension reduction is the difference in original image size (likely the proportion of image foreground to black background). I wonder whether I could somehow normalize or regress out the original image size (i.e., area), but I could not find anything in the package documentation.

Nice to know: For comparison, this is implemented in the Seurat package (used for analysis of single-cell RNA-seq data) as ScaleData(marrow, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(marrow)).

Any suggestions?

Thanks in advance!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
obruck commented, Jan 31, 2021

Thanks @lmcinnes, I appreciate your feedback, although this is not a software issue. Perhaps something to be implemented in a future update 😃

If I understood you correctly, you suggest

  1. PCA
  2. Remove PCs correlating with image area
  3. UMAP for the remaining PCs
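A minimal sketch of those three steps in Python (assuming scikit-learn and the umap-learn package; the toy data, the 0.001 leakage coefficient, and the 0.3 correlation cutoff are arbitrary illustrations, not values from the thread):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-ins: X is the (samples x features) CNN feature matrix,
# area is the technical variable (original image area).
n = 200
area = rng.uniform(100, 10_000, size=n)
X = rng.normal(size=(n, 50))
X[:, 0] += 0.001 * area                      # plant an area-driven direction

# 1. PCA
pca = PCA(n_components=20).fit(X)
scores = pca.transform(X)                    # (samples x PCs)

# 2. Drop PCs that correlate with image area (0.3 cutoff is arbitrary)
corr = np.array([np.corrcoef(scores[:, i], area)[0, 1]
                 for i in range(scores.shape[1])])
filtered = scores[:, np.abs(corr) < 0.3]

# 3. UMAP on the remaining PCs, e.g.:
# import umap
# embedding = umap.UMAP().fit_transform(filtered)
```

Whether a hard correlation cutoff is appropriate depends on how cleanly the confounder loads onto a few components; a softer alternative is to regress the confounder out of the PC scores rather than dropping whole components.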

As visual features are non-parametric data, shouldn’t UMAP be preferred over PCA for the dimension reduction?

I also looked up the documentation for the Seurat package: vars.to.regress indicates which confounding variables should be regressed out to reduce their effect on the primary features. In practice, each feature is fitted as feature1 ~ confounding_factor1 + confounding_factor2 + etc., and the regression residuals are kept. I will implement a custom regression solution and compare it to the PCA approach.
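The Seurat-style regression described above can be sketched as follows (a hedged illustration, not the thread's actual implementation; the toy data, the 0.0005 leakage coefficient, and all variable names are made up, and scikit-learn stands in for Seurat's internal regression):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical data: X = CNN feature matrix, area = confounding image area.
n = 300
area = rng.uniform(100, 10_000, size=n)
X = rng.normal(size=(n, 10)) + 0.0005 * area[:, None]  # area leaks into every feature

# Fit feature_j ~ area for all features at once and keep the residuals,
# mirroring Seurat's vars.to.regress behaviour.
confounders = area.reshape(-1, 1)
model = LinearRegression().fit(confounders, X)
residuals = X - model.predict(confounders)

# OLS residuals are orthogonal to the regressors, so the confounder's
# linear effect on each feature is removed:
r = np.corrcoef(residuals[:, 0], area)[0, 1]

# The residual matrix would then be the input to UMAP:
# import umap
# embedding = umap.UMAP().fit_transform(residuals)
```

Note that this removes only the linear component of the confounder's effect; any nonlinear dependence on image area could still leak into the embedding.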

0 reactions
lmcinnes commented, Feb 2, 2021

I suspect it will be very hard to remove it entirely. I’m sorry I can’t offer better solutions.


Top Results From Across the Web

  • How to Use UMAP — umap 0.5 documentation: UMAP is a general purpose manifold learning and dimension reduction algorithm. It is designed to be compatible with scikit-learn, making use of the…
  • How Exactly UMAP Works (and why exactly it is better than tSNE): UMAP does not apply normalization to either high- or low-dimensional probabilities, which is very different from tSNE and feels weird. However, …
  • Decrypting Dimensionality Reduction | Analytics Vidhya
  • Dimensionality reduction by UMAP reinforces sample …: For PCA, we standardize the sample vectors by removing the mean and scaling to unit variance. For MDS, t-SNE (openTSNE implementation) and UMAP…
  • Dimension Reduction: PCA, tSNE, UMAP: UMAP skips a normalisation step in the calculation of high-dimensional probabilities.
