
Normalizing UMAP with a technical variable

See original GitHub issue

How to normalize UMAP with a technical variable causing bias?

Data: I am analyzing ~2000 variables extracted with convolutional neural networks (transfer learning).

Workflow: To keep the original aspect ratio unchanged, images of different sizes are placed in the foreground of a larger black image. The data is then analyzed with UMAP for dimension reduction.

Problem: However, I notice that one major driver of the dimension reduction is the difference in original image size (likely the proportion of image foreground to black background). I wonder whether I could somehow normalize or regress out the original image size (i.e., area), but I could not find anything in the package documentation.

Nice to know: For comparison, this is implemented in the Seurat package (used for analysis of single-cell RNA-seq data) as ScaleData(marrow, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(marrow)).

Any suggestions?

Thanks in advance!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
obruck commented, Jan 31, 2021

Thanks @lmcinnes, I appreciate your feedback, although this is not a software issue. Perhaps something to be implemented in a future update 😃

If I understood you correctly, you suggest

  1. PCA
  2. Remove PCs correlating with image area
  3. UMAP for the remaining PCs
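A minimal sketch of those three steps in Python (assuming scikit-learn and the umap-learn package; the toy data, the 0.001 leakage coefficient, and the 0.3 correlation cutoff are arbitrary illustrations, not values from the thread):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-ins: X is the (samples x features) CNN feature matrix,
# area is the technical variable (original image area).
n = 200
area = rng.uniform(100, 10_000, size=n)
X = rng.normal(size=(n, 50))
X[:, 0] += 0.001 * area                      # plant an area-driven direction

# 1. PCA
pca = PCA(n_components=20).fit(X)
scores = pca.transform(X)                    # (samples x PCs)

# 2. Drop PCs that correlate with image area (0.3 cutoff is arbitrary)
corr = np.array([np.corrcoef(scores[:, i], area)[0, 1]
                 for i in range(scores.shape[1])])
filtered = scores[:, np.abs(corr) < 0.3]

# 3. UMAP on the remaining PCs, e.g.:
# import umap
# embedding = umap.UMAP().fit_transform(filtered)
```

Whether a hard correlation cutoff is appropriate depends on how cleanly the confounder loads onto a few components; a softer alternative is to regress the confounder out of the PC scores rather than dropping whole components.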

As visual features are non-parametric data, shouldn’t UMAP be preferred over PCA for the dimension reduction?

I also looked up the documentation for the Seurat package: vars.to.regress indicates which confounding variables should be regressed out to reduce their effect on the primary features. In practice, each feature is fitted as feature1 ~ confounding_factor1 + confounding_factor2 + etc., and the regression residuals are kept. I will implement a custom regression solution and compare it to the PCA approach.
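The Seurat-style regression described above can be sketched as follows (a hedged illustration, not the thread's actual implementation; the toy data, the 0.0005 leakage coefficient, and all variable names are made up, and scikit-learn stands in for Seurat's internal regression):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical data: X = CNN feature matrix, area = confounding image area.
n = 300
area = rng.uniform(100, 10_000, size=n)
X = rng.normal(size=(n, 10)) + 0.0005 * area[:, None]  # area leaks into every feature

# Fit feature_j ~ area for all features at once and keep the residuals,
# mirroring Seurat's vars.to.regress behaviour.
confounders = area.reshape(-1, 1)
model = LinearRegression().fit(confounders, X)
residuals = X - model.predict(confounders)

# OLS residuals are orthogonal to the regressors, so the confounder's
# linear effect on each feature is removed:
r = np.corrcoef(residuals[:, 0], area)[0, 1]

# The residual matrix would then be the input to UMAP:
# import umap
# embedding = umap.UMAP().fit_transform(residuals)
```

Note that this removes only the linear component of the confounder's effect; any nonlinear dependence on image area could still leak into the embedding.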

0 reactions
lmcinnes commented, Feb 2, 2021

I suspect it will be very hard to remove it entirely. I’m sorry I can’t offer better solutions.


Top Results From Across the Web

  • How to Use UMAP — umap 0.5 documentation: UMAP is a general purpose manifold learning and dimension reduction algorithm. It is designed to be compatible with scikit-learn, making use of the…
  • How Exactly UMAP Works (and why exactly it is better than tSNE): UMAP does not apply normalization to either high- or low-dimensional probabilities, which is very different from tSNE and feels weird. However, …
  • Decrypting Dimensionality Reduction | Analytics Vidhya
  • Dimensionality reduction by UMAP reinforces sample …: For PCA, we standardize the sample vectors by removing the mean and scaling to unit variance. For MDS, t-SNE (openTSNE implementation) and UMAP…
  • Dimension Reduction: PCA, tSNE, UMAP: UMAP skips a normalisation step in the calculation of high-dimensional probabilities.
