Normalizing UMAP with technical variable
How to normalize UMAP with a technical variable causing bias?
Data
I am analyzing ~2000 variables extracted with convolutional neural networks (i.e. transfer learning).
Workflow
To make this work, images of different sizes are placed in the foreground of a larger black image, so the original image aspect ratio is kept unchanged. The data are then analyzed with UMAP for dimension reduction.
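The padding step described above can be sketched as follows. This is a minimal toy version, not the author's actual pipeline; the canvas size and image shapes are made up for illustration:

```python
import numpy as np

def pad_to_canvas(img: np.ndarray, size: int = 512) -> np.ndarray:
    """Place an image in the foreground (top-left) of a larger black
    canvas without resizing, preserving the original aspect ratio.
    `size` is a hypothetical target canvas edge length."""
    h, w = img.shape[:2]
    canvas = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    canvas[:h, :w] = img
    return canvas

# Toy example: a small white image embedded in a black canvas.
small = np.full((100, 150, 3), 255, dtype=np.uint8)
padded = pad_to_canvas(small)
print(padded.shape)  # (512, 512, 3)
```

Note that the fraction of non-black pixels in the canvas then varies with the original image area, which is exactly the confound described below.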
Problem
However, I notice that one major driver of the dimension reduction is the difference in original image size (likely the proportion of image foreground to black background). I wonder if I could somehow normalize or regress out the original image size (i.e. area), but could not find anything in the package documentation.
Nice to know
For comparison, this is implemented in the Seurat package, used for analysis of single-cell RNA-seq data, as
ScaleData(marrow, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(marrow))
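A Python analogue of this regress-out step, applied before UMAP, could look like the sketch below. This is an assumption about one reasonable approach, not a umap-learn feature: each feature is regressed on the covariate (here, a simulated image area) and the residuals are kept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_images, n_features = 200, 20  # toy sizes, not the real ~2000 features

# Simulated technical covariate: original image area.
area = rng.uniform(0.1, 1.0, size=(n_images, 1))
# Simulated CNN features, contaminated by the area covariate.
features = rng.normal(size=(n_images, n_features)) + 3.0 * area

# Regress the covariate out of every feature and keep the residuals,
# mirroring Seurat's ScaleData(vars.to.regress=...).
model = LinearRegression().fit(area, features)
residuals = features - model.predict(area)

# The residuals could then be passed to umap.UMAP().fit_transform(residuals).
corr = np.corrcoef(area.ravel(), residuals[:, 0])[0, 1]
print(abs(corr) < 1e-6)  # True: the linear area effect is removed
```

This only removes the *linear* effect of the covariate; any nonlinear dependence on image area would survive the regression.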
Any suggestions?
Thanks in advance!
Issue Analytics
- Created 3 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
Thanks @lmcinnes, I appreciate your feedback even though this is not a software issue. Perhaps something to be implemented in future updates 😃
If I understood you correctly, you suggest a PCA preprocessing step. But as visual features are non-parametric data, shouldn't UMAP be preferred over PCA for dimension reduction?
I also looked up the documentation for the Seurat package: the `vars.to.regress` argument indicates which confounding variables should be regressed out to reduce their effect on the primary features. In practice, each feature is modeled as `feature1 ~ confounding_factor1 + confounding_factor2 + ...` and the regression residuals are kept. I will implement a custom regression solution and compare that to the PCA step.

I suspect it will be very hard to remove it entirely. I'm sorry I can't offer better solutions.
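The per-feature model `feature1 ~ confounding_factor1 + confounding_factor2 + ...` with residuals kept can be written directly with a least-squares fit. A minimal sketch, assuming two made-up confounders and one toy feature:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
conf1 = rng.uniform(size=n)  # e.g. original image area
conf2 = rng.uniform(size=n)  # a second hypothetical confounder
feature1 = 2.0 * conf1 - 1.5 * conf2 + rng.normal(scale=0.1, size=n)

# Design matrix with intercept: feature1 ~ conf1 + conf2
X = np.column_stack([np.ones(n), conf1, conf2])
beta, *_ = np.linalg.lstsq(X, feature1, rcond=None)
residuals = feature1 - X @ beta  # the residuals are what gets kept

# By construction, OLS residuals are orthogonal to both confounders.
print(abs(np.corrcoef(conf1, residuals)[0, 1]) < 1e-6)  # True
print(abs(np.corrcoef(conf2, residuals)[0, 1]) < 1e-6)  # True
```

In practice this would be repeated for every feature (or vectorized across the feature matrix) before handing the residuals to UMAP.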