add multibatchPCA approach?
See original GitHub issue
- [x] New analysis tool: A simple analysis tool you have been using and are missing in sc.tools?
It may be useful to adopt a PCA option similar to multiBatchPCA in the R batchelor package.
This approach is useful when batch sizes are imbalanced and PCA is conducted across a merged experiment.
It is pretty slow in R.
From their documentation:
“Our approach is to effectively weight the cells in each batch to mimic the situation where all batches have the same number of cells. This ensures that the low-dimensional space can distinguish subpopulations in smaller batches. Otherwise, batches with a large number of cells would dominate the PCA, i.e., the definition of the mean vector and covariance matrix. This may reduce resolution of unique subpopulations in smaller batches that differ in a different dimension to the subspace of the larger batches.”
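To make the request concrete, here is a minimal NumPy sketch of the weighting idea described in the quote. It is not batchelor's implementation and not an existing Scanpy function. The name `multibatch_pca` and its arguments are hypothetical, and it assumes a dense, log-normalized cells-by-genes matrix. Each cell is scaled by 1/sqrt(batch size), so every batch contributes its average cross-product to the pooled covariance, and the data are centred on the mean of the per-batch means rather than the overall mean.

```python
import numpy as np


def multibatch_pca(X, batch, n_comps=2):
    """Hypothetical helper: PCA in which every batch carries equal weight.

    X     : (n_cells, n_genes) dense array, e.g. log-normalized expression
    batch : (n_cells,) array of batch labels
    Returns (scores, rotation).
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)

    # Centre on the mean of the per-batch means, so that a large batch
    # does not define the origin of the low-dimensional space.
    center = np.mean([X[batch == b].mean(axis=0) for b in labels], axis=0)
    Xc = X - center

    # Scale each cell by 1/sqrt(n_b). The cross-product of the scaled matrix
    # is then the sum of per-batch *average* cross-products.
    sizes = {b: np.sum(batch == b) for b in labels}
    w = np.array([1.0 / np.sqrt(sizes[b]) for b in batch])

    # The right singular vectors of the weighted matrix are the PC axes.
    _, _, vt = np.linalg.svd(Xc * w[:, None], full_matrices=False)
    rotation = vt[:n_comps].T

    # Project the unweighted (but centred) cells onto those axes.
    return Xc @ rotation, rotation
```

In a Scanpy workflow the returned scores could, for example, be stored in `adata.obsm["X_pca"]` before building the neighbor graph; that placement is a suggestion, not part of the original request.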
Issue Analytics
- Created 3 years ago
- Comments: 9 (5 by maintainers)
Hi @LuckyMD Thank you for the fast reply. Yes to FastMNN, as I understand from using align_cds – when you specify discretely what you want to remove e.g. sample-sample variation it calls FastMNN from batchelor. Thanks for the recommendation – I will check out Scanorama, been meaning to read the review on integration techniques.
Ahh okay, I misunderstood the process then – my understanding was that some of the mnn correction would be carried over when performing velocity analysis. I will check out the scvelo forum for info on comparing samples.
Thank you.
Hi @r-reeves, Maybe this is indeed a separate issue.
mnnpy is indeed working on the gene expression matrix, and not on a low dimensional embedding like FastMNN (which is what I assume you might have been using?). You could try Scanorama which is a method similar to FastMNN, using a sped up algorithm and no iterative merging of batches, but a method they call “panoramic stitching”. It has performed quite well in our benchmark of data integration methods, and is in the scanpy ecosystem and therefore should work seamlessly in a Scanpy workflow.
All of this being said, you will only get an integrated graph structure with this for scvelo, which may help a little, but won’t remove the batch effect for RNA velocity calculation. scvelo doesn’t currently have any batch removal in its pipeline as it is quite difficult to add as it works directly from the normalized count data and fits a model to these. @VolkerBergen has been thinking a bit about how to perform batch correction in an scvelo model, maybe he could chime in, or you could post an issue in the scvelo repo.