
Add multiBatchPCA approach?

Original GitHub issue: scverse/scanpy #1289

[x] New analysis tool: a simple analysis tool you have been using and are missing in sc.tools?

It may be useful to add a PCA option similar to multiBatchPCA in the R batchelor package. This approach is useful when batch sizes are imbalanced and PCA is run across a merged experiment. The R implementation is fairly slow.

From their documentation:

“Our approach is to effectively weight the cells in each batch to mimic the situation where all batches have the same number of cells. This ensures that the low-dimensional space can distinguish subpopulations in smaller batches. Otherwise, batches with a large number of cells would dominate the PCA, i.e., the definition of the mean vector and covariance matrix. This may reduce resolution of unique subpopulations in smaller batches that differ in a different dimension to the subspace of the larger batches.”
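The weighting described in the quote can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not batchelor's implementation: batch_weighted_pca is a hypothetical helper, X is assumed to be a dense (n_cells, n_genes) matrix of log-normalized expression, and equal batch weight is obtained by (1) taking the mean vector as the average of the per-batch means and (2) scaling each batch's centered rows by 1/sqrt(n_b) before the SVD, so every batch contributes equally to the covariance regardless of its size.

```python
import numpy as np


def batch_weighted_pca(X, batches, n_comps=2):
    """Hypothetical sketch of batch-balanced PCA.

    X: (n_cells, n_genes) array, assumed log-normalized expression.
    batches: length-n_cells array of batch labels.
    Returns (scores, rotation): scores is (n_cells, n_comps) in the
    original cell order, rotation is (n_genes, n_comps).
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    labels = np.unique(batches)

    # Mean vector: average of per-batch means, so each batch counts equally
    # no matter how many cells it contains.
    center = np.mean([X[batches == b].mean(axis=0) for b in labels], axis=0)
    Xc = X - center

    # Scale each batch's centered rows by 1/sqrt(n_b). The cross-product
    # weighted.T @ weighted is then a sum of per-batch average outer
    # products, each batch carrying the same total weight.
    weighted = np.vstack(
        [Xc[batches == b] / np.sqrt((batches == b).sum()) for b in labels]
    )

    # Principal axes are the top right singular vectors of the weighted matrix.
    _, _, vt = np.linalg.svd(weighted, full_matrices=False)
    rotation = vt[:n_comps].T

    # Project every (unweighted, centered) cell onto the balanced axes.
    return Xc @ rotation, rotation
```

Rows are regrouped by batch only to estimate the axes; the returned scores stay in the original cell order. One way to see the balancing at work: replicating every cell of one batch several times leaves the axes unchanged, because that batch's per-cell weight shrinks by the same factor.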


Top GitHub Comments

r-reeves commented, Nov 30, 2020 (1 reaction)

Hi @LuckyMD, thank you for the fast reply. Yes, I was using FastMNN: as I understand it, when you tell align_cds explicitly what you want to remove (e.g. sample-to-sample variation), it calls FastMNN from batchelor. Thanks for the recommendation; I will check out Scanorama. I have been meaning to read the review on integration techniques.

You wrote: “you will only get an integrated graph structure with this for scvelo, which may help a little, but won’t remove the batch effect for RNA velocity calculation. scvelo doesn’t currently have any batch removal in its pipeline as it is quite difficult to add as it works directly from the normalized count data and fits a model to these.”

Ah, okay, I misunderstood the process then: I thought some of the MNN correction would carry over into the velocity analysis. I will check the scvelo forum for information on comparing samples.

Thank you.

LuckyMD commented, Nov 26, 2020 (0 reactions)

Hi @r-reeves, maybe this is indeed a separate issue. mnnpy works on the gene expression matrix, not on a low-dimensional embedding as FastMNN does (which I assume is what you have been using?). You could try Scanorama, a method similar to FastMNN that uses a faster algorithm and, instead of iteratively merging batches, a procedure its authors call “panoramic stitching”. It performed quite well in our benchmark of data integration methods, and it is part of the scanpy ecosystem, so it should fit seamlessly into a Scanpy workflow.

All that said, for scvelo this will only give you an integrated graph structure, which may help a little but won’t remove the batch effect from the RNA velocity calculation. scvelo doesn’t currently have any batch removal in its pipeline, and it is quite difficult to add because scvelo works directly on the normalized count data and fits a model to them. @VolkerBergen has been thinking a bit about how to perform batch correction in an scvelo model; maybe he could chime in, or you could post an issue in the scvelo repo.
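On the difference between mnnpy and FastMNN discussed in this thread: the central step of MNN-style correction, finding pairs of cells that are mutual nearest neighbors across two batches, is conceptually the same whichever matrix it runs on; what differs is the input (the full gene expression matrix for mnnpy, a low-dimensional embedding for FastMNN). The sketch below is a brute-force NumPy illustration using a hypothetical helper, mnn_pairs, with plain Euclidean distance. The real packages add further steps (for example normalization, faster neighbor search and the correction itself) that are not shown here.

```python
import numpy as np


def mnn_pairs(A, B, k=3):
    """Hypothetical helper: return (i, j) pairs where row i of A and row j
    of B are each among the other's k nearest neighbors (Euclidean).

    A, B: (n_a, d) and (n_b, d) arrays in the same space, e.g. both gene
    expression (mnnpy-style) or both PCA coordinates (FastMNN-style).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_a, n_b).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    # k nearest cells in B for each cell of A, and in A for each cell of B.
    knn_a_to_b = np.argsort(d2, axis=1)[:, :k]
    knn_b_to_a = np.argsort(d2, axis=0)[:k, :].T
    pairs = []
    for i in range(A.shape[0]):
        for j in knn_a_to_b[i]:
            # Keep the pair only if the relationship holds in both directions.
            if i in knn_b_to_a[j]:
                pairs.append((i, int(j)))
    return pairs


A = np.array([[0.0, 0.0], [10.0, 10.0]])
B = np.array([[0.1, 0.0], [10.0, 10.1], [50.0, 50.0]])
print(mnn_pairs(A, B, k=1))  # [(0, 0), (1, 1)]
```

In the toy call, the outlying third cell of B has a nearest neighbor in A, but that neighbor prefers a different cell of B, so the outlier takes part in no pair.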
