Seurat uses log-transformed and scaled data for analysis, Scanpy uses raw, which method is better?

… Hello Scanpy, In @LuckyMD's amazing paper (https://www.embopress.org/doi/full/10.15252/msb.20188746), Table 1 shows that using raw data to calculate the marker genes of clusters is the appropriate way. But the raw data has not had mitochondrial genes, gene counts, cell cycle scores… regressed out, so many mito genes will be ranked at the top of the marker gene list. What should we do with these mito genes, given that they usually represent RNA contamination released by dead cells?

Seurat does all downstream analysis and plotting with the log-transformed and scaled data (see the scaled dots in the Seurat violin plot). Scanpy draws all plots with use_raw=True. I’m wondering which method is better?

BTW, the logFC values become negative and disappear for the marker genes of clusters when we set use_raw=False in sc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon'). Please check https://github.com/theislab/scanpy/issues/2057.

Thanks! Best, YJ

Issue Analytics

State:
Created 2 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction

hyjforesight commented, Apr 20, 2022

Hello @LuckyMD, thanks for the response! Could you please also check why the logFC becomes negative and disappears for the marker genes of clusters? #2057 Thanks! Best, YJ

Because adata was regressed, gene expression values can become negative, and a negative value cannot be log-transformed.
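
To illustrate the point about negative values, here is a minimal sketch in plain Python. It assumes the fold change is computed by back-transforming each group's mean log1p expression with expm1 and taking log2 of the ratio; that is an assumption about the calculation, not something confirmed in this thread. The function name log2_fold_change, the eps value, and the example means are all hypothetical.

```python
import math


def log2_fold_change(mean_group, mean_rest, eps=1e-9):
    """Log2 fold change from group means of log1p-transformed expression.

    Assumption: means are back-transformed with expm1 before the ratio is
    taken. Returns NaN when the ratio is not positive, because the log of a
    non-positive number is undefined.
    """
    ratio = (math.expm1(mean_group) + eps) / (math.expm1(mean_rest) + eps)
    return math.log2(ratio) if ratio > 0 else float("nan")


# Log-normalized data: means are non-negative, so the ratio stays positive.
lfc_lognorm = log2_fold_change(mean_group=2.0, mean_rest=0.5)

# Scaled or regressed data: values are centered, so a group mean can be
# below zero, which makes the back-transformed ratio negative.
lfc_scaled = log2_fold_change(mean_group=1.2, mean_rest=-0.3)

print(lfc_lognorm, lfc_scaled)
```

With the hypothetical means above, the log-normalized case gives a finite positive value, while the centered case has no defined log fold change, which would match the "disappearing" logFC described in the question.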

0 reactions

LuckyMD commented, Apr 20, 2022

If you have to regress out covariates, maybe you could do it after log transformation? I’m not 100% sure about this approach either though.
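
As a rough sketch of what "regress out after log transformation" could look like for a single gene, the plain-Python example below log-transforms the values first and then keeps the residuals of a simple least-squares fit against one covariate. The helper regress_out, the counts, and the pct_mito values are hypothetical and only stand in for a real pipeline; this is not scanpy's own implementation.

```python
import math


def regress_out(values, covariate):
    """Return residuals of an ordinary least-squares fit values ~ covariate."""
    n = len(values)
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


# Hypothetical normalized counts for one gene in five cells, and a
# hypothetical per-cell covariate (e.g. percent mitochondrial reads).
counts = [0.0, 3.0, 10.0, 1.0, 25.0]
pct_mito = [2.0, 5.0, 8.0, 3.0, 12.0]

log_expr = [math.log1p(c) for c in counts]   # step 1: log-transform
residuals = regress_out(log_expr, pct_mito)  # step 2: regress out covariate

print(residuals)
```

Because the fit includes an intercept, the residuals are centered on zero, so some of them are negative. That is fine for clustering or embedding, but it is the same reason a fold change cannot be computed directly from regressed values.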
