Seurat uses log-transformed and scaled data for analysis, Scanpy uses raw, which method is better?

Hello Scanpy,

In @LuckyMD's amazing paper (https://www.embopress.org/doi/full/10.15252/msb.20188746), Table 1 shows that using raw data to calculate the marker genes of clusters is the appropriate way. But the raw data has not had mitochondrial genes, gene counts, or cell cycle scores regressed out, so many mito genes end up ranked at the top of the marker gene list. What should we do with these mito genes, given that they usually represent RNA contamination released from dead cells?

In Seurat, every downstream analysis and plot uses the log-transformed and scaled data (for example, the scaled dots in the Seurat violin plot). Scanpy draws all plots with use_raw=True. I'm wondering which method is better?

BTW, the logFC becomes negative and disappears for the marker genes of clusters when we set use_raw=False in sc.tl.rank_genes_groups(adata, 'leiden', method='wilcoxon'). Please check https://github.com/theislab/scanpy/issues/2057.

Thanks! Best, YJ

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:5 (3 by maintainers)

Top GitHub Comments

1 reaction
hyjforesight commented, Apr 20, 2022

Hello @LuckyMD, thanks for the response! Could you please also check why the logFC becomes negative and disappears for the marker genes of clusters? #2057 Thanks! Best, YJ

Because adata was regressed, gene expression values can become negative and cannot be log-transformed.
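To make the point above concrete, here is a minimal sketch of how a fold change can turn into NaN once the input contains negative values. It assumes the fold change is the log2 ratio of group means after undoing a log1p transform with expm1, which is my understanding of how sc.tl.rank_genes_groups computes it; the helper name log2_fold_change and the numbers are made up for illustration.

```python
import math

def log2_fold_change(mean_group, mean_rest, eps=1e-9):
    """log2 fold change from per-group means of log1p-transformed values.

    Hypothetical helper; the expm1 back-transform is an assumption about
    how the fold change is derived, not a copy of any library code.
    """
    # Undo log1p so the ratio is taken on the (normalized) count scale.
    numerator = math.expm1(mean_group) + eps
    denominator = math.expm1(mean_rest) + eps
    ratio = numerator / denominator
    # A non-positive ratio has no real logarithm, so report NaN.
    return math.log2(ratio) if ratio > 0 else float("nan")

# Log-normalized means are non-negative, so the ratio stays positive.
lfc_lognorm = log2_fold_change(2.0, 1.0)

# After regression or scaling, means are centred near zero and can differ in sign.
lfc_scaled = log2_fold_change(0.8, -0.5)

print(lfc_lognorm, lfc_scaled)
```

With a negative mean, expm1 returns a value between -1 and 0, so the ratio can become negative and the logarithm is undefined. That would be consistent with the logFC values going missing when the regressed data is used.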

0 reactions
LuckyMD commented, Apr 20, 2022

If you have to regress out covariates, maybe you could do it after log transformation? I’m not 100% sure about this approach either though.
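As a rough illustration of regressing after the log transformation, the sketch below log-transforms one gene's values and then removes a linear covariate effect with ordinary least squares. This is plain Python, not Scanpy or Seurat code; regress_out here is a hypothetical helper and the counts and covariate values are invented.

```python
import math

def regress_out(values, covariate):
    """Return residuals of a least-squares fit values ~ a + b * covariate."""
    n = len(values)
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]

counts = [0, 3, 1, 10, 4]               # made-up normalized counts for one gene
pct_mito = [2.0, 5.0, 3.0, 12.0, 6.0]   # made-up per-cell covariate

# Step 1: log-transform first; these values are all non-negative.
log_expr = [math.log1p(c) for c in counts]

# Step 2: regress the covariate out of the log-transformed values.
residuals = regress_out(log_expr, pct_mito)

print(log_expr)
print(residuals)
```

The residuals are centred on zero, so some of them are negative even though the log-transformed input was not. One possible way to handle this is to keep the log-transformed copy for fold changes and plots and use the residuals only where negative values are harmless, but that is a judgment call rather than something settled in this thread.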

Top Results From Across the Web

scanpy_03_integration
As the stored AnnData object contains scaled data based on variable genes, we need to make a new object with the logtransformed normalized...

Scale and center the data. ScaleData • Seurat - Satija Lab
Scales and centers features in the dataset. If variables are provided in vars.to.regress, they are individually regressed against each feature, and the ...

Current best practices in single‐cell RNA‐seq analysis: a tutorial
There is no consensus on scaling genes to 0 mean and unit variance. We prefer not to scale gene expression. Normalized data should...

Analytic Pearson residuals for normalization of single-cell ...
We demonstrate that analytic Pearson residuals strongly outperform other methods for identifying biologically variable genes, and capture more ...
