normalize_per_cell does not remove zero-expression cells from adata.raw
When I run this snippet:
import scanpy.api as sc
adata = sc.datasets.paul15()
adata.X[0, :] = 0
adata.raw = adata.copy()
sc.pp.normalize_per_cell(adata)
print(adata.X.shape)
print(adata.raw.X.shape)
sc.pp.pca(adata)
sc.pl.pca_scatter(adata, color='Gata1')
I get:
(2729, 3451)
(2730, 3451)
and an exception: IndexError: index 2729 is out of bounds for axis 0 with size 2729.
In fact, this size mismatch between X and raw.X happens every time we filter observations. So what would be the ideal solution?
- Don’t remove zero-expression cells in normalize_per_cell
- Remove zero-expression cells from raw as well in normalize_per_cell
- Print a warning if some observations are removed in normalize_per_cell
- Whenever some observations are subsetted in adata, also subset raw.X (solution from the anndata side)
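As a minimal sketch of the second and fourth options (not scanpy's or anndata's actual implementation), the idea is to compute one boolean mask of cells with non-zero total counts and apply it to both matrices so they stay aligned. Plain NumPy arrays stand in for adata.X and adata.raw.X here, and the helper name drop_zero_cells is hypothetical.

```python
import numpy as np


def drop_zero_cells(X, raw_X):
    """Drop cells (rows) whose total count in X is zero from both matrices.

    Hypothetical helper for illustration only; X and raw_X are plain
    arrays standing in for adata.X and adata.raw.X.
    """
    if X.shape[0] != raw_X.shape[0]:
        raise ValueError("X and raw_X must have the same number of cells")
    keep = X.sum(axis=1) > 0  # one mask, shared by both matrices
    return X[keep], raw_X[keep], keep


# Toy data: 4 cells x 3 genes, with the first cell zeroed out as in the report.
X = np.array([[0, 0, 0], [1, 2, 0], [0, 3, 1], [4, 0, 0]], dtype=float)
raw_X = X.copy()

X_f, raw_f, keep = drop_zero_cells(X, raw_X)
print(X_f.shape, raw_f.shape)  # both matrices keep the same rows
```

With real data, a similar effect can probably be had by filtering explicitly before raw is assigned, for example by calling sc.pp.filter_cells(adata, min_counts=1) ahead of adata.raw = adata.copy(), so that normalize_per_cell has nothing left to remove.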
Issue Analytics
- State:
- Created 5 years ago
- Comments:11 (10 by maintainers)
Thank you for your thoughts!

normalize_per_cell needs to remove zero-expression cells, as these can’t be normalized; the alternative would be to require it as a preprocessing step. But you’re right, wflynny: I’ll frame it as a fall-back for normalize_per_cell in the next version and output a warning, which will keep things backwards compatible.

… .raw. I’ll look into this today.

… .raw. I didn’t know that this causes problems in rank_genes_groups? Of course, you don’t find everything in .X that you find in .raw.X, and you’ll get a key error if you try to; but is there a fundamental problem, @LuckyMD?

Ah, yes… this was a commit I made a month ago, which I had forgotten about, that fixed the issue at least in part (caddf9b5934301f9cf2048e6bb947161fd84a210).
I recall that I found it harder to fix for pl.rank_genes_groups_violin, so I left that for the time being, as I felt it was a longer discussion. Let’s do this offline next week.
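
For reference, the fall-back discussed in this thread (drop zero-expression cells, but say so) could look roughly like the sketch below. It is an illustration under stated assumptions, not the code that went into scanpy: the function name normalize_rows_with_warning is hypothetical, it works on a plain NumPy array, and it assumes that scaling every cell to the median total count is the desired normalization.

```python
import warnings

import numpy as np


def normalize_rows_with_warning(X, target_sum=None):
    """Scale each cell (row) to the same total count, warning if cells are dropped.

    Hypothetical helper for illustration; returns the normalized matrix and
    the boolean mask of kept cells, so the caller can subset raw data too.
    """
    counts = X.sum(axis=1)
    keep = counts > 0
    n_removed = int((~keep).sum())
    if n_removed:
        warnings.warn(
            f"{n_removed} cell(s) with zero total counts were removed "
            "before normalization."
        )
    X, counts = X[keep], counts[keep]
    if target_sum is None:
        target_sum = np.median(counts)  # assumption: normalize to the median
    return X * (target_sum / counts)[:, None], keep


X = np.array([[0, 0], [1, 1], [2, 6]], dtype=float)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    normed, keep = normalize_rows_with_warning(X)

print(normed.shape, keep, len(caught))
```

Returning the mask alongside the normalized matrix is one way to let the caller apply the same subsetting to raw, which is the part of the problem that a warning alone does not solve.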