normalize_per_cell does not remove zero-expression cells from adata.raw


When I run this snippet:

import scanpy.api as sc

adata = sc.datasets.paul15()
adata.X[0, :] = 0
adata.raw = adata.copy()

sc.pp.normalize_per_cell(adata)

print(adata.X.shape)
print(adata.raw.X.shape)

sc.pp.pca(adata)
sc.pl.pca_scatter(adata, color='Gata1')

I get:

(2729, 3451)
(2730, 3451)

followed by the exception "IndexError: index 2729 is out of bounds for axis 0 with size 2729". In fact, this size mismatch between X and raw.X happens every time we filter observations. So what would be the ideal solution?

  • Don’t remove zero-expression cells in normalize_per_cell
  • Remove zero cells also from raw in normalize_per_cell
  • Print a warning if some observations are removed in normalize_per_cell
  • Whenever some observations are subsetted in adata, also subset raw.X (solution from anndata side)
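The second and third options above can be sketched without scanpy at all. The snippet below is only an illustration on plain NumPy arrays, not scanpy's implementation: `normalize_per_row` is a hypothetical helper, and rescaling to the median total of the kept rows is an assumption made for the example. It computes the row mask once, warns when rows are dropped, and returns the mask so the caller can apply it to the raw copy as well.

```python
import warnings

import numpy as np


def normalize_per_row(X, target_sum=None):
    """Scale each row to a common total, dropping all-zero rows.

    Hypothetical helper for illustration. Returns the normalized
    matrix and the boolean mask of the rows that were kept.
    """
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1)
    keep = totals > 0  # rows with zero total cannot be normalized
    n_dropped = int((~keep).sum())
    if n_dropped:
        warnings.warn(f"Removing {n_dropped} row(s) with zero total counts.")
    if target_sum is None:
        # Assumption: rescale to the median total of the kept rows.
        target_sum = np.median(totals[keep])
    normalized = X[keep] / totals[keep, None] * target_sum
    return normalized, keep


X = np.array([[0, 0, 0], [1, 2, 1], [4, 0, 4]])
raw = X.copy()  # stand-in for adata.raw

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    X_norm, keep = normalize_per_row(X)

raw = raw[keep]  # apply the same mask so both matrices stay aligned
print(X_norm.shape, raw.shape)
```

On the user side, a similar effect can presumably be had today by filtering empty cells before assigning adata.raw, for example with sc.pp.filter_cells(adata, min_counts=1), so that normalize_per_cell has nothing left to remove.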

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 11 (10 by maintainers)

Top GitHub Comments

falexwolf commented, Jul 23, 2018 (2 reactions)

Thank you for your thoughts!

  1. normalize_per_cell needs to remove zero-expression cells, as these can’t be normalized. The alternative would be to require that filtering as a separate preprocessing step. But you’re right, wflynny: I’ll frame it as a fallback for normalize_per_cell in the next version and output a warning, which will keep things backwards compatible.
  2. Any filtering operation on the cells/observations should also affect .raw. I’ll look into this today.
  3. Any filtering operation on the variables should not affect .raw. I didn’t know that this causes problems in rank_genes_groups. Of course, you won’t find everything in .X that you find in .raw.X, and you’ll get a key error if you try to; but is there a fundamental problem, @LuckyMD?
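Points 2 and 3 describe an asymmetry: filtering observations propagates to .raw, while filtering variables does not. A minimal sketch of that rule, using a hypothetical `MiniAnnData` class over NumPy arrays (this is not anndata's actual implementation, and the method names are made up for the example):

```python
import numpy as np


class MiniAnnData:
    """Hypothetical stand-in for an annotated matrix with a raw copy."""

    def __init__(self, X):
        self.X = np.asarray(X)
        self.raw = None

    def freeze_raw(self):
        # Keep an untouched copy of the current matrix.
        self.raw = self.X.copy()

    def subset_obs(self, mask):
        # Row (observation) filtering applies to X and to raw.
        mask = np.asarray(mask, dtype=bool)
        self.X = self.X[mask]
        if self.raw is not None:
            self.raw = self.raw[mask]

    def subset_var(self, mask):
        # Column (variable) filtering applies to X only.
        mask = np.asarray(mask, dtype=bool)
        self.X = self.X[:, mask]


data = MiniAnnData(np.arange(12).reshape(4, 3))
data.freeze_raw()
data.subset_obs([True, False, True, True])  # drop one observation
data.subset_var([True, True, False])        # drop one variable
print(data.X.shape, data.raw.shape)
```

With this rule the row counts of X and raw always match, so row indices stay valid in both, while raw still holds variables that are no longer present in X.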
LuckyMD commented, Jul 24, 2018 (1 reaction)

Ah, yes… this was a commit I made a month ago and had forgotten about, which fixed the issue at least in part (caddf9b5934301f9cf2048e6bb947161fd84a210).

I recall that I found it harder to fix for pl.rank_genes_groups_violin and so I left that for the time being as I felt it was a longer discussion. Let’s do this offline next week.
