normalize_per_cell does not remove zero-expression cells from adata.raw

When I run this snippet:

import scanpy.api as sc

adata = sc.datasets.paul15()
adata.X[0, :] = 0
adata.raw = adata.copy()

sc.pp.normalize_per_cell(adata)

print(adata.X.shape)
print(adata.raw.X.shape)

sc.pp.pca(adata)
sc.pl.pca_scatter(adata, color='Gata1')

I get:

(2729, 3451)
(2730, 3451)

followed by an exception, IndexError: index 2729 is out of bounds for axis 0 with size 2729. In fact, this size mismatch between X and raw.X happens every time we filter observations. So what would be the ideal solution?

1. Don’t remove zero-expression cells in normalize_per_cell
2. Also remove zero-expression cells from raw in normalize_per_cell
3. Print a warning if some observations are removed in normalize_per_cell
4. Whenever observations are subsetted in adata, also subset raw.X (a solution on the anndata side)
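Until one of these options is in place, a workaround in the spirit of the second and fourth options is to drop zero-count cells before taking the raw snapshot, so that both matrices start with the same rows. The sketch below uses plain NumPy arrays as stand-ins for adata.X and adata.raw.X. The helper name drop_zero_count_cells and the toy data are hypothetical, and scaling each row to the median total is only an assumption about what a per-cell normalization does, not scanpy's actual implementation.

```python
import numpy as np

def drop_zero_count_cells(X):
    """Return the rows of X with a non-zero total, plus the boolean keep mask."""
    counts_per_cell = np.asarray(X.sum(axis=1)).ravel()
    keep = counts_per_cell > 0
    return X[keep], keep

# Toy count matrix: 6 cells x 4 genes, every entry between 1 and 4.
rng = np.random.default_rng(0)
X = rng.integers(1, 5, size=(6, 4)).astype(float)
X[0, :] = 0  # mimic the zero-expression cell from the report

# Filter first, then snapshot "raw", then normalize.
X_filtered, keep = drop_zero_count_cells(X)
raw_X = X_filtered.copy()

# Assumed normalization: scale every cell to the median total count.
totals = X_filtered.sum(axis=1, keepdims=True)
target = np.median(totals)
X_norm = X_filtered / totals * target

print(X_norm.shape, raw_X.shape)
```

With scanpy itself, the equivalent would presumably be to run a cell filter such as sc.pp.filter_cells(adata, min_counts=1) before assigning adata.raw, so that normalize_per_cell has nothing left to remove.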

Issue Analytics

State:
Created 5 years ago
Comments: 11 (10 by maintainers)

Top GitHub Comments

2 reactions

falexwolf commented, Jul 23, 2018

Thank you for your thoughts!

- normalize_per_cell needs to remove zero-expression cells, as these can’t be normalized; the alternative would be to require filtering as a preprocessing step. But you’re right, wflynny: I’ll frame it as a fall-back for normalize_per_cell in the next version and output a warning, which will keep things backwards compatible.
- Any filtering operation on the cells/observations should also affect .raw. I’ll look into this today.
- Any filtering operation on the variables should not affect .raw. I didn’t know that this causes problems in rank_genes_groups. Of course, you don’t find everything in .X that you find in .raw.X, and you’ll get a key error if you try to; but is there a fundamental problem, @LuckyMD?
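The rule described in this comment (observation filters propagate to .raw, variable filters do not) can be illustrated with a small stand-in class. TinyAnnData and its methods are hypothetical and written only for this sketch; they are not the anndata API or its implementation.

```python
import numpy as np

class TinyAnnData:
    """Hypothetical stand-in for AnnData: a matrix X plus a raw copy."""

    def __init__(self, X, raw_X=None):
        self.X = np.asarray(X)
        # If no raw matrix is given, snapshot X as it is now.
        self.raw_X = self.X.copy() if raw_X is None else np.asarray(raw_X)

    def filter_obs(self, mask):
        # Dropping cells (rows) is applied to both X and raw.
        return TinyAnnData(self.X[mask], self.raw_X[mask])

    def filter_var(self, mask):
        # Dropping genes (columns) leaves raw untouched.
        return TinyAnnData(self.X[:, mask], self.raw_X)

adata = TinyAnnData(np.arange(12).reshape(4, 3))
cells = np.array([True, False, True, True])
genes = np.array([True, True, False])

sub = adata.filter_obs(cells).filter_var(genes)
print(sub.X.shape, sub.raw_X.shape)  # (3, 2) (3, 3)
```

Under this rule X and raw always agree on the number of observations, so looking up a gene in raw for plotting cannot run past the end of X, while raw still keeps genes that were filtered out of X.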

1 reaction

LuckyMD commented, Jul 24, 2018

Ah, yes… this was a commit I made a month ago and had forgotten about, which fixed the issue at least in part (caddf9b5934301f9cf2048e6bb947161fd84a210).

I recall that it was harder to fix for pl.rank_genes_groups_violin, so I left that for the time being, as I felt it needed a longer discussion. Let’s do this offline next week.