Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Plot of ranked genes groups with non-raw data or layers

See original GitHub issue

Hej,

I have been trying to plot the ranked gene groups with a dotplot in this way

sc.plotting.tools.rank_genes_groups_dotplot(all_data_flt_clst, n_genes=5, save='.pdf', groupby='batch', dendrogram=False, layer='imputed')

or from non-raw data

sc.plotting.tools.rank_genes_groups_dotplot(all_data_flt_clst, n_genes=5, save='.pdf', groupby='batch', dendrogram=False, use_raw=False)

I get an error regarding the index names of the genes. Other commands for plotting gene rank groups (such as heatmap) show the same problem. The error goes as follows:

KeyError: 'Indices "[\'PRM2\', \'ACAP1\', \'SPEM1\', \'SPATA3\', \'C10orf62\', \'TNP1\', \'MIR193BHG\', \'PRM1\', \'CCDC179\', \'AC007557.1\', \'SPACA1\', \'ERICH2\', \'RP11-360D2.1\', \'TIPARP-AS1\', \'GS1-124K5.4\', \'DCN\', \'C1S\', \'SERPING1\', \'C1R\', \'SERPINF1\', \'GAGE2A\', \'PTMA\', \'HMGB1\', \'VCX2\', \'ERP29\', \'ZCWPW1\', \'SMC1B\', \'DPH7\', \'SCML1\', \'CLSPN\', \'CSAD\', \'C1QBP\', \'DNAJB6\', \'TCF3\', \'RRBP1\', \'HSP90AA1\', \'TMED10\', \'ART3\', \'BUB1\', \'KRBOX1\', \'B2M\', \'IFITM3\', \'GNG11\', \'IFITM2\', \'IFI27\', \'TYROBP\', \'S100A4\', \'FCER1G\', \'CD163\', \'CYBA\', \'CALD1\', \'IGFBP7\', \'TIMP3\', \'PTGDS\', \'TSHZ2\', \'MT-ND3\', \'MT-ND1\', \'MT-ND2\', \'MT-ATP6\', \'MT-ND4\', \'MT-ND1\', \'HMGN5\', \'MT-ND3\', \'ALDH1A1\', \'MT-ATP6\']" contain invalid observation/variables names/indices.'

I checked the var_names and obs_names of my object, but they seem totally fine. Do you have any ideas about the origin of the problem? I can post the backtracking of the error if needed 😃

Cheers, Samuele

Issue Analytics

State:
Created 5 years ago
Comments:13 (4 by maintainers)

Top GitHub Comments

3reactions

fidelramcommented, Feb 1, 2019

@SamueleSoraggi you are right, the layers contain the same genes as the adata.X matrix. I assume that in your case, you did a highly variable gene selection which affects both adata.X and adata.layers but not adata.raw.

The solution is to mark highly variable genes without removing the other less variable genes. This functionality was added some few months ago and may not be properly reflected on the documentation.

1reaction

LuckyMDcommented, Jan 23, 2019

Have you subsetted your AnnData object to highly variable genes, while keeping the full dataset in .raw? In that case it could be that genes that are found as markers via rank_genes_groups, are not in adata.var_names, but only in adata.raw.var_names and therefore cannot be found by the plotting function. I’ve previously encountered issues with this, but I thought it had been solved now.