Plot of ranked genes groups with non-raw data or layers
Hi,
I have been trying to plot the ranked gene groups with a dotplot in this way:
sc.plotting.tools.rank_genes_groups_dotplot(all_data_flt_clst, n_genes=5, save='.pdf', groupby='batch', dendrogram=False, layer='imputed')
or from non-raw data
sc.plotting.tools.rank_genes_groups_dotplot(all_data_flt_clst, n_genes=5, save='.pdf', groupby='batch', dendrogram=False, use_raw=False)
I get an error regarding the index names of the genes. Other commands for plotting gene rank groups (such as heatmap) show the same problem. The error goes as follows:
KeyError: 'Indices "[\'PRM2\', \'ACAP1\', \'SPEM1\', \'SPATA3\', \'C10orf62\', \'TNP1\', \'MIR193BHG\', \'PRM1\', \'CCDC179\', \'AC007557.1\', \'SPACA1\', \'ERICH2\', \'RP11-360D2.1\', \'TIPARP-AS1\', \'GS1-124K5.4\', \'DCN\', \'C1S\', \'SERPING1\', \'C1R\', \'SERPINF1\', \'GAGE2A\', \'PTMA\', \'HMGB1\', \'VCX2\', \'ERP29\', \'ZCWPW1\', \'SMC1B\', \'DPH7\', \'SCML1\', \'CLSPN\', \'CSAD\', \'C1QBP\', \'DNAJB6\', \'TCF3\', \'RRBP1\', \'HSP90AA1\', \'TMED10\', \'ART3\', \'BUB1\', \'KRBOX1\', \'B2M\', \'IFITM3\', \'GNG11\', \'IFITM2\', \'IFI27\', \'TYROBP\', \'S100A4\', \'FCER1G\', \'CD163\', \'CYBA\', \'CALD1\', \'IGFBP7\', \'TIMP3\', \'PTGDS\', \'TSHZ2\', \'MT-ND3\', \'MT-ND1\', \'MT-ND2\', \'MT-ATP6\', \'MT-ND4\', \'MT-ND1\', \'HMGN5\', \'MT-ND3\', \'ALDH1A1\', \'MT-ATP6\']" contain invalid observation/variables names/indices.'
I checked the var_names and obs_names of my object, but they seem totally fine. Do you have any ideas about the origin of the problem? I can post the traceback of the error if needed 😃
Cheers, Samuele
Issue Analytics
- State:
- Created 5 years ago
- Comments:13 (4 by maintainers)
@SamueleSoraggi you are right, the layers contain the same genes as the adata.X matrix. I assume that in your case you did a highly variable gene selection, which affects both adata.X and adata.layers but not adata.raw.
The solution is to mark highly variable genes without removing the other, less variable genes. This functionality was added a few months ago and may not be properly reflected in the documentation.
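To illustrate the idea of marking rather than removing, here is a toy sketch in plain Python. It is not scanpy's implementation: the function name `flag_top_variable`, the variance values, and the dictionary layout are all made up for illustration. The point is only that every gene keeps its entry and receives a boolean flag, so marker genes with low variability can still be looked up later.

```python
def flag_top_variable(variances, n_top):
    """Return {gene: bool}, flagging the n_top most variable genes.

    Hypothetical helper: no gene is dropped, it is only marked.
    """
    ranked = sorted(variances, key=variances.get, reverse=True)
    top = set(ranked[:n_top])
    return {gene: gene in top for gene in variances}


# Made-up per-gene variances, for illustration only.
variances = {"PRM2": 0.1, "DCN": 2.3, "B2M": 1.7, "TNP1": 0.05}
flags = flag_top_variable(variances, n_top=2)

# PRM2 is not flagged as highly variable, but it is still present,
# so a plotting step that looks it up by name would find it.
print(flags)
```

In scanpy itself, to my understanding, the corresponding flag is stored in adata.var['highly_variable'] when sc.pp.highly_variable_genes is called without subsetting; please check the documentation of your installed version for the exact parameters.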
Have you subsetted your AnnData object to highly variable genes, while keeping the full dataset in .raw? In that case it could be that genes that are found as markers via rank_genes_groups are not in adata.var_names, but only in adata.raw.var_names, and therefore cannot be found by the plotting function. I’ve previously encountered issues with this, but I thought it had been solved now.
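A quick way to test this hypothesis is to compare the ranked gene names against the variable names that the plotting function can actually see. The sketch below uses plain Python lists as stand-ins for the AnnData fields; the helper name `split_by_availability` and the example gene lists are hypothetical.

```python
def split_by_availability(ranked_genes, var_names):
    """Split ranked genes into those present in var_names and those missing.

    Order is preserved and repeated gene names are reported once.
    """
    available = set(var_names)
    seen = set()
    present, missing = [], []
    for gene in ranked_genes:
        if gene in seen:
            continue
        seen.add(gene)
        (present if gene in available else missing).append(gene)
    return present, missing


# Stand-ins: marker genes computed on the full gene set, and the
# var_names left after subsetting to highly variable genes.
ranked = ["PRM2", "DCN", "MT-ND1", "B2M", "MT-ND1"]
subset_var_names = ["DCN", "B2M"]

present, missing = split_by_availability(ranked, subset_var_names)
print("present:", present)
print("missing:", missing)
```

With a real object, the ranked names should be available under adata.uns['rank_genes_groups']['names'] (one field per group, as far as I know), and they can be compared against both adata.var_names and adata.raw.var_names. If the missing list is non-empty for adata.var_names but empty for adata.raw.var_names, the subsetting explanation above is the likely cause.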