
Gene Names/Identities are showing up as numbers when plotted with highest_expr_genes()


When processing my data in Scanpy, I can't figure out why my plot of the highest expressed genes shows numbers rather than gene names as the identifiers on the y-axis.
Example: [screenshot from 2019-11-12 not included]

I am worried that it may not be reading our file properly, but even when I converted from a .txt file to a .loom file I still got the same problem. I am unsure whether this is an issue with our original input or whether something is being read improperly. I will include some of my code below; if you would like copies of the raw data to run it on, let me know where to send them.

I am using the Clustering tutorial on the Scanpy website as the basis for my code.

import scanpy as sc

sc.settings.set_figure_params(dpi=80)

# Read each day's tab-delimited expression matrix; the first column holds the row names
day00a = sc.read_text("/alex_ryan/D0.1500.dge", first_column_names=True, delimiter="\t")
day01 = sc.read_text("/alex_ryan/D1.txt.500.dge", first_column_names=True, delimiter="\t")
day02 = sc.read_text("/alex_ryan/D2.txt.500.dge", first_column_names=True, delimiter="\t")
day04 = sc.read_text("/alex_ryan/D4.txt.500.dge", first_column_names=True, delimiter="\t")
day09 = sc.read_text("/alex_ryan/D9.txt.500.dge", first_column_names=True, delimiter="\t")
day11 = sc.read_text("/alex_ryan/D11.txt.500.dge", first_column_names=True, delimiter="\t")

# Tag every cell with its time point so it can still be identified after concatenation
day00a.obs['tech'] = 'Day 0'
day01.obs['tech'] = 'Day 1'
day02.obs['tech'] = 'Day 2'
day04.obs['tech'] = 'Day 4'
day09.obs['tech'] = 'Day 9'
day11.obs['tech'] = 'Day 11'

# join='outer' keeps the union of genes across all days
adata_list = [day01, day02, day04, day09, day11]
adata = day00a.concatenate(adata_list, join='outer')

# Plot the 20 genes with the highest mean fraction of counts per cell
sc.pl.highest_expr_genes(adata, n_top=20, save=True)

I am adding the extra .obs 'tech' tag so that I can identify the cells by day after they have been combined into one AnnData object. I don't think this is causing the issue, but if it is part of it, please let me know if there is a workaround.
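A quick check can narrow down where the names went missing. By default sc.pl.highest_expr_genes labels its bars with adata.var_names (its gene_symbols parameter can point it at a column of adata.var instead), so numbers on the y-axis mean numbers ended up in var_names. The sketch below is a heuristic only, and the helper names are hypothetical, not part of Scanpy. Orientation is also worth checking: if the .dge files store genes as rows and cells as columns, read_text would put the genes in obs_names, and each object would need transposing with .T before concatenating.

```python
# Hypothetical helpers (not part of Scanpy or AnnData): a heuristic check of
# whether gene labels were replaced by plain numbers somewhere upstream.


def is_numeric_name(name) -> bool:
    """True if a name consists only of digits, e.g. '123'."""
    return str(name).strip().isdigit()


def numeric_fraction(names) -> float:
    """Fraction of names that are digits only (0.0 for an empty sequence)."""
    names = list(names)
    if not names:
        return 0.0
    return sum(is_numeric_name(n) for n in names) / len(names)


def diagnose_axes(obs_names, var_names, threshold: float = 0.5) -> str:
    """Classify the two axes by how many of their names are digits only.

    Returns "both_numeric" if cell and gene names are both mostly numbers
    (e.g. the file was read without its header/row names), "numeric_var_names"
    if only the gene axis is mostly numeric, and "ok" otherwise.
    """
    var_frac = numeric_fraction(var_names)
    obs_frac = numeric_fraction(obs_names)
    if var_frac > threshold:
        return "both_numeric" if obs_frac > threshold else "numeric_var_names"
    return "ok"


# Usage, assuming `adata` is the concatenated AnnData object from the code above:
# print(diagnose_axes(adata.obs_names, adata.var_names))
# print(adata.var_names[:5].tolist(), adata.obs_names[:5].tolist())
```

Running the same check on day00a and the other per-day objects before concatenation shows whether the names were lost on read or during concatenate.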

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

mosquitoCat commented, Sep 20, 2020 (2 reactions)

@ivirshup @LuckyMD I fixed the problem - the issue was in the original h5ad file converted from a Seurat object using SeuratDisk::Convert(). It seems the var data wasn’t ported over properly for the assay I was using. I rebuilt the h5ad file using reticulate instead and that solved the problem.
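If the converted file still carries the gene symbols in some column of adata.var, they could be promoted back to var_names on the Python side instead of rebuilding the file. This is a minimal sketch under that assumption: the helper is hypothetical, the column name depends on how the file was written, and the heuristic (string values, none purely numeric, not constant) may need adjusting. If it returns None, the symbols were probably not carried over, and rebuilding the file, as described above with reticulate, is the safer route.

```python
# Hypothetical helper (not part of Scanpy, AnnData or SeuratDisk): look for a
# column in adata.var that plausibly holds gene symbols.


def is_numeric_name(name) -> bool:
    """True if a name consists only of digits, e.g. '123'."""
    return str(name).strip().isdigit()


def pick_gene_symbol_column(var_columns):
    """Return the first column that looks like gene symbols, or None.

    `var_columns` maps column name -> list of values. A column qualifies if
    every value is a string, none is digits only, and it is not constant
    (which skips columns holding a single repeated label).
    """
    for column, values in var_columns.items():
        if not values or not all(isinstance(v, str) for v in values):
            continue
        if any(is_numeric_name(v) for v in values):
            continue
        if len(set(values)) > 1:
            return column
    return None


# Usage, assuming `adata` was loaded from the converted .h5ad file:
# col = pick_gene_symbol_column({c: adata.var[c].tolist() for c in adata.var.columns})
# if col is not None:
#     adata.var_names = adata.var[col].astype(str).values
#     adata.var_names_make_unique()  # gene symbols are not always unique
```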

HKL-97 commented, Jun 2, 2021


I have encountered the same issue as @mosquitoCat. Although adata.var_names still returns the correct gene symbols, the gene identifiers used for plotting have all become numbers: for example, sc.pl.umap(adata, color='GeneName') returns an error, but sc.pl.umap(adata, color='123') is recognized. SeuratDisk::Convert() seems to cause some trouble here. Is there a way to fix it? @ivirshup
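One possibility worth checking for this symptom (it is not confirmed in the thread): Scanpy's embedding plots such as sc.pl.umap take gene colors from adata.raw when it is present and use_raw is left at its default. If the conversion wrote gene symbols to adata.var but numbers to adata.raw.var, adata.var_names would look correct while color= would only accept the numeric names. A minimal sketch of that comparison, using a hypothetical helper:

```python
# Hypothetical helper (not part of Scanpy or AnnData): detect the case where
# the main gene names look fine but the .raw gene names are plain numbers.


def is_numeric_name(name) -> bool:
    """True if a name consists only of digits, e.g. '123'."""
    return str(name).strip().isdigit()


def raw_names_lost(var_names, raw_var_names) -> bool:
    """True if no main name is numeric but every raw name is digits only."""
    var_names = [str(n) for n in var_names]
    raw_var_names = [str(n) for n in raw_var_names]
    if not var_names or not raw_var_names:
        return False
    main_numeric = any(is_numeric_name(n) for n in var_names)
    raw_numeric = all(is_numeric_name(n) for n in raw_var_names)
    return raw_numeric and not main_numeric


# Usage, assuming `adata` is the converted AnnData object:
# if adata.raw is not None and raw_names_lost(adata.var_names, adata.raw.var_names):
#     sc.pl.umap(adata, color="GeneName", use_raw=False)  # color from adata.X instead
```

Passing use_raw=False only works around the plotting call; repairing adata.raw itself, or rebuilding the file, keeps later steps that read .raw consistent.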

