Some genes have much lower counts in Velocyto than 10x cellranger output


Hi,

I’m using Velocyto on data generated by the 10x Genomics cellranger pipeline, and projecting velocity onto embeddings produced with Seurat and scanpy. I found that some of my marker genes are barely detected by the Velocyto pipeline, even though they are detected in hundreds of cells in the cellranger output.

For example:

'GSX-2' 
cellranger count (total 10k cells):
nonzero_cells     898.0
total_count      1233.0

Velocyto spliced count:
nonzero_cells    1.0
total_count      1.0

Velocyto unspliced count:
nonzero_cells    0.0
total_count      0.0
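The `nonzero_cells` / `total_count` summary above can be reproduced for any gene with a few lines of Python. This is a minimal sketch assuming each matrix has already been loaded as a list of per-cell rows (cells × genes) with a matching list of gene names. Loading from the cellranger output or the Velocyto `.loom` file is not shown (loom layers are stored genes × cells, so they would need transposing first), and the helper name and toy numbers are illustrative, not taken from the issue.

```python
from typing import Dict, List


def gene_summary(matrix: List[List[float]], genes: List[str], gene: str) -> Dict[str, float]:
    """Return nonzero_cells and total_count for one gene.

    `matrix` is assumed to be cells x genes (one row per cell).
    """
    j = genes.index(gene)  # column index for this gene; raises ValueError if absent
    column = [row[j] for row in matrix]
    return {
        "nonzero_cells": float(sum(1 for v in column if v > 0)),
        "total_count": float(sum(column)),
    }


if __name__ == "__main__":
    # Toy data: 3 cells x 2 genes, purely illustrative ("GENE_B" is hypothetical).
    genes = ["GSX-2", "GENE_B"]
    cellranger = [[2, 0], [0, 1], [1, 3]]
    spliced = [[0, 0], [1, 1], [0, 2]]
    for name, m in [("cellranger", cellranger), ("spliced", spliced)]:
        print(name, gene_summary(m, genes, "GSX-2"))
```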

Here is a plot of counts for ~20,000 genes: [image not available]

It looks like many genes are detected with far fewer counts in the Velocyto pipeline.
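One way to put a number on "far fewer counts" across all genes is to compare, gene by gene, the Velocyto total (for example spliced + unspliced) with the cellranger total. The sketch below assumes per-gene totals have already been computed into dictionaries keyed by gene name; the function name, the 0.1 cutoff and the pseudocount of 1 are arbitrary choices for illustration.

```python
import math
from typing import Dict, List, Tuple


def low_ratio_genes(
    cellranger_totals: Dict[str, float],
    velocyto_totals: Dict[str, float],
    cutoff: float = 0.1,
) -> List[Tuple[str, float]]:
    """List genes whose Velocyto total is below `cutoff` times the cellranger total.

    Only genes with a positive cellranger total are considered; genes missing
    from `velocyto_totals` count as zero. Each hit carries a log10 ratio with a
    pseudocount of 1 (to avoid log(0)), and hits are sorted most depleted first.
    """
    hits = []
    for gene, cr in cellranger_totals.items():
        if cr <= 0:
            continue
        vt = velocyto_totals.get(gene, 0.0)
        if vt < cutoff * cr:
            hits.append((gene, math.log10((vt + 1) / (cr + 1))))
    return sorted(hits, key=lambda item: item[1])
```

With the GSX-2 totals quoted above (1233 in cellranger versus 1 spliced + 0 unspliced in Velocyto), GSX-2 would be flagged with a log10 ratio of roughly -2.8.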

Below is the command I used to produce the loom file:

velocyto run10x -m ~/Data1/data/hrch38_rmsk.gtf ./14741X1 ~/Data1/data/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf

The BAM files were generated with cellranger 2.0.2 using the same reference, and were sorted during velocyto run10x. I also tried run10x without the repeat mask and got similar results.
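For scripted runs over several samples, for example with and without the repeat mask as described above, the same invocation can be assembled in Python. This is a minimal sketch: it only builds the argument list in the `velocyto run10x [-m MASK] SAMPLEFOLDER GTFFILE` form used in the command above. The helper names are hypothetical, and actually executing it assumes `velocyto` is installed and on `PATH`.

```python
import subprocess
from typing import List, Optional


def build_run10x_cmd(sample_folder: str, gtf: str, mask: Optional[str] = None) -> List[str]:
    """Build argv for `velocyto run10x`, adding -m only when a repeat mask is given."""
    cmd = ["velocyto", "run10x"]
    if mask is not None:
        cmd += ["-m", mask]
    cmd += [sample_folder, gtf]
    return cmd


def run(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if velocyto exits with a nonzero status.
    subprocess.run(cmd, check=True)
```

Calling `build_run10x_cmd("./14741X1", "genes.gtf")` gives the no-mask variant, so both runs can be compared from one script.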

Do you know why gene detection is so much lower in Velocyto?

Thanks! Yueqi

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 22 (2 by maintainers)

Top GitHub Comments

2 reactions
jvikkula commented, Jun 17, 2019

Hi @gioelelm and @yueqiw,

Did you manage to solve this issue? I’m facing a similar issue, but for me most genes, not just some, are detected with significantly lower counts in the Velocyto pipeline. I calculated the average initial cell size (total counts per cell) from the matrix that cellranger produces and it is ~4126, whereas after the Velocyto pipeline it is ~171 for spliced, ~378 for unspliced and ~17 for ambiguous. It also seems somewhat suspicious that there are many more unspliced reads than spliced reads. I have 14 samples altogether, and the results are similar for all of them.
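The "average initial cell size" comparison amounts to averaging the per-cell total counts of each matrix. A minimal sketch, assuming each layer is available as a cells × genes list of rows (loom layers are stored genes × cells, so transpose them first); the function names and toy numbers are invented for illustration and are not Johanna's data.

```python
from statistics import mean
from typing import Dict, List


def mean_cell_size(matrix: List[List[float]]) -> float:
    """Mean over cells of the total counts per cell (rows are cells)."""
    return mean(sum(row) for row in matrix)


def layer_sizes(layers: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Mean cell size for each named layer, e.g. spliced/unspliced/ambiguous."""
    return {name: mean_cell_size(m) for name, m in layers.items()}


if __name__ == "__main__":
    toy = {
        "spliced": [[1, 2], [3, 0]],
        "unspliced": [[4, 4], [2, 2]],
        "ambiguous": [[0, 1], [0, 1]],
    }
    sizes = layer_sizes(toy)
    print(sizes)
    # An unspliced/spliced ratio above 1 is the pattern Johanna describes.
    print(sizes["unspliced"] / sizes["spliced"])
```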

On the other hand, some genes that cellranger does not detect at all are very highly expressed according to the Velocyto pipeline. I have also started wondering whether this is an issue with the annotation file.
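To check whether the discrepancy looks like an annotation problem, one can compare which gene names are detected (total count > 0) in each output, and which names appear in only one of them. A minimal sketch assuming per-gene totals keyed by gene name; the function name and inputs are hypothetical, and a non-empty `missing_names` set only hints at differing annotations, it does not prove it.

```python
from typing import Dict, Set, Tuple


def detection_diff(
    cellranger_totals: Dict[str, float],
    velocyto_totals: Dict[str, float],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (only_cellranger, only_velocyto, missing_names).

    only_*: genes with a positive total in one output and zero (or absent) in the other.
    missing_names: gene names indexed by one output but not the other, which may
    indicate that the two runs used different annotations.
    """
    cr = {g for g, v in cellranger_totals.items() if v > 0}
    vc = {g for g, v in velocyto_totals.items() if v > 0}
    missing = set(cellranger_totals) ^ set(velocyto_totals)
    return cr - vc, vc - cr, missing
```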

I am also running Velocyto on data generated by the 10x Genomics cellranger pipeline:

velocyto run10x -m repeat_msk.gtf 10x_sample_folder refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf

I tried running without the repeat mask file, but that did not have an effect.

I would appreciate it a lot if you were able to help me with this issue. Thanks!

-Johanna


