Some genes have much lower counts in Velocyto than 10x cellranger output
Hi,
I’m using Velocyto on data generated by the 10x Genomics cellranger pipeline and projecting velocity onto embeddings produced with Seurat and scanpy. I found that some of my marker genes are barely detected in the Velocyto pipeline but are detected in hundreds of cells in the cellranger output.
For example:
'GSX-2'
cellranger count (total 10k cells):
nonzero_cells 898.0
total_count 1233.0
Velocyto spliced count:
nonzero_cells 1.0
total_count 1.0
Velocyto un-spliced count:
nonzero_cells 0.0
total_count 0.0
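For anyone who wants to run the same comparison, the two per-gene numbers above (nonzero_cells and total_count) can be computed along the lines of this minimal sketch. It assumes the counts for a single gene have already been extracted as a plain list with one entry per cell; `detection_stats` and the toy values are hypothetical and only for illustration, not part of Velocyto or cellranger.

```python
def detection_stats(counts):
    """Return (nonzero_cells, total_count) for one gene's per-cell counts."""
    # Number of cells in which the gene has at least one count.
    nonzero_cells = sum(1 for c in counts if c > 0)
    # Sum of the gene's counts over all cells.
    total_count = sum(counts)
    return nonzero_cells, total_count


# Toy per-cell counts for one gene (made-up values, not real data).
cellranger_counts = [0, 2, 1, 0, 3]
velocyto_spliced = [0, 0, 1, 0, 0]

for label, counts in [("cellranger", cellranger_counts),
                      ("velocyto spliced", velocyto_spliced)]:
    nonzero, total = detection_stats(counts)
    print(label, "nonzero_cells", nonzero, "total_count", total)
```

Applying the same function to the gene's row in each matrix (cellranger, spliced, unspliced) gives directly comparable numbers.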
Here is a plot of counts for ~20,000 genes:
It looks like many genes are detected with far fewer counts in the Velocyto pipeline.
Below is the CLI code I used to produce the loom file:
velocyto run10x -m ~/Data1/data/hrch38_rmsk.gtf ./14741X1 ~/Data1/data/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf
The BAM files were generated with cellranger 2.0.2 using the same reference and were sorted during velocyto run10x. I tried run10x without the repeat mask and got a similar result.
Do you know the reason for the much lower gene detection in Velocyto?
Thanks! Yueqi
Top GitHub Comments
Hi @gioelelm and @yueqiw,
Did you manage to solve this issue? I’m facing a rather similar issue, except that in my case most genes, not just some, are detected with significantly lower counts in the Velocyto pipeline. I calculated the average initial cell size from the matrix that cellranger produces and it is ~4126, whereas the average initial cell size after the Velocyto pipeline is ~171 for spliced, ~378 for unspliced and ~17 for ambiguous. It also seems somewhat suspicious that there are many more unspliced reads than spliced reads. I have 14 samples altogether and the results are similar for all of them.
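As a rough sketch of the comparison described above, the average cell size can be computed per matrix as shown below. This assumes "initial cell size" means the total count per cell, and that each matrix is available as a list of per-cell rows (cells by genes); `mean_cell_size` and the toy matrices are hypothetical and only for illustration.

```python
def mean_cell_size(cells_by_genes):
    """Average total count per cell for a cells-by-genes count matrix."""
    # Total counts in each cell (sum over genes).
    totals = [sum(row) for row in cells_by_genes]
    # Mean of the per-cell totals.
    return sum(totals) / len(totals)


# Toy matrices with 2 cells and 3 genes each (made-up values, not real data).
layers = {
    "cellranger": [[5, 0, 3], [2, 4, 0]],
    "spliced": [[1, 0, 0], [0, 1, 0]],
    "unspliced": [[2, 0, 1], [0, 1, 0]],
}

for name, matrix in layers.items():
    print(name, mean_cell_size(matrix))
```

If the matrix is stored genes by cells instead, it has to be transposed first so that each row corresponds to one cell.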
On the other hand, some genes that are not detected at all by cellranger are very highly expressed according to the Velocyto pipeline. I have also started wondering whether this is an issue related to the annotation file.
I am also running Velocyto on data generated by the 10x Genomics cellranger pipeline:
velocyto run10x -m repeat_msk.gtf 10x_sample_folder refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf
I tried running without the repeat mask file, but that did not have an effect.
I would appreciate it a lot if you were able to help me with this issue. Thanks!
-Johanna
@sandrav-CGEN check this https://github.com/theislab/scvelo/discussions/813