question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

problem with naming of fragments with mouse index build

See original GitHub issue

Hi, I made an index using the recent posted solution for the mouse index (with Kallisto), but when try to run loom fromfq on my fastq files, it get this error:

2020-11-24 18:53:33,384 - INFO - Using 25 threads.
2020-11-24 18:53:33,386 - INFO - kallisto bus -i /staging/leuven/stg_00072/reference/mm10/gencode.M23.GRCm38.p6/transcripts/kallisto_test/gencode.vM23.fragments.idx -o /local_scratch/tmp-vsc32808/tmpxem7z4gn -x 10xv3 -t 25 /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200903.NovaSeq1.FCB/GC1002925_H6_S10_L001_R1_001.fastq.gz /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200903.NovaSeq1.FCB/GC1002925_H6_S10_L001_R2_001.fastq.gz /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200903.NovaSeq1.FCB/GC1002925_H6_S10_L002_R1_001.fastq.gz /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200903.NovaSeq1.FCB/GC1002925_H6_S10_L002_R2_001.fastq.gz /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200923.NovaSeq1.FCA/GC1002925_H6_S10_L001_R1_001.fastq.gz /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200923.NovaSeq1.FCA/GC1002925_H6_S10_L001_R2_001.fastq.gz /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200923.NovaSeq1.FCA/GC1002925_H6_S10_L002_R1_001.fastq.gz /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200923.NovaSeq1.FCA/GC1002925_H6_S10_L002_R2_001.fastq.gz
2020-11-24 18:53:33,468 - INFO - [index] k-mer length: 31
2020-11-24 18:53:33,468 - INFO - [index] number of targets: 374,274
2020-11-24 18:53:33,468 - INFO - [index] number of k-mers: 142,414,483
2020-11-24 18:53:47,927 - INFO - [index] number of equivalence classes: 1,093,834
2020-11-24 18:53:51,330 - INFO - [quant] will process sample 1: /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200903.NovaSeq1.FCB/GC1002925_H6_S10_L001_R1_001.fastq.gz
2020-11-24 18:53:51,330 - INFO -                                /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200903.NovaSeq1.FCB/GC1002925_H6_S10_L001_R2_001.fastq.gz
2020-11-24 18:53:51,330 - INFO - [quant] will process sample 2: /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200903.NovaSeq1.FCB/GC1002925_H6_S10_L002_R1_001.fastq.gz
2020-11-24 18:53:51,330 - INFO -                                /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200903.NovaSeq1.FCB/GC1002925_H6_S10_L002_R2_001.fastq.gz
2020-11-24 18:53:51,330 - INFO - [quant] will process sample 3: /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200923.NovaSeq1.FCA/GC1002925_H6_S10_L001_R1_001.fastq.gz
2020-11-24 18:53:51,330 - INFO -                                /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200923.NovaSeq1.FCA/GC1002925_H6_S10_L001_R2_001.fastq.gz
2020-11-24 18:53:51,330 - INFO - [quant] will process sample 4: /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200923.NovaSeq1.FCA/GC1002925_H6_S10_L002_R1_001.fastq.gz
2020-11-24 18:53:51,330 - INFO -                                /staging/leuven/stg_00072/Ben/antNPCs/10xGenomics_F1Tm1/WT_day3/fastqFiles/200923.NovaSeq1.FCA/GC1002925_H6_S10_L002_R2_001.fastq.gz
2020-11-24 19:04:53,543 - INFO - [quant] finding pseudoalignments for the reads ... done
2020-11-24 19:04:53,543 - INFO - [quant] processed 364,969,769 reads, 259,218,367 reads pseudoaligned
2020-11-24 19:04:58,344 - INFO - Loading gene metadata
2020-11-24 19:04:58,691 - INFO - Loading fragments-to-gene mappings
2020-11-24 19:04:58,947 - INFO - Indexing genes
Traceback (most recent call last):
  File "/data/leuven/328/vsc32808/miniconda/envs/loompy/bin/loompy", line 10, in <module>
    sys.exit(cli())
  File "/data/leuven/328/vsc32808/miniconda/envs/loompy/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/data/leuven/328/vsc32808/miniconda/envs/loompy/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/data/leuven/328/vsc32808/miniconda/envs/loompy/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/leuven/328/vsc32808/miniconda/envs/loompy/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/leuven/328/vsc32808/miniconda/envs/loompy/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/data/leuven/328/vsc32808/miniconda/envs/loompy/lib/python3.8/site-packages/loompy/commands.py", line 34, in fromfq
    create_from_fastq(loomfile, sampleid, list(fastqs), indexdir, metadatafile, threads)
  File "/data/leuven/328/vsc32808/miniconda/envs/loompy/lib/python3.8/site-packages/loompy/bus_file.py", line 429, in create_from_fastq
    bus = BusFile(
  File "/data/leuven/328/vsc32808/miniconda/envs/loompy/lib/python3.8/site-packages/loompy/bus_file.py", line 186, in __init__
    self.gene_for_fragment_idx[i] = accession_idx[self.gene_for_fragment[i]]
KeyError: 'ENSMUSG00000079800.2::GL456210.1:9123-58882'

I think it has to do with the naming of the genes/fragments, but I do not understand enough of the code to be able to fix it.

best, Ben

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:5

github_iconTop GitHub Comments

2reactions
JBaiwircommented, Jan 15, 2021

Hello.

I found what causes the issue. I followed the pipeline here: https://github.com/linnarsson-lab/loompy/tree/master/kallisto And I faced the same type of error.

After some research, I found that an update in bedtools change its behavior. I fixed it based on this: https://github.com/arq5x/bedtools2/issues/805

If you followed the same pipeline (which is really great by the way), I suggest you to change the line 23 of the “mouse_download.sh” code as follow:

bedtools getfasta -name -fo gencode.vM23.unspliced.fa -fi GRCm38.primary_assembly.genome.fa -bed gencode.vM23.primary_assembly.annotation.sorted.bed

Becomes

bedtools getfasta -name -fi GRCm38.primary_assembly.genome.fa -bed gencode.vM23.primary_assembly.annotation.sorted.bed | sed 's/::.*//' > gencode.vM23.unspliced.fa

I just did this and rerun the pipeline and everything did as expected, and I could have a .loom file at the end.

I hope this helps. This is my first message on github, sorry if I did not followed some guidelines that I am unaware of. I was stuck with this same problem and I would be happy if someone already posted this answer.

I wish you all the best in your analyses.

Jérôme

0reactions
BenvanderVeercommented, Jan 15, 2021

Changing the line works for me indeed! Thank you for your help!

best, Ben

Read more comments on GitHub >

github_iconTop Results From Across the Web

Nomenclature of humanized mAbs: Early concepts, current ...
To specify a humanized mAb, different naming nomenclatures and guidelines were defined by the World Health Organization (WHO) or the American ...
Read more >
Debug your fragments | Android Developers
This guide covers tools that you can use to debug your fragments. FragmentManager logging. FragmentManager is capable of emitting various ...
Read more >
Pseudoalignment of Data Using Kallisto — RNA-Seq analysis ...
We will be using the mouse reference transcriptome (available on the linked ... kallisto index : Command to begin the indexing; –index= :...
Read more >
Rsubread/Subread Users Guide - Bioconductor
R and issue the following command to install Rsubread: ... 5.7GB of memory to build a gapped index for human/mouse genome. subread-align ( ......
Read more >
Manual - Bowtie 2
Building Bowtie 2 from source requires a GNU-like environment with Clang/GCC, ... The basename is the name of any of the index files...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found