parse_sam_aux_fields conflict ValueError

Dear developers! I am very curious to use DeepVariant on our in-house data. In trying to do so, I stumbled upon an error I cannot seem to circumvent.

Problem: I am trying to run DeepVariant on a BAM file that originated from PacBio LAA output, mapped with minimap2. I receive the error that it’s unable to read any records. I also got the warGning (lol!) that --add_hp_channel is set but --parse_sam_aux_fields is not.

Initial command: sudo docker run -v "2021-05-11_deepvariant_PB":"/input/" -v "2021-05-11_deepvariant_PB/output_DV":"/output/" google/deepvariant:"1.1.0" /opt/deepvariant/bin/run_deepvariant --model_type=PACBIO --ref=/input/ref.fasta --reads=/input/R9_Z-1707-003_cluster1_RC492.bam --output_vcf=/output/output.vcf.gz

What I tried: I reran with the following extra argument: --make_examples_extra_args="parse_sam_aux_fields=true". This gives me a ValueError from run_deepvariant.py saying that it is in conflict with the sort_by_haplotypes flag, even though I didn’t use that flag. Then I tried adding both arguments: --make_examples_extra_args="sort_by_haplotypes=false,parse_sam_aux_fields=true", but this gives the same ValueError:

ValueError: The extra_args "parse_sam_aux_fields" conflicts with other flags. Please fix and try again. Starting in v1.1.0, if you are running with PACBIO and want to use HP tags, please use the new --use_hp_information flag instead of using --make_examples_extra_args="sort_by_haplotypes=true,parse_sam_aux_fields=true"
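
The behaviour described above can be illustrated with a small sketch. It assumes, based only on the error message quoted here, that in PACBIO mode the wrapper treats sort_by_haplotypes and parse_sam_aux_fields as reserved and rejects them whenever they appear in --make_examples_extra_args, whatever value they carry. The function names, the RESERVED_FOR_PACBIO set and the error text are hypothetical and are not the actual run_deepvariant.py code.

```python
# Hypothetical sketch; not the actual run_deepvariant.py implementation.
# Assumption: these two flag names are managed by the wrapper in PACBIO mode.
RESERVED_FOR_PACBIO = {"sort_by_haplotypes", "parse_sam_aux_fields"}


def parse_extra_args(extra_args):
    """Split 'k1=v1,k2=v2' into a dict of flag name -> raw string value."""
    parsed = {}
    for item in extra_args.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed


def find_conflicts(extra_args, model_type):
    """Return the extra-arg names that collide with wrapper-managed flags."""
    if model_type != "PACBIO":
        return []
    names = parse_extra_args(extra_args)
    return sorted(name for name in names if name in RESERVED_FOR_PACBIO)


def check_extra_args(extra_args, model_type):
    """Raise ValueError if any extra arg name is reserved for this model type."""
    conflicts = find_conflicts(extra_args, model_type)
    if conflicts:
        raise ValueError(
            "extra_args conflict with wrapper-managed flags: " + ", ".join(conflicts)
        )
```

Under this assumption both attempts fail in the same way, because the check looks at the flag name and not at the value assigned to it.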

I also tried to run the command with --sample_name=Z-1707-003_cluster1_RC492_phase0 (the RG for the bamfile), which does not give the warning anymore, but still leaves me with an empty vcf.

Tool stderr for the initial command:

I0511 12:24:29.658635 140614860437248 run_deepvariant.py:317] Re-using the directory for intermediate results in /tmp/tmpq5tvks3j

***** Intermediate results will be written to /tmp/tmpq5tvks3j in docker. ****


***** Running the command:*****
( time seq 0 0 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "/input/ref.fasta" --reads "/input/R9_Z-1707-003_cluster1_RC492.bam" --examples "/tmp/tmpq5tvks3j/make_examples.tfrecord@1.gz" --add_hp_channel --alt_aligned_pileup "diff_channels" --noparse_sam_aux_fields --norealign_reads --nosort_by_haplotypes --vsc_min_fraction_indels "0.12" --task {} )

I0511 12:24:31.945842 140409179444992 genomics_reader.py:223] Reading /input/R9_Z-1707-003_cluster1_RC492.bam with NativeSamReader
W0511 12:24:31.946794 140409179444992 make_examples.py:589] WARGNING! --add_hp_channel is set but --parse_sam_aux_fields is not set. This will cause aux fields to not be read in. The relevant values might be zero. For example, for --add_hp_channel, resulting in an empty
HP channel. If this is not what you intended, please stop and enable --parse_sam_aux_fields.
I0511 12:24:32.430390 140409179444992 make_examples.py:648] Preparing inputs
I0511 12:24:32.438421 140409179444992 genomics_reader.py:223] Reading /input/R9_Z-1707-003_cluster1_RC492.bam with NativeSamReader
I0511 12:24:32.440476 140409179444992 make_examples.py:648] Common contigs are ['T86']
I0511 12:24:32.442919 140409179444992 make_examples.py:648] Starting from v0.9.0, --use_ref_for_cram is default to true. If you are using CRAM input, note that we will decode CRAM using the reference you passed in with --ref
2021-05-11 12:24:32.443393: I third_party/nucleus/io/sam_reader.cc:662] Setting HTS_OPT_BLOCK_SIZE to 134217728
I0511 12:24:32.447968 140409179444992 genomics_reader.py:223] Reading /input/R9_Z-1707-003_cluster1_RC492.bam with NativeSamReader
I0511 12:24:32.453339 140409179444992 genomics_reader.py:223] Reading /input/R9_Z-1707-003_cluster1_RC492.bam with NativeSamReader
I0511 12:24:32.579413 140409179444992 make_examples.py:648] Writing examples to /tmp/tmpq5tvks3j/make_examples.tfrecord-00000-of-00001.gz
I0511 12:24:32.579596 140409179444992 make_examples.py:648] Overhead for preparing inputs: 0 seconds
I0511 12:24:32.587054 140409179444992 make_examples.py:648] 0 candidates (0 examples) [0.01s elapsed]
I0511 12:24:32.591045 140409179444992 make_examples.py:648] Found 0 candidate variants
I0511 12:24:32.591111 140409179444992 make_examples.py:648] Created 0 examples

real    0m3.165s
user    0m3.133s
sys     0m1.450s

***** Running the command:*****
( time /opt/deepvariant/bin/call_variants --outfile "/tmp/tmpq5tvks3j/call_variants_output.tfrecord.gz" --examples "/tmp/tmpq5tvks3j/make_examples.tfrecord@1.gz" --checkpoint "/opt/models/pacbio/model.ckpt" )

W0511 12:24:34.935784 140411820246784 call_variants.py:327] Unable to read any records from /tmp/tmpq5tvks3j/make_examples.tfrecord@1.gz. Output will contain zero records.

real    0m2.355s
user    0m2.789s
sys     0m1.594s

***** Running the command:*****
( time /opt/deepvariant/bin/postprocess_variants --ref "/input/ref.fasta" --infile "/tmp/tmpq5tvks3j/call_variants_output.tfrecord.gz" --outfile "/output/output.vcf.gz" )

I0511 12:24:37.234371 139970945300224 postprocess_variants.py:1083] Could not determine sample name and --sample_name is unset. Using the default sample name. Sample name: default
I0511 12:24:37.235468 139970945300224 postprocess_variants.py:1111] call_variants_output is empty. Writing out empty VCF.
I0511 12:24:37.235656 139970945300224 postprocess_variants.py:1139] Writing variants to VCF.
I0511 12:24:37.235709 139970945300224 postprocess_variants.py:723] Writing output to VCF file: /output/output.vcf.gz
I0511 12:24:37.236480 139970945300224 genomics_writer.py:176] Writing /output/output.vcf.gz with NativeVcfWriter
I0511 12:24:37.237797 139970945300224 postprocess_variants.py:1147] VCF creation took 3.563165664672851e-05 minutes
I0511 12:24:37.239083 139970945300224 genomics_reader.py:223] Reading /output/output.vcf.gz with NativeVcfReader

real    0m2.472s
user    0m2.962s
sys     0m1.380s

Thanks a lot!!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11

Top GitHub Comments

1 reaction
MariaNattestad commented, May 13, 2021

Happy to help! For your question, it depends on how low the coverage is. You can see this blog post for how coverage impacts accuracy: https://google.github.io/deepvariant/posts/2019-09-10-twenty-is-the-new-thirty-comparing-current-and-historical-wgs-accuracy-across-coverage/

0 reactions
pichuan commented, May 20, 2021

Hi @annabeldekker

I’ll paste some similar information from my answer in the other issue: https://github.com/google/deepvariant/issues/458#issuecomment-844317545. Hopefully my answer below will help you as well:

Starting from v1.1.0, we added an additional channel to our PacBio model, and tried to simplify the flags in the one-step run_deepvariant by adding just one flag, --use_hp_information, which you can set to false if your BAM is not phased, and to true if your BAM is phased.

Example: https://github.com/google/deepvariant/blob/r1.1/docs/deepvariant-pacbio-model-case-study.md#run-deepvariant-on-haplotagged-chromosome-20-alignments
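
For the command from the question, this would mean dropping --make_examples_extra_args and passing --use_hp_information instead. Below is a minimal sketch that assembles the run_deepvariant argument list in Python. The helper name is hypothetical, the paths are the ones from the original command, and use_hp_information=False assumes the BAM is not phased.

```python
def build_run_deepvariant_args(ref, reads, output_vcf, use_hp_information):
    """Assemble run_deepvariant arguments for the PacBio model (hypothetical helper)."""
    hp_value = "true" if use_hp_information else "false"
    return [
        "/opt/deepvariant/bin/run_deepvariant",
        "--model_type=PACBIO",
        f"--ref={ref}",
        f"--reads={reads}",
        f"--output_vcf={output_vcf}",
        # A single wrapper flag replaces --make_examples_extra_args here.
        f"--use_hp_information={hp_value}",
    ]


# Paths taken from the command in the question; the BAM is assumed unphased.
args = build_run_deepvariant_args(
    ref="/input/ref.fasta",
    reads="/input/R9_Z-1707-003_cluster1_RC492.bam",
    output_vcf="/output/output.vcf.gz",
    use_hp_information=False,
)
print(" ".join(args))
```

The resulting list would go after the `docker run ... google/deepvariant:"1.1.0"` part of the original command.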

This --use_hp_information flag in the one-step run_deepvariant command actually controls both sort_by_haplotypes and parse_sam_aux_fields in the make_examples stage. If you set --use_hp_information to true in the one-step run_deepvariant command, sort_by_haplotypes and parse_sam_aux_fields are both set to true in the make_examples stage. If you set --use_hp_information to false, sort_by_haplotypes and parse_sam_aux_fields are both set to false in the make_examples stage.

In both cases, if you’re running for PacBio, you always have to set --add_hp_channel to true in the make_examples stage to make sure the last channel is added. (If you’re using the one-step run_deepvariant command, --add_hp_channel is added automatically.)

We tried our best to encapsulate these 3 flags into just one --use_hp_information flag in our one-step run_deepvariant command. However, I understand this might have caused further confusion when people tried to use the make_examples binary on its own. You can find the logic here: https://github.com/google/deepvariant/blob/r1.1/scripts/run_deepvariant.py#L240-L242
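
The mapping described above can be sketched roughly as follows. This is an illustration written from the description in this comment, not a copy of the linked run_deepvariant.py lines. Both function names are hypothetical, and returning an empty dict for other model types is only an assumption made to keep the sketch small.

```python
def make_examples_hp_flags(model_type, use_hp_information):
    """Return the three HP-related make_examples settings as booleans."""
    if model_type != "PACBIO":
        # Assumption: only the PacBio case is covered by this sketch.
        return {}
    return {
        # Always on for PacBio so that the last channel is added.
        "add_hp_channel": True,
        # Both of these follow --use_hp_information.
        "parse_sam_aux_fields": bool(use_hp_information),
        "sort_by_haplotypes": bool(use_hp_information),
    }


def to_command_line(flags):
    """Render booleans in the --name / --noname style seen in the log above."""
    return [
        ("--" if value else "--no") + name
        for name, value in sorted(flags.items())
    ]


print(to_command_line(make_examples_hp_flags("PACBIO", False)))
print(to_command_line(make_examples_hp_flags("PACBIO", True)))
```

With use_hp_information set to false, the sketch yields --add_hp_channel, --noparse_sam_aux_fields and --nosort_by_haplotypes, which is the combination visible in the make_examples command in the log from the question.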

I will try to update our deepvariant-pacbio-model-case-study.md file to document this.
