Varying the channels used to call variants


Describe the issue: I previously used a PopVCF model.ckpt with run_deepvariant v1.1, including a PopVCF channel during make_examples. However, that model has no insert_size channel, because it predates v1.4.

v1.4 adds 'insert_size' as a default extra channel, and make_examples has several options for including additional channels:

--[no]use_allele_frequency: If True, add another channel for pileup images to represent allele frequency information gathered from population call sets.
    (default: 'false')
--[no]add_hp_channel: If true, add another channel to represent HP tags per read.
    (default: 'false')
--channels: Comma-delimited list of optional channels to add. Available Channels: read_mapping_percent,avg_base_quality,identity,gap_compressed_identity,gc_content,is_homopolymer,homopolymer_weighted,blank,insert_size

Are there model-ckpt files for these channel options available somewhere to provide call_variants via:

--checkpoint: Required. Path to the TensorFlow model checkpoint to use to evaluate candidate variant calls.

If so, do they include one additional channel or permutations of multiple channels?

If not, is there another way to have run_deepvariant use channels other than those in the default checkpoint during call_variants? For example, I currently cannot include both insert_size and allele_frequency with v1.4.

Setup

  • Operating system:
  • DeepVariant version: v1.4
  • Installation method (Docker, built from source, etc.): Singularity
  • Type of data: WGS

Steps to reproduce:

  • Command:
time singularity run -B '/usr/lib/locale/:/usr/lib/locale/,/path/to/region_files/:/region_dir/,/path/to/container/deep-variant/:/run_dir/,/path/to/output/:/out_dir/,/path/to/reference_genome/:/ref_dir/,/path/to/bam_files/:/bam_dir/,/path/to/population_vcf/:/popVCF_dir/' \
  deepvariant_1.4.0.sif \
      /opt/deepvariant/bin/run_deepvariant \
      --model_type=WGS \
      --ref='/ref_dir/reference.fa' \
      --reads='/bam_dir/id.bam' \
      --output_vcf='/out_dir/test1.vcf.gz' \
      --intermediate_results_dir='/out_dir/tmp/test1/' \
      --num_shards='39' \
      --make_examples_extra_args="use_allele_frequency=true,population_vcfs=/popVCF_dir/UMAG1.POP.FREQ.vcf.gz" \
      --regions=/region_dir/regions_to_test.bed
  • Error trace: (if applicable)
***** Running the command:*****
time /opt/deepvariant/bin/call_variants --outfile "/out_dir/tmp/test1/call_variants_output.tfrecord.gz" --examples "/out_dir/tmp/test1/make_examples.tfrecord@39.gz" --checkpoint "/opt/models/wgs/model.ckpt" --openvino_model_dir "/out_dir/tmp/test1/"

I0919 17:19:47.185331 46912500266816 call_variants.py:317] From /out_dir/tmp/test1/make_examples.tfrecord-00000-of-00039.gz.example_info.json: Shape of input examples: [100, 221, 8], Channels of input examples: [1, 2, 3, 4, 5, 6, 8, 19].
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_l3__pco1/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 513, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/tmp/Bazel.runfiles_l3__pco1/runfiles/absl_py/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_l3__pco1/runfiles/absl_py/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_l3__pco1/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 494, in main
    call_variants(
  File "/tmp/Bazel.runfiles_l3__pco1/runfiles/com_google_deepvariant/deepvariant/call_variants.py", line 363, in call_variants
    raise ValueError('The number of channels in examples and checkpoint '
ValueError: The number of channels in examples and checkpoint should match, but the checkpoint has 7 channels while the examples have 8.

real    0m3.217s
user    0m4.066s
sys     0m4.174s

real    77m45.059s
user    2960m49.979s
sys     39m40.911s
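
The failure above comes down to a channel-count mismatch between the examples and the checkpoint. As a rough illustration only (this is not DeepVariant's own code), the sketch below compares two example_info.json payloads before call_variants is run. The helper name `check_channels` is hypothetical, and it assumes the examples' example_info.json uses the same "shape" and "channels" keys as the model's file; the values are copied from the log line above and from the v1.4.0 WGS model's example_info.json quoted in this thread.

```python
import json


def check_channels(examples_info_json, checkpoint_info_json):
    """Return None if the channel layouts match, else a short description.

    Hypothetical helper: both arguments are the text of an
    example_info.json file (assumed keys: "shape", "channels").
    """
    examples = json.loads(examples_info_json)
    checkpoint = json.loads(checkpoint_info_json)
    n_examples = len(examples["channels"])
    n_checkpoint = len(checkpoint["channels"])
    if n_examples != n_checkpoint:
        return (f"checkpoint has {n_checkpoint} channels "
                f"while the examples have {n_examples}")
    if examples["channels"] != checkpoint["channels"]:
        return "same channel count, but different channel enums"
    return None


# Values copied from this thread (make_examples log and model.ckpt.example_info.json).
examples_info = '{"shape": [100, 221, 8], "channels": [1, 2, 3, 4, 5, 6, 8, 19]}'
checkpoint_info = ('{"version": "1.4.0", "shape": [100, 221, 7], '
                   '"channels": [1, 2, 3, 4, 5, 6, 19]}')

print(check_channels(examples_info, checkpoint_info))
```

Running a check like this right after make_examples would surface the mismatch before the much longer end-to-end run is attempted.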



Top GitHub Comments

pichuan commented, Sep 24, 2022

Hi @jkalleberg, please see: https://gist.github.com/pichuan/7ad09bf1fa8f519facf6806eca835ea6

I’ll close this issue for now. Feel free to open more issues if you have any questions or feedback for us.

pichuan commented, Sep 21, 2022

New model checkpoints associated with new releases will be under gs://deepvariant/models/DeepVariant as you noticed.

As I mentioned, starting from v1.4.0 you can inspect this file:

$ gsutil cat gs://deepvariant/models/DeepVariant/1.4.0/DeepVariant-inception_v3-1.4.0+data-wgs_standard/model.ckpt.example_info.json
{"version": "1.4.0", "shape": [100, 221, 7], "channels": [1, 2, 3, 4, 5, 6, 19]}  

The “channels” values are enums. You can look them up in this proto: https://github.com/google/deepvariant/blob/r1.4/deepvariant/protos/deepvariant.proto#L1048

The example above says that the DeepVariant v1.4.0 WGS model has 7 channels, and they are:

  CH_READ_BASE = 1;
  CH_BASE_QUALITY = 2;
  CH_MAPPING_QUALITY = 3;
  CH_STRAND = 4;
  CH_READ_SUPPORTS_VARIANT = 5;
  CH_BASE_DIFFERS_FROM_REF = 6;
  CH_INSERT_SIZE = 19;
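
To see which channel caused the mismatch in the original report, the two channel lists can be compared directly. The sketch below is only an illustration: the name table holds just the seven enum values listed above, and the helpers `describe` and `diff_channels` are hypothetical. Any other value (such as the 8 in the examples from the error log) is deliberately left unnamed here; it is presumably the channel added by use_allele_frequency=true, but that should be confirmed against deepvariant.proto.

```python
# Enum names copied from the list above; other enum values are intentionally
# omitted and should be looked up in deepvariant.proto.
CHANNEL_NAMES = {
    1: "CH_READ_BASE",
    2: "CH_BASE_QUALITY",
    3: "CH_MAPPING_QUALITY",
    4: "CH_STRAND",
    5: "CH_READ_SUPPORTS_VARIANT",
    6: "CH_BASE_DIFFERS_FROM_REF",
    19: "CH_INSERT_SIZE",
}


def describe(channel):
    """Return the enum name if known, else a reminder to check the proto."""
    return CHANNEL_NAMES.get(channel, f"<enum {channel}: look up in deepvariant.proto>")


def diff_channels(example_channels, checkpoint_channels):
    """Return (only_in_examples, only_in_checkpoint), preserving input order."""
    extra = [c for c in example_channels if c not in checkpoint_channels]
    missing = [c for c in checkpoint_channels if c not in example_channels]
    return extra, missing


# Channel lists from the error log and from model.ckpt.example_info.json.
extra, missing = diff_channels([1, 2, 3, 4, 5, 6, 8, 19], [1, 2, 3, 4, 5, 6, 19])
print("only in examples:", [describe(c) for c in extra])
print("only in checkpoint:", [describe(c) for c in missing])
```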

Note that the allele frequency model isn’t part of our regular release process yet. It was made public as part of our preprint https://doi.org/10.1101/2021.01.06.425550. Right now, we retrain it when users request it. We’re certainly hoping to see more use cases (thank you for letting us know!). If it becomes more mature, we can consider building it into our regular release process. (Supporting more models regularly also means more overhead for each release, so we need to balance this carefully.)
