question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Strange false positive call

See original GitHub issue

Hello, (me again sorry), I am eyeballing bam files for evaluation of candidates. Mostly it’s fine (I open a lot of issues but I want to stress that most reported variants seem correct). But here is a very, very strange case to my eyes.

Here is the gvcf line

chromosome_1	17434065	.	C	T,<*>	29.1	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:29:64:38,26,0:0.40625,0:29,0,99,990,990,990

That looks all right to me, seems like a “solid” candidate. But now, let’s look at the bam

CT the relevant position is the first C starting from the right. As you see, there is not a single T base there. The site seems perfectly homozygous for C:C.

However, a bit on the left, you can see that many of the mapped reads abruptly end at the same position with a T. It’s not a variant but something a bit strange seems to happen in that region.

So far it’s the only such case I have. Do you have any idea of what’s going on?

Here are the lines from the gVCF before the site. The abrupt T position is at 17434056, so in a block with no candidate. But just right after, for a few bases, DeepVariant threw a few calls that are completely not supported by the mapping.

zgrep -w -C8 "17434065" H4A4.g.vcf.gz
chromosome_1	17434049	.	T	G,<*>	29.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:29:62:36,25,0:0.403226,0:29,0,99,990,990,990
chromosome_1	17434050	.	G	<*>	0	.	END=17434056	GT:GQ:MIN_DP:PL	0/0:50:59:0,186,1859
chromosome_1	17434057	.	G	T,<*>	25.7	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:26:59:33,26,0:0.440678,0:25,0,71,990,990,990
chromosome_1	17434058	.	T	<*>	0	.	END=17434058	GT:GQ:MIN_DP:PL	0/0:50:63:0,189,1889
chromosome_1	17434059	.	G	T,<*>	25	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:25:63:37,26,0:0.412698,0:25,0,77,990,990,990
chromosome_1	17434060	.	A	<*>	0	.	END=17434062	GT:GQ:MIN_DP:PL	0/0:50:64:0,165,1889
chromosome_1	17434063	.	G	C,<*>	26.2	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:26:64:38,25,0:0.390625,0:26,0,99,990,990,990
chromosome_1	17434064	.	A	<*>	0	.	END=17434064	GT:GQ:MIN_DP:PL	0/0:50:64:0,192,1919
chromosome_1	17434065	.	C	T,<*>	29.1	PASS	.	GT:GQ:DP:AD:VAF:PL	0/1:29:64:38,26,0:0.40625,0:29,0,99,990,990,990

any idea of what’s going on there? It’s a bit annoying in the sense that I don’t know how I could have caught that without eyeballing the alignment. In my experiment I plan to eyeball all my candidates anyway (because there are less than 100 of them).

Issue Analytics

  • State:closed
  • Created 4 years ago
  • Comments:9 (1 by maintainers)

github_iconTop GitHub Comments

1reaction
ghostcommented, Mar 13, 2020

@aderzelle DeepVariant performs local realignment by default (can be disabled with --norealign_reads for the make_examples step). Realignment may cause this region to look a bit different than what you see in the original BAM, explaining the candidates that are generated.

By default, the realigned reads do not get written out, but you can use the flags below to output the realigned reads to a file called realigned_reads.bam. These flags can be added to the make_examples step so that you can inspect the realigned data.

--emit_realigned_reads - enables writing out of realigned reads --realigner_diagnostics=/path/to/dir - path to which the reads will be written

Hello, how are we supposed to pass those arguments with the one-step wrapper script?

I tried

docker run -v "${INPUT_DIR}":"/input" -v "${OUTPUT_DIR}:/output" google/deepvariant:"${BIN_VERSION}" /opt/deepvariant/bin/run_deepvariant --model_type=WGS --ref=/input/shasta_final.fa --reads="/input/${SAMPLE}.sorted.bam" --regions="/input/ARCcestor.bed" --output_vcf=/output/${SAMPLE}.vcf.gz --output_gvcf=/output/${SAMPLE}.g.vcf.gz --num_shards="${N_SHARDS}" --customized_model="/input/mosquito_model/model.ckpt-97700" --make_examples_extra_args=emit_realigned_reads,realigner_diagnostics=/media/urbe/MyADrive/12-03-2020_DeepVariant_NDPD/bam_diagnostics

but runs into

Traceback (most recent call last):                                                                                                                                                                          
  File "/opt/deepvariant/bin/run_deepvariant.py", line 317, in <module>                                                                                                                                     
    app.run(main)                                                                                                                                                                                           
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run                                                                                                                               
    _run_main(main, args)                                                                                                                                                                                   
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main                                                                                                                         
    sys.exit(main(argv))                                                                                                                                                                                    
  File "/opt/deepvariant/bin/run_deepvariant.py", line 304, in main                                                                                                                                         
    commands = create_all_commands()                                                                                                                                                                        
  File "/opt/deepvariant/bin/run_deepvariant.py", line 276, in create_all_commands                                                                                                                          
    sample_name=FLAGS.sample_name))                                                                                                                                                                         
  File "/opt/deepvariant/bin/run_deepvariant.py", line 182, in make_examples_command                                                                                                                        
    kwargs.update(_extra_args_to_dict(extra_args))                                                                                                                                                          
  File "/opt/deepvariant/bin/run_deepvariant.py", line 135, in _extra_args_to_dict                                                                                                                          
    (flag_name, flag_value) = extra_arg.split('=')                                                                                                                                                          
ValueError: need more than 1 value to unpack     

EDIT: ah ok, the flag must explicitly be set to “=true” ^^’

0reactions
MariaNattestadcommented, Mar 6, 2020

Hi @aderzelle If the question is why a variant is getting called when you see no sign of it in the original bam, then that can only be answered by inspecting the realigned bam, not by additional sequencing. It seems likely given the split reads that you are seeing the signal of some kind of misassembly or structural variant (if the genomes producing the bam and the reference are not the same). If seeing the realigned bam does not answer your question, then you can send the files to marianattestad@google.com and I can take a look. I would need to see the original bam, the realigned bam, the reference genome, and the position of the variant you are asking about.

Read more comments on GitHub >

github_iconTop Results From Across the Web

False-Positive Results in Rapid Antigen Tests for SARS-CoV-2
This study examines the incidence of false-positive results in a sample of rapid antigen tests used to serially screen asymptomatic workers ...
Read more >
Practices of Science: False Positives and False Negatives
A false positive is a “false alarm.” A false negative is saying something is false when it is actually true (also called a...
Read more >
What Is a False Positive COVID-19 Test? - Prevention.com
False positive COVID-19 tests occur when you don't have the novel coronavirus, but the test is positive. Experts explain how and why false...
Read more >
False positive rapid COVID-19 test: Causes and accuracy
A false positive result is possible with a rapid COVID-19 test. It happens when a person does not have COVID-19 but still tests...
Read more >
Why False Positives Merit Concern, Too - The New York Times
Even laboratory tests that rely on a very reliable technique called polymerase chain reaction, or P.C.R., have been known to return the ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found