Having issues running on Google Cloud Genomics

I’m attempting to launch DeepVariant on GCP with the Verily GRCh38 reference genome and the whole-exome model. Unfortunately, the job never finishes. Any pointers on how to debug this would be much appreciated.

#!/bin/bash
set -euo pipefail
# Set common settings.
PROJECT_ID=valis-194104
OUTPUT_BUCKET=gs://canis/CNR-data
STAGING_FOLDER_NAME=deep_variant_files
OUTPUT_FILE_NAME=TLE_a_001_deep_variant.vcf
# Model for calling whole exome sequencing data.
MODEL=gs://deepvariant/models/DeepVariant/0.7.0/DeepVariant-inception_v3-0.7.0+data-wes_standard
IMAGE_VERSION=0.7.0
DOCKER_IMAGE=gcr.io/deepvariant-docker/deepvariant:"${IMAGE_VERSION}"
COMMAND="/opt/deepvariant_runner/bin/gcp_deepvariant_runner \
  --project ${PROJECT_ID} \
  --zones us-west1-b \
  --docker_image ${DOCKER_IMAGE} \
  --outfile ${OUTPUT_BUCKET}/${OUTPUT_FILE_NAME} \
  --staging ${OUTPUT_BUCKET}/${STAGING_FOLDER_NAME} \
  --model ${MODEL} \
  --regions gs://canis/CNR-data/CDS-canonical.bed \
  --bam gs://canis/CNR-data/TLE_a_001.bam \
  --bai gs://canis/CNR-data/TLE_a_001.bam.bai \
  --ref gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa  \
  --gcsfuse"
# Run the pipeline.
gcloud alpha genomics pipelines run \
    --project "${PROJECT_ID}" \
    --service-account-scopes="https://www.googleapis.com/auth/cloud-platform" \
    --logging "${OUTPUT_BUCKET}/${STAGING_FOLDER_NAME}/runner_logs_$(date +%Y%m%d_%H%M%S).log" \
    --zones us-west1-b \
    --docker-image gcr.io/deepvariant-docker/deepvariant_runner:"${IMAGE_VERSION}" \
    --command-line "${COMMAND}"

I get the following error:

      Traceback (most recent call last):
        File "/opt/deepvariant_runner/src/gcp_deepvariant_runner.py", line 862, in <module>
          run()
        File "/opt/deepvariant_runner/src/gcp_deepvariant_runner.py", line 845, in run
          _run_make_examples(pipeline_args)
        File "/opt/deepvariant_runner/src/gcp_deepvariant_runner.py", line 340, in _run_make_examples
          _wait_for_results(threads, results)
        File "/opt/deepvariant_runner/src/gcp_deepvariant_runner.py", line 352, in _wait_for_results
          result.get()
        File "/usr/lib/python2.7/multiprocessing/pool.py", line 572, in get
          raise self._value
      RuntimeError: Job failed with error "run": operation "projects/valis-194104/operations/13939489157244551677" failed: executing pipeline: Execution failed: action 5: unexpected exit status 1 was not ignored (reason: FAILED_PRECONDITION)
    details:
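
The operation ID in the error line is the handle for digging further. A minimal sketch, assuming the traceback has been saved to a file (the `runner.log` name and the `extract_operation_id` helper are hypothetical) and that an `operations describe` subcommand is available in the same `gcloud alpha genomics` group used in the script above:

```shell
#!/bin/bash
# Hypothetical helper: read runner output on stdin and print the numeric
# operation ID found after "operations/" in the error line.
extract_operation_id() {
  sed -n 's#.*operations/\([0-9][0-9]*\).*#\1#p'
}

# Example use (commented out because it needs gcloud and project access):
#   op_id="$(extract_operation_id < runner.log)"
#   gcloud alpha genomics operations describe "${op_id}" --project valis-194104
```

The describe output should list the individual actions, which may show what "action 5" was running when it exited with status 1.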

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 31 (6 by maintainers)

Top GitHub Comments

2 reactions
pichuan commented, Nov 8, 2018

Asking @nmousavi from the Google Cloud team to take a look. I’ll also take a look later today when I find some more time.

1 reaction
nmousavi commented, Nov 12, 2018

Let’s first see if that’s the case (i.e., whether having the BED file in a public bucket resolves it). If so, this is a bug and we will fix it.
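
A quick way to probe the access hypothesis before moving any files is to check that each input object can be read. This is only a sketch: `check_inputs` is a hypothetical helper, and it tests the credentials active in the local shell, which may differ from the service account the pipeline workers run as, so a pass here does not rule out a permission problem inside the pipeline.

```shell
#!/bin/bash
# Hypothetical helper: report which gs:// objects the current credentials
# cannot stat. Returns 1 if any object is missing or unreadable.
check_inputs() {
  local failed=0
  local url
  for url in "$@"; do
    # `gsutil -q stat` exits non-zero when the object cannot be found or read.
    if gsutil -q stat "${url}" >/dev/null 2>&1; then
      printf '%-10s%s\n' "OK" "${url}"
    else
      printf '%-10s%s\n' "NO ACCESS" "${url}"
      failed=1
    fi
  done
  return "${failed}"
}
```

With the inputs from the script in the question, the call would be `check_inputs gs://canis/CNR-data/CDS-canonical.bed gs://canis/CNR-data/TLE_a_001.bam gs://canis/CNR-data/TLE_a_001.bam.bai gs://genomics-public-data/references/GRCh38_Verily/GRCh38_Verily_v1.genome.fa`.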
