
Network parameter not being read

See original GitHub issue

I’m using this script:

#!/bin/bash
# Parameters to replace:
# The GOOGLE_CLOUD_PROJECT is the project that contains your BigQuery dataset.
GOOGLE_CLOUD_PROJECT=psjh-eacri-data
INPUT_PATTERN=https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz
# INPUT_PATTERN=gs://gcp-public-data--gnomad/release/2.1.1/vcf/exomes/*.vcf.bgz
OUTPUT_TABLE=eacri-genomics:gnomad.gnomad_hg19_2_1_1
TEMP_LOCATION=gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp

COMMAND="vcf_to_bq \
    --input_pattern ${INPUT_PATTERN} \
    --output_table ${OUTPUT_TABLE} \
    --temp_location ${TEMP_LOCATION} \
    --job_name vcf-to-bigquery \
    --runner DataflowRunner \
    --zones us-east1-b \
    --network projects/phs-205720/global/networks/psjh-shared01 \
    --subnet projects/phs-205720/regions/us-east1/subnetworks/subnet01"
    
docker run -v ~/.config:/root/.config \
    gcr.io/cloud-lifesciences/gcp-variant-transforms \
    --project "${GOOGLE_CLOUD_PROJECT}" \
    --temp_location ${TEMP_LOCATION} \
    "${COMMAND}"

Yet the error says that the network was not specified, and the network field is empty in the JSON output.

What change do I need to make to my script? Or is some other format needed to specify the network?

The script template doesn’t include a network or subnet parameter at all.
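One way to confirm what the wrapper actually parses is to dump its runner script straight out of the image; a minimal diagnostic sketch, where the script path inside the container is an assumption for illustration, not a documented location:

# Hypothetical check: print the wrapper's option parser from the image.
# The path to pipelines_runner.sh inside the container is assumed here.
docker run --rm --entrypoint /bin/sh \
    gcr.io/cloud-lifesciences/gcp-variant-transforms \
    -c 'grep -n "getopt" /opt/gcp_variant_transforms/docker/pipelines_runner.sh'

Below is the full output from two runs of the script, the first with the gs:// input pattern and the second with the https:// one.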

(base) jupyter@balter-genomics:~$ ./script.sh
 --project 'psjh-eacri-data' --temp_location 'gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp' -- 'vcf_to_bq     --input_pattern gs://gcp-public-data--gnomad/release/2.1.1/vcf/exomes/*.vcf.bgz     --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1     --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp     --job_name vcf-to-bigquery     --runner DataflowRunner     --zones us-east1-b     --subnet subnet03'
Your active configuration is: [variant]
{
  "pipeline": {
    "actions": [
      {
        "commands": [
          "-c",
          "mkdir -p /mnt/google/.google/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "commands": [
          "-c",
          "/opt/gcp_variant_transforms/bin/vcf_to_bq --input_pattern gs://gcp-public-data--gnomad/release/2.1.1/vcf/exomes/*.vcf.bgz --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp --job_name vcf-to-bigquery --runner DataflowRunner --zones us-east1-b --subnet subnet03 --project psjh-eacri-data --region us-east1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-lifesciences/gcp-variant-transforms",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "alwaysRun": true,
        "commands": [
          "-c",
          "gsutil -q cp /google/logs/output gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210510_230717.log"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      }
    ],
    "environment": {
      "TMPDIR": "/mnt/google/.google/tmp"
    },
    "resources": {
      "regions": [
        "us-east1"
      ],
      "virtualMachine": {
        "disks": [
          {
            "name": "google",
            "sizeGb": 10
          }
        ],
        "machineType": "g1-small",
        "network": {},
        "serviceAccount": {
          "scopes": [
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/devstorage.read_write"
          ]
        }
      }
    }
  }
}
Pipeline running as "projects/447346450878/locations/us-central1/operations/13027962545459232820" (attempt: 1, preemptible: false)
Output will be written to "gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210510_230717.log"
23:07:26 Worker "google-pipelines-worker-ab367d994b1cd7881ebf66950fec6c17" assigned in "us-east1-b" on a "g1-small" machine
23:07:26 Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found.
23:07:27 Worker released
"run": operation "projects/447346450878/locations/us-central1/operations/13027962545459232820" failed: executing pipeline: Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found. (reason: INVALID_ARGUMENT)
(base) jupyter@balter-genomics:~$ ./script.sh
 --project 'psjh-eacri-data' --temp_location 'gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp' -- 'vcf_to_bq     --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz     --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1     --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp     --job_name vcf-to-bigquery     --runner DataflowRunner     --zones us-east1-b     --subnet subnet03'
Your active configuration is: [variant]
{
  "pipeline": {
    "actions": [
      {
        "commands": [
          "-c",
          "mkdir -p /mnt/google/.google/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "commands": [
          "-c",
          "/opt/gcp_variant_transforms/bin/vcf_to_bq --input_pattern https://storage.googleapis.com/gcp-public-data--gnomad/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.*.vcf.bgz --output_table eacri-genomics:gnomad.gnomad_hg19_2_1_1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp --job_name vcf-to-bigquery --runner DataflowRunner --zones us-east1-b --subnet subnet03 --project psjh-eacri-data --region us-east1 --temp_location gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-lifesciences/gcp-variant-transforms",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      },
      {
        "alwaysRun": true,
        "commands": [
          "-c",
          "gsutil -q cp /google/logs/output gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210511_000846.log"
        ],
        "entrypoint": "bash",
        "imageUri": "gcr.io/cloud-genomics-pipelines/io",
        "mounts": [
          {
            "disk": "google",
            "path": "/mnt/google"
          }
        ]
      }
    ],
    "environment": {
      "TMPDIR": "/mnt/google/.google/tmp"
    },
    "resources": {
      "regions": [
        "us-east1"
      ],
      "virtualMachine": {
        "disks": [
          {
            "name": "google",
            "sizeGb": 10
          }
        ],
        "machineType": "g1-small",
        "network": {},
        "serviceAccount": {
          "scopes": [
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/devstorage.read_write"
          ]
        }
      }
    }
  }
}
Pipeline running as "projects/447346450878/locations/us-central1/operations/3293803574088782620" (attempt: 1, preemptible: false)
Output will be written to "gs://psjh-eacri/balter/gnomad_tmp/vcf/exomes/*.vcf.bgz/tmp/runner_logs_20210511_000846.log"
00:08:56 Worker "google-pipelines-worker-e05c2864661a5ba9f1b29012de1ac56d" assigned in "us-east1-d" on a "g1-small" machine
00:08:56 Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found.
00:08:57 Worker released
"run": operation "projects/447346450878/locations/us-central1/operations/3293803574088782620" failed: executing pipeline: Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found. (reason: INVALID_ARGUMENT)

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 27 (10 by maintainers)

Top GitHub Comments

1 reaction
pgrosu commented, Oct 17, 2021

@abalter I think @slagelwa is pointing out that the command parser in the code is missing the network parameter. If you look at the line below, it does not parse a network option, which is probably why the field is always empty:

https://github.com/googlegenomics/gcp-variant-transforms/blob/master/docker/pipelines_runner.sh#L25

getopt -o '' -l project:,temp_location:,docker_image:,region:,subnetwork:,use_public_ips:,service_account:,location: -- "$@"
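For illustration, a minimal sketch of what recognizing the missing option might look like; the variable names and the surrounding argument loop are assumptions, not the actual upstream code:

# Hypothetical patch sketch, not the real upstream fix: add "network:" to the
# long-option list so getopt stops discarding --network, then capture its value.
parsed="$(getopt -o '' \
    -l project:,temp_location:,docker_image:,region:,subnetwork:,network:,use_public_ips:,service_account:,location: \
    -- "$@")" || exit 1
eval set -- "${parsed}"
network=""
while [[ $# -gt 0 ]]; do
  case "$1" in
    --network) network="$2"; shift 2 ;;   # newly recognized option
    --) shift; break ;;
    *) shift ;;                           # other options handled as before
  esac
done
# ...then forward "--network ${network}" when building the pipelines request.

With a change along these lines, the "network": {} block in the JSON output above would presumably be populated instead of left empty.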

0 reactions
pgrosu commented, Oct 21, 2021

@moschetti Not all buckets are created equal 😉 Regional buckets provide large cost savings over multi-region ones, which is why one would prefer that the code co-locate with the data. For example, here's the monthly Cloud Storage cost for 100 TB, calculated with the Google Cloud Pricing Calculator, for a regional location (Iowa) versus a multi-region location (the whole US). Multi-region buckets would cost an additional $614/month, which can be quite a lot for folks who might need that budget for other Cloud resources during their analysis:

For the Iowa (regional) location ($2,048/month)

  • 1x Standard Storage
  • Location: Iowa
  • Total Amount of Storage: 102,400 GiB
  • Egress - Data moves within the same location: 0 GiB
  • Always Free usage included: No
  • Total: USD 2,048.00

For the US (multi-region) location ($2,662/month)

  • 1x Standard Storage
  • Location: United States
  • Total Amount of Storage: 102,400 GiB
  • Egress - Data moves within the same location: 0 GiB
  • Always Free usage included: No
  • Total: USD 2,662.40

Additional Egress Costs ($1,024/month)

On top of that, there could be egress charges for data moves, which can add an extra $1,024 to the total ($3,072 - $2,048), making an even stronger case for moving the code to the data, since moving code costs nothing:

  • 1x Standard Storage
  • Location: Iowa
  • Total Amount of Storage: 102,400 GiB
  • Egress - Data moves between different locations on the same continent: 102,400 GiB
  • Always Free usage included: No
  • Total: USD 3,072.00
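As a sanity check, the per-GiB rates implied by those totals can be recomputed directly; a minimal sketch, where the rates ($0.020/GiB regional, $0.026/GiB multi-region, $0.010/GiB same-continent egress) are inferred from the figures above rather than quoted from the calculator:

# Recompute the calculator totals from the implied per-GiB monthly rates.
awk 'BEGIN {
  gib = 102400                                                    # 100 TB as entered
  printf "regional (Iowa):       USD %.2f/month\n", gib * 0.020   # 2048.00
  printf "multi-region (US):     USD %.2f/month\n", gib * 0.026   # 2662.40
  printf "multi-region premium:  USD %.2f/month\n", gib * 0.006   # 614.40
  printf "same-continent egress: USD %.2f/month\n", gib * 0.010   # 1024.00
}'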

Hope it helps, Paul

Read more comments on GitHub.
