
JSON ValueError when importing gnomAD variants


I am trying to import the gnomAD variants into BigQuery, but I consistently get an error that makes the pipeline fail on chr1. I was able to successfully import some other chromosomes (chr21 and chr10).

The error I’m getting is ValueError: Out of range float values are not JSON compliant. NAN, INF and -INF values are not JSON compliant. [while running 'VariantToBigQuery/ConvertToBigQueryTableRow'].

I tried the --allow_malformed_records True flag, but that made no difference.
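This failure is reproducible outside the pipeline: the BigQuery sink encodes each row as strict (RFC-compliant) JSON, which rejects NaN and infinity. A minimal standard-library demonstration — the field name AF is made up here for illustration:

import json

# Strict JSON encoding rejects non-finite floats, which is what the
# BigQuery sink's coder enforces per row.
row = {'AF': float('nan')}  # e.g. an INFO value parsed from the VCF
json.dumps(row, allow_nan=False)
# ValueError: Out of range float values are not JSON compliant

So any record carrying a NaN or Inf value will fail when its row is encoded for BigQuery, regardless of how the rest of the row looks.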

The input file is from https://storage.googleapis.com/gnomad-public/release/2.0.2/vcf/genomes/gnomad.genomes.r2.0.2.sites.chr1.vcf.bgz and I decompressed it before running the pipeline.

My pipeline configuration is:

name: gnomad-genomes-to-bigquery-pipeline
docker:
  imageName: gcr.io/gcp-variant-transforms/gcp-variant-transforms
  cmd: |
    ./opt/gcp_variant_transforms/bin/vcf_to_bq \
      --project XXX \
      --input_pattern gs://XXX/gnomad/vcf_decompressed/genomes/gnomad.genomes.r2.0.2.sites.chr1.vcf \
      --allow_malformed_records True \
      --output_table dg-platform:GnomAD.gnomad_genomes_chr1 \
      --staging_location gs://XXX/staging \
      --temp_location gs://XXX/temp \
      --job_name gnomad-genomes-to-bigquery-pipeline-chr1 \
      --runner DataflowRunner
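
A possible stopgap, while the pipeline itself lacks sanitization, is to rewrite non-finite INFO values to the VCF missing marker '.' before staging the file. This is only a sketch, not part of gcp-variant-transforms; the regex and the assumption that only the INFO column carries these values should be checked against the data:

#!/usr/bin/env python
# sanitize_vcf.py -- hypothetical pre-filter: replace inf/nan INFO
# values with '.' so strict JSON encoding never sees them.
import re
import sys

# Matches a non-finite value that follows '=' or ',' inside INFO.
NON_FINITE = re.compile(r'(?<=[=,])(-?inf(inity)?|nan)(?=[;,]|$)',
                        re.IGNORECASE)

for line in sys.stdin:
    if line.startswith('#'):
        sys.stdout.write(line)  # header lines pass through untouched
        continue
    fields = line.rstrip('\n').split('\t')
    if len(fields) > 7:
        fields[7] = NON_FINITE.sub('.', fields[7])  # INFO is column 8
    sys.stdout.write('\t'.join(fields) + '\n')

It streams line by line, so it can be run as python sanitize_vcf.py < chr1.vcf > chr1.clean.vcf even on a file this large before copying it to GCS.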

The full log is:

2018/01/10 04:34:27 I: Switching to status: pulling-image
2018/01/10 04:34:27 I: Calling SetOperationStatus(pulling-image)
2018/01/10 04:34:27 I: SetOperationStatus(pulling-image) succeeded
2018/01/10 04:34:27 I: Pulling image "gcr.io/gcp-variant-transforms/gcp-variant-transforms"
2018/01/10 04:35:20 I: Pulled image "gcr.io/gcp-variant-transforms/gcp-variant-transforms" successfully.
2018/01/10 04:35:20 I: Done copying files.
2018/01/10 04:35:20 I: Switching to status: running-docker
2018/01/10 04:35:20 I: Calling SetOperationStatus(running-docker)
2018/01/10 04:35:20 I: SetOperationStatus(running-docker) succeeded
2018/01/10 04:35:20 I: Setting these data volumes on the docker container: [-v /tmp/ggp-305484772:/tmp/ggp-305484772]
2018/01/10 04:35:20 I: Running command: docker run -v /tmp/ggp-305484772:/tmp/ggp-305484772 gcr.io/gcp-variant-transforms/gcp-variant-transforms /tmp/ggp-305484772
2018/01/10 05:11:04 E: command failed: No handlers could be found for logger "oauth2client.contrib.multistore_file"
/opt/gcp_variant_transforms/venv/local/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py:122: DeprecationWarning: object() takes no parameters
  super(GcsIO, cls).__new__(cls, storage_client))
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0635919570923 seconds
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0466470718384 seconds
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0660800933838 seconds
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0607531070709 seconds
/opt/gcp_variant_transforms/venv/local/lib/python2.7/site-packages/apache_beam/coders/typecoders.py:134: UserWarning: Using fallback coder for typehint: Any.
  warnings.warn('Using fallback coder for typehint: %r.' % typehint)
INFO:root:Executing command: ['/opt/gcp_variant_transforms/venv/bin/python', 'setup.py', 'sdist', '--dist-dir', '/tmp/tmpFiWctv']
warning: check: missing required meta-data: url

warning: check: missing meta-data: if 'author' supplied, 'author_email' must be supplied too

INFO:root:Starting GCS upload to gs://XXX/staging/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/workflow.tar.gz...
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Completed GCS upload to gs://XXX/staging/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/workflow.tar.gz
INFO:root:Staging the SDK tarball from PyPI to gs://XXX/staging/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/dataflow_python_sdk.tar
INFO:root:Executing command: ['/opt/gcp_variant_transforms/venv/bin/python', '-m', 'pip', 'install', '--download', '/tmp/tmpFiWctv', 'apache-beam==2.2.0', '--no-binary', ':all:', '--no-deps']
DEPRECATION: pip install --download has been deprecated and will be removed in the future. Pip now has a download command that should be used instead.
INFO:root:file copy from /tmp/tmpFiWctv/apache-beam-2.2.0.zip to gs://XXX/staging/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/dataflow_python_sdk.tar.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Create job: <Job
 createTime: u'2018-01-10T04:35:37.539665Z'
 currentStateTime: u'1970-01-01T00:00:00Z'
 id: u'2018-01-09_20_35_36-6246558984112825034'
 location: u'us-central1'
 name: u'gnomad-genomes-to-bigquery-pipeline-chr1'
 projectId: u'XXX'
 stageStates: []
 steps: []
 tempFiles: []
 type: TypeValueValuesEnum(JOB_TYPE_BATCH, 1)>
INFO:root:Created job with id: [2018-01-09_20_35_36-6246558984112825034]
INFO:root:To access the Dataflow monitoring console, please navigate to https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-01-09_20_35_36-6246558984112825034?project=XXX
INFO:root:Job 2018-01-09_20_35_36-6246558984112825034 is in state JOB_STATE_PENDING
INFO:root:2018-01-10T04:35:36.979Z: JOB_MESSAGE_DETAILED: (56b03c4ce491e46b): Autoscaling is enabled for job 2018-01-09_20_35_36-6246558984112825034. The number of workers will be between 1 and 15.
INFO:root:2018-01-10T04:35:37.009Z: JOB_MESSAGE_DETAILED: (56b03c4ce491ee58): Autoscaling was automatically enabled for job 2018-01-09_20_35_36-6246558984112825034.
INFO:root:2018-01-10T04:35:39.256Z: JOB_MESSAGE_DETAILED: (52771809aae76d0e): Checking required Cloud APIs are enabled.
INFO:root:2018-01-10T04:35:40.098Z: JOB_MESSAGE_DETAILED: (52771809aae766b0): Expanding CoGroupByKey operations into optimizable parts.
INFO:root:2018-01-10T04:35:40.122Z: JOB_MESSAGE_DETAILED: (52771809aae769e5): Expanding GroupByKey operations into optimizable parts.
INFO:root:2018-01-10T04:35:40.148Z: JOB_MESSAGE_DETAILED: (52771809aae766b3): Lifting ValueCombiningMappingFns into MergeBucketsMappingFns
INFO:root:2018-01-10T04:35:40.171Z: JOB_MESSAGE_DEBUG: (52771809aae76381): Annotating graph with Autotuner information.
INFO:root:2018-01-10T04:35:40.199Z: JOB_MESSAGE_DETAILED: (52771809aae76d1d): Fusing adjacent ParDo, Read, Write, and Flatten operations
INFO:root:2018-01-10T04:35:40.226Z: JOB_MESSAGE_DETAILED: (52771809aae769eb): Fusing consumer FilterVariants/ApplyFilters into ReadFromVcf/Read
INFO:root:2018-01-10T04:35:40.248Z: JOB_MESSAGE_DETAILED: (52771809aae766b9): Fusing consumer VariantToBigQuery/ConvertToBigQueryTableRow into FilterVariants/ApplyFilters
INFO:root:2018-01-10T04:35:40.281Z: JOB_MESSAGE_DETAILED: (52771809aae76387): Fusing consumer VariantToBigQuery/WriteToBigQuery/NativeWrite into VariantToBigQuery/ConvertToBigQueryTableRow
INFO:root:2018-01-10T04:35:40.304Z: JOB_MESSAGE_DEBUG: (52771809aae76055): Workflow config is missing a default resource spec.
INFO:root:2018-01-10T04:35:40.331Z: JOB_MESSAGE_DEBUG: (52771809aae76d23): Adding StepResource setup and teardown to workflow graph.
INFO:root:2018-01-10T04:35:40.363Z: JOB_MESSAGE_DEBUG: (52771809aae769f1): Adding workflow start and stop steps.
INFO:root:2018-01-10T04:35:40.390Z: JOB_MESSAGE_DEBUG: (52771809aae766bf): Assigning stage ids.
INFO:root:2018-01-10T04:35:40.511Z: JOB_MESSAGE_DEBUG: (280200dbfaf6cac0): Executing wait step start3
INFO:root:2018-01-10T04:35:40.565Z: JOB_MESSAGE_BASIC: (12e360a044595f8): Executing operation ReadFromVcf/Read+FilterVariants/ApplyFilters+VariantToBigQuery/ConvertToBigQueryTableRow+VariantToBigQuery/WriteToBigQuery/NativeWrite
INFO:root:2018-01-10T04:35:40.599Z: JOB_MESSAGE_DEBUG: (35a4b300ff329644): Starting worker pool setup.
INFO:root:2018-01-10T04:35:40.628Z: JOB_MESSAGE_BASIC: (35a4b300ff3297da): Starting 10 workers in us-central1-f...
INFO:root:Job 2018-01-09_20_35_36-6246558984112825034 is in state JOB_STATE_RUNNING
INFO:root:2018-01-10T04:35:47.357Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54fdd0): Autoscaling: Raised the number of workers to 0 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:35:57.734Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54ffab): Autoscaling: Raised the number of workers to 3 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:35:57.764Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54fe01): Resized worker pool to 3, though goal was 10.  This could be a quota issue.
INFO:root:2018-01-10T04:36:02.987Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f759): Autoscaling: Raised the number of workers to 9 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:36:03.018Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f5af): Resized worker pool to 9, though goal was 10.  This could be a quota issue.
INFO:root:2018-01-10T04:36:14.570Z: JOB_MESSAGE_DETAILED: (2c7b5f62246857c2): Workers have started successfully.
INFO:root:2018-01-10T04:36:18.590Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f436): Autoscaling: Raised the number of workers to 10 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:39:14.929Z: JOB_MESSAGE_BASIC: (280200dbfaf6cb22): Autoscaling: Resizing worker pool from 10 to 15.
INFO:root:2018-01-10T04:39:20.293Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54fed4): Autoscaling: Raised the number of workers to 14 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:39:20.320Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54fd2a): Resized worker pool to 14, though goal was 15.  This could be a quota issue.
INFO:root:2018-01-10T04:39:25.575Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f5ad): Autoscaling: Raised the number of workers to 15 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:57:04.250Z: JOB_MESSAGE_ERROR: (8f917a73bb67a9d9): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
    op.start()
  File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
    def start(self):
  File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
    with self.scoped_start_state:
  File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
    with self.spec.source.reader() as reader:
  File "dataflow_worker/native_operations.py", line 54, in dataflow_worker.native_operations.NativeReadOperation.start
    self.output(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 154, in apache_beam.runners.worker.operations.Operation.output
    cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 86, in apache_beam.runners.worker.operations.ConsumerSet.receive
    cython.cast(Operation, consumer).process(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 339, in apache_beam.runners.worker.operations.DoOperation.process
    with self.scoped_process_state:
  File "apache_beam/runners/worker/operations.py", line 340, in apache_beam.runners.worker.operations.DoOperation.process
    self.dofn_receiver.receive(o)
  File "apache_beam/runners/common.py", line 382, in apache_beam.runners.common.DoFnRunner.receive
    self.process(windowed_value)
  File "apache_beam/runners/common.py", line 390, in apache_beam.runners.common.DoFnRunner.process
    self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 415, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    raise
  File "apache_beam/runners/common.py", line 388, in apache_beam.runners.common.DoFnRunner.process
    self.do_fn_invoker.invoke_process(windowed_value)
  File "apache_beam/runners/common.py", line 189, in apache_beam.runners.common.SimpleInvoker.invoke_process
    self.output_processor.process_outputs(
  File "apache_beam/runners/common.py", line 480, in apache_beam.runners.common._OutputProcessor.process_outputs
    self.main_receivers.receive(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 86, in apache_beam.runners.worker.operations.ConsumerSet.receive
    cython.cast(Operation, consumer).process(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 339, in apache_beam.runners.worker.operations.DoOperation.process
    with self.scoped_process_state:
  File "apache_beam/runners/worker/operations.py", line 340, in apache_beam.runners.worker.operations.DoOperation.process
    self.dofn_receiver.receive(o)
  File "apache_beam/runners/common.py", line 382, in apache_beam.runners.common.DoFnRunner.receive
    self.process(windowed_value)
  File "apache_beam/runners/common.py", line 390, in apache_beam.runners.common.DoFnRunner.process
    self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 431, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    raise new_exn, None, original_traceback
  File "apache_beam/runners/common.py", line 388, in apache_beam.runners.common.DoFnRunner.process
    self.do_fn_invoker.invoke_process(windowed_value)
  File "apache_beam/runners/common.py", line 189, in apache_beam.runners.common.SimpleInvoker.invoke_process
    self.output_processor.process_outputs(
  File "apache_beam/runners/common.py", line 480, in apache_beam.runners.common._OutputProcessor.process_outputs
    self.main_receivers.receive(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 86, in apache_beam.runners.worker.operations.ConsumerSet.receive
    cython.cast(Operation, consumer).process(windowed_value)
  File "dataflow_worker/native_operations.py", line 98, in dataflow_worker.native_operations.NativeWriteOperation.process
    with self.scoped_process_state:
  File "dataflow_worker/native_operations.py", line 104, in dataflow_worker.native_operations.NativeWriteOperation.process
    self.writer.Write(o.value)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativefileio.py", line 577, in Write
    super(TextFileWriter, self).Write(value)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativefileio.py", line 462, in Write
    self.file.write(self.sink.coder.encode(value))
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 162, in encode
    raise ValueError('%s. %s' % (e, JSON_COMPLIANCE_ERROR))
ValueError: Out of range float values are not JSON compliant. NAN, INF and -INF values are not JSON compliant. [while running 'VariantToBigQuery/ConvertToBigQueryTableRow']

INFO:root:[The identical ValueError traceback was logged for each retry of the work item, at 04:59:20Z, 05:01:35Z, 05:08:18Z, and 05:09:41Z; the repeats are elided here.]

INFO:root:2018-01-10T05:09:41.410Z: JOB_MESSAGE_BASIC: (42fdb1ab4fd8cff2): Executing BigQuery import job "dataflow_job_8333640118018302870". You can check its status with the bq tool: "bq show -j --project_id=XXX dataflow_job_8333640118018302870".
INFO:root:2018-01-10T05:09:41.435Z: JOB_MESSAGE_WARNING: (42fdb1ab4fd8cb68): Unable to delete temp files: "gs://XXX/temp/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/8333640118018303863/dax-tmp-2018-01-09_20_35_36-6246558984112825034-S01-0-207f78e951a6f2a/@DAX.json."
INFO:root:2018-01-10T05:09:41.509Z: JOB_MESSAGE_DEBUG: (12e360a04459cb7): Executing failure step failure2
INFO:root:2018-01-10T05:09:41.533Z: JOB_MESSAGE_ERROR: (12e360a04459df1): Workflow failed. Causes: (12e360a04459a43): S01:ReadFromVcf/Read+FilterVariants/ApplyFilters+VariantToBigQuery/ConvertToBigQueryTableRow+VariantToBigQuery/WriteToBigQuery/NativeWrite failed., (1f26fcd604aa07c5): A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on: 
  gnomad-genomes-to-bigquer-01092035-ea7e-harness-dw84,
  gnomad-genomes-to-bigquer-01092035-ea7e-harness-dw84,
  gnomad-genomes-to-bigquer-01092035-ea7e-harness-dw84,
  gnomad-genomes-to-bigquer-01092035-ea7e-harness-q4nv
INFO:root:2018-01-10T05:09:41.670Z: JOB_MESSAGE_DETAILED: (52771809aae7605e): Cleaning up.
INFO:root:2018-01-10T05:09:41.774Z: JOB_MESSAGE_DEBUG: (52771809aae769fa): Starting worker pool teardown.
INFO:root:2018-01-10T05:09:41.797Z: JOB_MESSAGE_BASIC: (52771809aae766c8): Stopping worker pool...
INFO:root:2018-01-10T05:10:55.180Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f475): Autoscaling: Resized worker pool from 15 to 0.
INFO:root:2018-01-10T05:10:55.208Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f2cb): Autoscaling: Would further reduce the number of workers but reached the minimum number allowed for the job.
INFO:root:2018-01-10T05:10:55.263Z: JOB_MESSAGE_DEBUG: (52771809aae766ce): Tearing down pending resources...
INFO:root:Job 2018-01-09_20_35_36-6246558984112825034 is in state JOB_STATE_FAILED
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 223, in <module>
    run()
  File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 218, in run
    append=known_args.append))
  File "/opt/gcp_variant_transforms/venv/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 346, in __exit__
    self.run().wait_until_finish()
  File "/opt/gcp_variant_transforms/venv/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 966, in wait_until_finish
    (self.state, getattr(self._runner, 'last_error_msg', None)), self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
(b0f417b262714adf): [same ValueError traceback as above, elided]
ValueError: Out of range float values are not JSON compliant. NAN, INF and -INF values are not JSON compliant. [while running 'VariantToBigQuery/ConvertToBigQueryTableRow']

 (exit status 1)

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

2 reactions
arostamianfar commented, Jan 12, 2018

Sorry about the misunderstanding! I didn’t realize you were still blocked on this. I just pushed the fix and tested with the chr1 file. The docker image is also updated. Please let me know if you encounter any other issues.
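
For reference, the patch itself isn't quoted in this thread, but the usual remedy for this class of failure is to scrub non-finite floats from each row before it reaches the JSON coder. A hypothetical sketch of that shape — the name and placement are assumptions, not the project's actual change:

import math

def _sanitize_value(value):
    # Recursively replace NaN/Inf floats with None so the resulting
    # row is JSON-compliant. Hypothetical helper, not the actual
    # gcp-variant-transforms patch.
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return None
    if isinstance(value, dict):
        return dict((k, _sanitize_value(v)) for k, v in value.items())
    if isinstance(value, list):
        return [_sanitize_value(v) for v in value]
    return value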

0 reactions
hannes-brt commented, Jan 12, 2018

A+! Thanks for the quick fix!

Read more comments on GitHub.

Top Results From Across the Web

Extra Data error when importing json file using python
I'm trying to build a python script that imports json files into a MongoDB. This part of my script keeps jumping to the...
Read more >
Source code for gnomad.utils.vep
Source code for gnomad.utils.vep. # noqa: D100 import json import logging import os import subprocess from typing import List, Optional, Union import hail ......
Read more >
gnomAD Help
What genome build is the gnomAD data based on? · What version of GENCODE was used to annotate variants? · Are all the...
Read more >
Source code for hail.methods.qc
import hail as hl from collections import Counter import os from ... 'file:/vep_data/vep-azure.json' else: raise ValueError("No config set ...
Read more >
open-cravat
OpenCRAVAT is a python package that performs genomic variant ... json files and then imported to another job's result with buttons on the....
Read more >
