JSON ValueError when importing gnomAD variants
I am trying to import the gnomAD variants into BigQuery, but I consistently get an error that makes the pipeline fail on chr1. Some other chromosomes (chr21 and chr10) imported successfully.
The error I’m getting is:

```
ValueError: Out of range float values are not JSON compliant. NAN, INF and -INF values are not JSON compliant. [while running 'VariantToBigQuery/ConvertToBigQueryTableRow']
```

I tried the `--allow_malformed_records True` flag, but it made no difference.
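The traceback below shows the failure coming from the row coder in `apache_beam/io/gcp/bigquery.py`, which encodes each table row with `json.dumps(..., allow_nan=False)`, so a single `nan`/`inf` float anywhere in a row raises exactly this error. A minimal reproduction:

```python
import json

# Beam's BigQuery sink encodes rows with allow_nan=False, so one
# non-finite float in any field triggers the pipeline's ValueError.
row = {'reference_name': 'chr1', 'AF': float('nan')}
json.dumps(row, allow_nan=False)
# ValueError: Out of range float values are not JSON compliant
```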
The input file is from https://storage.googleapis.com/gnomad-public/release/2.0.2/vcf/genomes/gnomad.genomes.r2.0.2.sites.chr1.vcf.bgz and I decompressed it before running the pipeline.
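One way to confirm the input is the culprit is to scan the decompressed VCF for non-finite INFO values before launching the pipeline. A rough sketch (the `nan`/`inf` spellings it matches are an assumption about how the gnomAD file encodes them):

```python
import re

# Pre-flight check: report INFO fields whose values the BigQuery row
# coder would later reject as non-JSON-compliant floats.
NON_FINITE = re.compile(r'(?i)(\w+)=([-+]?(?:nan|inf(?:inity)?))\b')

with open('gnomad.genomes.r2.0.2.sites.chr1.vcf') as vcf:
    for line_no, line in enumerate(vcf, 1):
        if line.startswith('#'):
            continue
        info = line.rstrip('\n').split('\t')[7]  # INFO is the 8th column
        for field, value in NON_FINITE.findall(info):
            print('line %d: %s=%s' % (line_no, field, value))
```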
My pipeline configuration is:

```yaml
name: gnomad-genomes-to-bigquery-pipeline
docker:
  imageName: gcr.io/gcp-variant-transforms/gcp-variant-transforms
  cmd: |
    ./opt/gcp_variant_transforms/bin/vcf_to_bq \
      --project XXX \
      --input_pattern gs://XXX/gnomad/vcf_decompressed/genomes/gnomad.genomes.r2.0.2.sites.chr1.vcf \
      --allow_malformed_records True \
      --output_table dg-platform:GnomAD.gnomad_genomes_chr1 \
      --staging_location gs://XXX/staging \
      --temp_location gs://XXX/temp \
      --job_name gnomad-genomes-to-bigquery-pipeline-chr1 \
      --runner DataflowRunner
```
The full log is:

```
2018/01/10 04:34:27 I: Switching to status: pulling-image
2018/01/10 04:34:27 I: Calling SetOperationStatus(pulling-image)
2018/01/10 04:34:27 I: SetOperationStatus(pulling-image) succeeded
2018/01/10 04:34:27 I: Pulling image "gcr.io/gcp-variant-transforms/gcp-variant-transforms"
2018/01/10 04:35:20 I: Pulled image "gcr.io/gcp-variant-transforms/gcp-variant-transforms" successfully.
2018/01/10 04:35:20 I: Done copying files.
2018/01/10 04:35:20 I: Switching to status: running-docker
2018/01/10 04:35:20 I: Calling SetOperationStatus(running-docker)
2018/01/10 04:35:20 I: SetOperationStatus(running-docker) succeeded
2018/01/10 04:35:20 I: Setting these data volumes on the docker container: [-v /tmp/ggp-305484772:/tmp/ggp-305484772]
2018/01/10 04:35:20 I: Running command: docker run -v /tmp/ggp-305484772:/tmp/ggp-305484772 gcr.io/gcp-variant-transforms/gcp-variant-transforms /tmp/ggp-305484772
2018/01/10 05:11:04 E: command failed: No handlers could be found for logger "oauth2client.contrib.multistore_file"
/opt/gcp_variant_transforms/venv/local/lib/python2.7/site-packages/apache_beam/io/gcp/gcsio.py:122: DeprecationWarning: object() takes no parameters
super(GcsIO, cls).__new__(cls, storage_client))
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0635919570923 seconds
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0466470718384 seconds
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0660800933838 seconds
INFO:root:Starting the size estimation of the input
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.0607531070709 seconds
/opt/gcp_variant_transforms/venv/local/lib/python2.7/site-packages/apache_beam/coders/typecoders.py:134: UserWarning: Using fallback coder for typehint: Any.
warnings.warn('Using fallback coder for typehint: %r.' % typehint)
INFO:root:Executing command: ['/opt/gcp_variant_transforms/venv/bin/python', 'setup.py', 'sdist', '--dist-dir', '/tmp/tmpFiWctv']
warning: check: missing required meta-data: url
warning: check: missing meta-data: if 'author' supplied, 'author_email' must be supplied too
INFO:root:Starting GCS upload to gs://XXX/staging/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/workflow.tar.gz...
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Completed GCS upload to gs://XXX/staging/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/workflow.tar.gz
INFO:root:Staging the SDK tarball from PyPI to gs://XXX/staging/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/dataflow_python_sdk.tar
INFO:root:Executing command: ['/opt/gcp_variant_transforms/venv/bin/python', '-m', 'pip', 'install', '--download', '/tmp/tmpFiWctv', 'apache-beam==2.2.0', '--no-binary', ':all:', '--no-deps']
DEPRECATION: pip install --download has been deprecated and will be removed in the future. Pip now has a download command that should be used instead.
INFO:root:file copy from /tmp/tmpFiWctv/apache-beam-2.2.0.zip to gs://XXX/staging/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/dataflow_python_sdk.tar.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:root:Create job: <Job
createTime: u'2018-01-10T04:35:37.539665Z'
currentStateTime: u'1970-01-01T00:00:00Z'
id: u'2018-01-09_20_35_36-6246558984112825034'
location: u'us-central1'
name: u'gnomad-genomes-to-bigquery-pipeline-chr1'
projectId: u'XXX'
stageStates: []
steps: []
tempFiles: []
type: TypeValueValuesEnum(JOB_TYPE_BATCH, 1)>
INFO:root:Created job with id: [2018-01-09_20_35_36-6246558984112825034]
INFO:root:To access the Dataflow monitoring console, please navigate to https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-01-09_20_35_36-6246558984112825034?project=XXX
INFO:root:Job 2018-01-09_20_35_36-6246558984112825034 is in state JOB_STATE_PENDING
INFO:root:2018-01-10T04:35:36.979Z: JOB_MESSAGE_DETAILED: (56b03c4ce491e46b): Autoscaling is enabled for job 2018-01-09_20_35_36-6246558984112825034. The number of workers will be between 1 and 15.
INFO:root:2018-01-10T04:35:37.009Z: JOB_MESSAGE_DETAILED: (56b03c4ce491ee58): Autoscaling was automatically enabled for job 2018-01-09_20_35_36-6246558984112825034.
INFO:root:2018-01-10T04:35:39.256Z: JOB_MESSAGE_DETAILED: (52771809aae76d0e): Checking required Cloud APIs are enabled.
INFO:root:2018-01-10T04:35:40.098Z: JOB_MESSAGE_DETAILED: (52771809aae766b0): Expanding CoGroupByKey operations into optimizable parts.
INFO:root:2018-01-10T04:35:40.122Z: JOB_MESSAGE_DETAILED: (52771809aae769e5): Expanding GroupByKey operations into optimizable parts.
INFO:root:2018-01-10T04:35:40.148Z: JOB_MESSAGE_DETAILED: (52771809aae766b3): Lifting ValueCombiningMappingFns into MergeBucketsMappingFns
INFO:root:2018-01-10T04:35:40.171Z: JOB_MESSAGE_DEBUG: (52771809aae76381): Annotating graph with Autotuner information.
INFO:root:2018-01-10T04:35:40.199Z: JOB_MESSAGE_DETAILED: (52771809aae76d1d): Fusing adjacent ParDo, Read, Write, and Flatten operations
INFO:root:2018-01-10T04:35:40.226Z: JOB_MESSAGE_DETAILED: (52771809aae769eb): Fusing consumer FilterVariants/ApplyFilters into ReadFromVcf/Read
INFO:root:2018-01-10T04:35:40.248Z: JOB_MESSAGE_DETAILED: (52771809aae766b9): Fusing consumer VariantToBigQuery/ConvertToBigQueryTableRow into FilterVariants/ApplyFilters
INFO:root:2018-01-10T04:35:40.281Z: JOB_MESSAGE_DETAILED: (52771809aae76387): Fusing consumer VariantToBigQuery/WriteToBigQuery/NativeWrite into VariantToBigQuery/ConvertToBigQueryTableRow
INFO:root:2018-01-10T04:35:40.304Z: JOB_MESSAGE_DEBUG: (52771809aae76055): Workflow config is missing a default resource spec.
INFO:root:2018-01-10T04:35:40.331Z: JOB_MESSAGE_DEBUG: (52771809aae76d23): Adding StepResource setup and teardown to workflow graph.
INFO:root:2018-01-10T04:35:40.363Z: JOB_MESSAGE_DEBUG: (52771809aae769f1): Adding workflow start and stop steps.
INFO:root:2018-01-10T04:35:40.390Z: JOB_MESSAGE_DEBUG: (52771809aae766bf): Assigning stage ids.
INFO:root:2018-01-10T04:35:40.511Z: JOB_MESSAGE_DEBUG: (280200dbfaf6cac0): Executing wait step start3
INFO:root:2018-01-10T04:35:40.565Z: JOB_MESSAGE_BASIC: (12e360a044595f8): Executing operation ReadFromVcf/Read+FilterVariants/ApplyFilters+VariantToBigQuery/ConvertToBigQueryTableRow+VariantToBigQuery/WriteToBigQuery/NativeWrite
INFO:root:2018-01-10T04:35:40.599Z: JOB_MESSAGE_DEBUG: (35a4b300ff329644): Starting worker pool setup.
INFO:root:2018-01-10T04:35:40.628Z: JOB_MESSAGE_BASIC: (35a4b300ff3297da): Starting 10 workers in us-central1-f...
INFO:root:Job 2018-01-09_20_35_36-6246558984112825034 is in state JOB_STATE_RUNNING
INFO:root:2018-01-10T04:35:47.357Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54fdd0): Autoscaling: Raised the number of workers to 0 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:35:57.734Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54ffab): Autoscaling: Raised the number of workers to 3 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:35:57.764Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54fe01): Resized worker pool to 3, though goal was 10. This could be a quota issue.
INFO:root:2018-01-10T04:36:02.987Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f759): Autoscaling: Raised the number of workers to 9 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:36:03.018Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f5af): Resized worker pool to 9, though goal was 10. This could be a quota issue.
INFO:root:2018-01-10T04:36:14.570Z: JOB_MESSAGE_DETAILED: (2c7b5f62246857c2): Workers have started successfully.
INFO:root:2018-01-10T04:36:18.590Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f436): Autoscaling: Raised the number of workers to 10 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:39:14.929Z: JOB_MESSAGE_BASIC: (280200dbfaf6cb22): Autoscaling: Resizing worker pool from 10 to 15.
INFO:root:2018-01-10T04:39:20.293Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54fed4): Autoscaling: Raised the number of workers to 14 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:39:20.320Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54fd2a): Resized worker pool to 14, though goal was 15. This could be a quota issue.
INFO:root:2018-01-10T04:39:25.575Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f5ad): Autoscaling: Raised the number of workers to 15 based on the rate of progress in the currently running step(s).
INFO:root:2018-01-10T04:57:04.250Z: JOB_MESSAGE_ERROR: (8f917a73bb67a9d9): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
op.start()
File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
def start(self):
File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
with self.scoped_start_state:
File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
with self.spec.source.reader() as reader:
File "dataflow_worker/native_operations.py", line 54, in dataflow_worker.native_operations.NativeReadOperation.start
self.output(windowed_value)
File "apache_beam/runners/worker/operations.py", line 154, in apache_beam.runners.worker.operations.Operation.output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
File "apache_beam/runners/worker/operations.py", line 86, in apache_beam.runners.worker.operations.ConsumerSet.receive
cython.cast(Operation, consumer).process(windowed_value)
File "apache_beam/runners/worker/operations.py", line 339, in apache_beam.runners.worker.operations.DoOperation.process
with self.scoped_process_state:
File "apache_beam/runners/worker/operations.py", line 340, in apache_beam.runners.worker.operations.DoOperation.process
self.dofn_receiver.receive(o)
File "apache_beam/runners/common.py", line 382, in apache_beam.runners.common.DoFnRunner.receive
self.process(windowed_value)
File "apache_beam/runners/common.py", line 390, in apache_beam.runners.common.DoFnRunner.process
self._reraise_augmented(exn)
File "apache_beam/runners/common.py", line 415, in apache_beam.runners.common.DoFnRunner._reraise_augmented
raise
File "apache_beam/runners/common.py", line 388, in apache_beam.runners.common.DoFnRunner.process
self.do_fn_invoker.invoke_process(windowed_value)
File "apache_beam/runners/common.py", line 189, in apache_beam.runners.common.SimpleInvoker.invoke_process
self.output_processor.process_outputs(
File "apache_beam/runners/common.py", line 480, in apache_beam.runners.common._OutputProcessor.process_outputs
self.main_receivers.receive(windowed_value)
File "apache_beam/runners/worker/operations.py", line 86, in apache_beam.runners.worker.operations.ConsumerSet.receive
cython.cast(Operation, consumer).process(windowed_value)
File "apache_beam/runners/worker/operations.py", line 339, in apache_beam.runners.worker.operations.DoOperation.process
with self.scoped_process_state:
File "apache_beam/runners/worker/operations.py", line 340, in apache_beam.runners.worker.operations.DoOperation.process
self.dofn_receiver.receive(o)
File "apache_beam/runners/common.py", line 382, in apache_beam.runners.common.DoFnRunner.receive
self.process(windowed_value)
File "apache_beam/runners/common.py", line 390, in apache_beam.runners.common.DoFnRunner.process
self._reraise_augmented(exn)
File "apache_beam/runners/common.py", line 431, in apache_beam.runners.common.DoFnRunner._reraise_augmented
raise new_exn, None, original_traceback
File "apache_beam/runners/common.py", line 388, in apache_beam.runners.common.DoFnRunner.process
self.do_fn_invoker.invoke_process(windowed_value)
File "apache_beam/runners/common.py", line 189, in apache_beam.runners.common.SimpleInvoker.invoke_process
self.output_processor.process_outputs(
File "apache_beam/runners/common.py", line 480, in apache_beam.runners.common._OutputProcessor.process_outputs
self.main_receivers.receive(windowed_value)
File "apache_beam/runners/worker/operations.py", line 86, in apache_beam.runners.worker.operations.ConsumerSet.receive
cython.cast(Operation, consumer).process(windowed_value)
File "dataflow_worker/native_operations.py", line 98, in dataflow_worker.native_operations.NativeWriteOperation.process
with self.scoped_process_state:
File "dataflow_worker/native_operations.py", line 104, in dataflow_worker.native_operations.NativeWriteOperation.process
self.writer.Write(o.value)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativefileio.py", line 577, in Write
super(TextFileWriter, self).Write(value)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativefileio.py", line 462, in Write
self.file.write(self.sink.coder.encode(value))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 162, in encode
raise ValueError('%s. %s' % (e, JSON_COMPLIANCE_ERROR))
ValueError: Out of range float values are not JSON compliant. NAN, INF and -INF values are not JSON compliant. [while running 'VariantToBigQuery/ConvertToBigQueryTableRow']
[... the same traceback, ending in the same ValueError, was logged four more times (at 04:59:20, 05:01:35, 05:08:18 and 05:09:41) as the work item was retried ...]
INFO:root:2018-01-10T05:09:41.410Z: JOB_MESSAGE_BASIC: (42fdb1ab4fd8cff2): Executing BigQuery import job "dataflow_job_8333640118018302870". You can check its status with the bq tool: "bq show -j --project_id=XXX dataflow_job_8333640118018302870".
INFO:root:2018-01-10T05:09:41.435Z: JOB_MESSAGE_WARNING: (42fdb1ab4fd8cb68): Unable to delete temp files: "gs://XXX/temp/gnomad-genomes-to-bigquery-pipeline-chr1.1515558922.910737/8333640118018303863/dax-tmp-2018-01-09_20_35_36-6246558984112825034-S01-0-207f78e951a6f2a/@DAX.json."
INFO:root:2018-01-10T05:09:41.509Z: JOB_MESSAGE_DEBUG: (12e360a04459cb7): Executing failure step failure2
INFO:root:2018-01-10T05:09:41.533Z: JOB_MESSAGE_ERROR: (12e360a04459df1): Workflow failed. Causes: (12e360a04459a43): S01:ReadFromVcf/Read+FilterVariants/ApplyFilters+VariantToBigQuery/ConvertToBigQueryTableRow+VariantToBigQuery/WriteToBigQuery/NativeWrite failed., (1f26fcd604aa07c5): A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on:
gnomad-genomes-to-bigquer-01092035-ea7e-harness-dw84,
gnomad-genomes-to-bigquer-01092035-ea7e-harness-dw84,
gnomad-genomes-to-bigquer-01092035-ea7e-harness-dw84,
gnomad-genomes-to-bigquer-01092035-ea7e-harness-q4nv
INFO:root:2018-01-10T05:09:41.670Z: JOB_MESSAGE_DETAILED: (52771809aae7605e): Cleaning up.
INFO:root:2018-01-10T05:09:41.774Z: JOB_MESSAGE_DEBUG: (52771809aae769fa): Starting worker pool teardown.
INFO:root:2018-01-10T05:09:41.797Z: JOB_MESSAGE_BASIC: (52771809aae766c8): Stopping worker pool...
INFO:root:2018-01-10T05:10:55.180Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f475): Autoscaling: Resized worker pool from 15 to 0.
INFO:root:2018-01-10T05:10:55.208Z: JOB_MESSAGE_DETAILED: (20f2c6bf2c54f2cb): Autoscaling: Would further reduce the number of workers but reached the minimum number allowed for the job.
INFO:root:2018-01-10T05:10:55.263Z: JOB_MESSAGE_DEBUG: (52771809aae766ce): Tearing down pending resources...
INFO:root:Job 2018-01-09_20_35_36-6246558984112825034 is in state JOB_STATE_FAILED
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 223, in <module>
run()
File "/opt/gcp_variant_transforms/src/gcp_variant_transforms/vcf_to_bq.py", line 218, in run
append=known_args.append))
File "/opt/gcp_variant_transforms/venv/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 346, in __exit__
self.run().wait_until_finish()
File "/opt/gcp_variant_transforms/venv/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 966, in wait_until_finish
(self.state, getattr(self._runner, 'last_error_msg', None)), self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
(b0f417b262714adf): Traceback (most recent call last):
  [... identical to the worker traceback above ...]
ValueError: Out of range float values are not JSON compliant. NAN, INF and -INF values are not JSON compliant. [while running 'VariantToBigQuery/ConvertToBigQueryTableRow']
(exit status 1)
```
Sorry about the misunderstanding! I didn’t realize you were still blocked on this. I just pushed the fix and tested with the chr1 file. The docker image is also updated. Please let me know if you encounter any other issues.
A+! Thanks for the quick fix!
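For anyone who hits this before picking up the updated image: the essence of the fix is to sanitize non-finite floats out of each row before it reaches the JSON coder. A minimal sketch of the idea (the actual gcp-variant-transforms change may handle these values differently, e.g. by substituting a sentinel value instead of dropping them):

```python
import math

def sanitize_value(value):
    # Replace floats that json.dumps(..., allow_nan=False) would reject.
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return None
    if isinstance(value, list):
        return [sanitize_value(v) for v in value]
    return value

def sanitize_row(row):
    # Apply to every field of a BigQuery table row dict before encoding.
    return {key: sanitize_value(value) for key, value in row.items()}
```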