Dataflow job (BigQuery to Elasticsearch) times out and aborts the execution
Related Template(s)
BigQuery to Elasticsearch
What happened?
I’ve created an indexing pipeline using the Dataflow flex template for BigQuery -> Elasticsearch. The goal is to index the patent dataset that Google Cloud makes available in BigQuery. I was able to ingest a few hundred thousand records before the Dataflow job broke, reporting a timeout on the Elasticsearch side. I do not see any errors reported by Elasticsearch through the Elastic Cloud console. Setting maxRetryAttempts=9 and maxRetryDuration=299999 in the Dataflow configuration did not help.
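For reference, the launch looked roughly like this (a sketch, assuming the template's documented parameters inputTableSpec, connectionUrl, apiKey, and index, and the public patents.publications table; the endpoint, key, and index name are placeholders):

# Hypothetical launch command; job name and region taken from the logs below.
gcloud dataflow flex-template run "dmarx-patent-pubs" \
  --region="europe-west3" \
  --template-file-gcs-location="gs://dataflow-templates/latest/flex/BigQuery_to_Elasticsearch" \
  --parameters="inputTableSpec=bigquery-public-data:patents.publications,connectionUrl=https://REDACTED.es.io:9243,apiKey=REDACTED,index=patent-publications,maxRetryAttempts=9,maxRetryDuration=299999"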
Beam Version
Newer than 2.35.0
Relevant log output
{
  "textPayload": "Error message from worker: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-1 [ACTIVE]\n\torg.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:834)\n\torg.elasticsearch.client.RestClient.performRequest(RestClient.java:259)\n\torg.elasticsearch.client.RestClient.performRequest(RestClient.java:246)\n\tcom.google.cloud.teleport.v2.elasticsearch.utils.ElasticsearchIO$Write$WriteFn.flushBatch(ElasticsearchIO.java:1502)\n\tcom.google.cloud.teleport.v2.elasticsearch.utils.ElasticsearchIO$Write$WriteFn.processElement(ElasticsearchIO.java:1462)\nCaused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-1 [ACTIVE]\n\torg.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)\n\torg.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)\n\torg.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)\n\torg.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)\n\torg.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)\n\torg.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:502)\n\torg.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)\n\torg.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)\n\torg.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)\n\torg.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)\n\tjava.lang.Thread.run(Thread.java:748)\n",
  "insertId": "o524g1d13xt",
  "resource": {
    "type": "dataflow_step",
    "labels": {
      "project_id": "1059491012611",
      "job_name": "dmarx-patent-pubs",
      "step_id": "",
      "region": "europe-west3",
      "job_id": "2022-08-21_12_40_52-600920110293958641"
    }
  },
  "timestamp": "2022-08-21T20:05:24.492885100Z",
  "severity": "ERROR",
  "labels": {
    "dataflow.googleapis.com/log_type": "system",
    "dataflow.googleapis.com/region": "europe-west3",
    "dataflow.googleapis.com/job_name": "dmarx-patent-pubs",
    "dataflow.googleapis.com/job_id": "2022-08-21_12_40_52-600920110293958641"
  },
  "logName": "projects/[PROJECTNAME_REMOVED]/logs/dataflow.googleapis.com%2Fjob-message",
  "receiveTimestamp": "2022-08-21T20:05:24.877736908Z"
}
and
{
  "textPayload": "Error message from worker: java.io.IOException: Failed to advance reader of source: name: \"projects/[PROJECTNAME_REMOVED]/locations/us/sessions/CAISDFNVczdaN1RRQzE5UhoCamQaAmpj/streams/CAEaAmpkGgJqYygC\"\n\n\torg.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:625)\n\torg.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:425)\n\torg.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:211)\n\torg.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:169)\n\torg.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)\n\torg.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:420)\n\torg.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:389)\n\torg.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:314)\n\torg.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)\n\torg.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)\n\torg.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)\n\tjava.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tjava.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tjava.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tjava.lang.Thread.run(Thread.java:748)\nCaused by: com.google.api.gax.rpc.FailedPreconditionException: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: there was an error operating on 'projects/[PROJECTNAME_REMOVED]/locations/us/sessions/CAISDFNVczdaN1RRQzE5UhoCamQaAmpj/streams/CAEaAmpkGgJqYygC': session expired at 2022-08-22T01:44:03+00:00\n\tcom.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:59)\n\tcom.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)\n\tcom.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)\n\tcom.google.api.gax.grpc.ExceptionResponseObserver.onErrorImpl(ExceptionResponseObserver.java:82)\n\tcom.google.api.gax.rpc.StateCheckingResponseObserver.onError(StateCheckingResponseObserver.java:86)\n\tcom.google.api.gax.grpc.GrpcDirectStreamController$ResponseObserverAdapter.onClose(GrpcDirectStreamController.java:149)\n\tio.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)\n\tio.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)\n\tio.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)\n\tio.grpc.census.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:802)\n\tio.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)\n\tio.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)\n\tio.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)\n\tio.grpc.census.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:428)\n\tio.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463)\n\tio.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427)\n\tio.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460)\n\tio.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:562)\n\tio.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)\n\tio.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:743)\n\tio.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:722)\n\tio.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tio.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)\n\t... 3 more\n\tSuppressed: java.lang.RuntimeException: Asynchronous task failed\n\t\tat com.google.api.gax.rpc.ServerStreamIterator.hasNext(ServerStreamIterator.java:105)\n\t\tat org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:211)\n\t\tat org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.advance(BigQueryStorageStreamSource.java:206)\n\t\tat org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:622)\n\t\tat org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:425)\n\t\tat org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:211)\n\t\tat org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:169)\n\t\tat org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)\n\t\tat org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:420)\n\t\tat org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:389)\n\t\tat org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:314)\n\t\tat org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)\n\t\tat org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)\n\t\tat org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)\n\t\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\t\t... 3 more\nCaused by: io.grpc.StatusRuntimeException: FAILED_PRECONDITION: there was an error operating on 'projects/[PROJECTNAME_REMOVED]/locations/us/sessions/CAISDFNVczdaN1RRQzE5UhoCamQaAmpj/streams/CAEaAmpkGgJqYygC': session expired at 2022-08-22T01:44:03+00:00\n\tio.grpc.Status.asRuntimeException(Status.java:535)\n\t... 21 more\n",
  "insertId": "o524g1d13xw",
  "resource": {
    "type": "dataflow_step",
    "labels": {
      "step_id": "",
      "job_id": "2022-08-21_12_40_52-600920110293958641",
      "job_name": "dmarx-patent-pubs",
      "project_id": "1059491012611",
      "region": "europe-west3"
    }
  },
  "timestamp": "2022-08-22T01:44:31.742062353Z",
  "severity": "ERROR",
  "labels": {
    "dataflow.googleapis.com/log_type": "system",
    "dataflow.googleapis.com/job_name": "dmarx-patent-pubs",
    "dataflow.googleapis.com/job_id": "2022-08-21_12_40_52-600920110293958641",
    "dataflow.googleapis.com/region": "europe-west3"
  },
  "logName": "projects/[PROJECTNAME_REMOVED]/logs/dataflow.googleapis.com%2Fjob-message",
  "receiveTimestamp": "2022-08-22T01:44:32.337830356Z"
}
Comments (7, 3 by maintainers)
It immediately scales up to the max number of workers, and that happens hour(s) before I see this error.
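This was not tried in the thread, but the ceiling the job scales up to can itself be lowered at launch, which caps how many workers bulk-write into Elasticsearch at once (a sketch, assuming the standard --max-workers flag of gcloud dataflow flex-template run; the value is arbitrary):

# Same launch as above, with the autoscaling ceiling lowered;
# parameter values elided, tune --max-workers against the cluster.
gcloud dataflow flex-template run "dmarx-patent-pubs" \
  --region="europe-west3" \
  --template-file-gcs-location="gs://dataflow-templates/latest/flex/BigQuery_to_Elasticsearch" \
  --max-workers=5 \
  --parameters="inputTableSpec=...,connectionUrl=...,apiKey=...,index=..."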
Thanks, @bvolpato! I made a mistake in my index configuration that led to large shards and, consequently, long latencies when writing/reading data. After reconsidering the number and size (<50GB) of the shards, things started to run more smoothly. Combined with a rollover policy, which can be configured via Index Lifecycle Management, it is even better. It would be OK to close the issue.
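For readers hitting the same wall, a rollover policy along these lines keeps primary shards under the ~50GB mark (a minimal sketch against the Elasticsearch ILM API; the policy name, $ES_URL, and $ES_API_KEY are placeholders, and max_primary_shard_size requires Elasticsearch 7.13 or newer):

# Roll the index over before any primary shard grows past 50GB.
curl -X PUT "$ES_URL/_ilm/policy/patents-rollover" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey $ES_API_KEY" \
  -d '{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb"
          }
        }
      }
    }
  }
}'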