bigtable-hbase-beam causes Dataflow job worker to fail
Problem statement: with the following Maven pom.xml dependency, the Cloud Dataflow job worker fails to start.
<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-hbase-beam</artifactId>
  <version>1.5.0</version>
</dependency>
The workaround is to remove the section above from pom.xml, along with the Java code that uses it. Below is the error from Dataflow's StackDriver logging:
Error syncing pod 414461ba254099a817d81f7d60657139 ("dataflow-debugme-worker-fails-04181156-lbcm-harness-g5rz_default(414461ba254099a817d81f7d60657139)"), skipping: failed to "StartContainer" for "java-streaming" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=java-streaming pod=dataflow-debugme-worker-fails-04181156-lbcm-harness-g5rz_default(414461ba254099a817d81f7d60657139)"
The source is bigtable-beam-dataflow-worker-fail.zip.
To reproduce the bug, edit com/finack/models/Constants.java with the correct Pub/Sub subscription, project ID, and Bigtable instance (a sketch of the file follows this list):
- PROJECT_ID
- PUBSUB_SUBSCRIPTION_TO_PIPELINE
- BIGTABLE_CLOUD_INSTANCE_ID
- CFAMILY (the column family for the table)
- TABLE_ID
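For reference, a minimal sketch of what Constants.java might look like; the field names come from the list above, and every value is a placeholder:

package com.finack.models;

// Sketch of Constants.java: field names from the list above; all values are
// placeholders that must be replaced with your own resources.
public final class Constants {
  public static final String PROJECT_ID = "my-gcp-project";
  // Full subscription path: projects/<project>/subscriptions/<name>
  public static final String PUBSUB_SUBSCRIPTION_TO_PIPELINE =
      "projects/my-gcp-project/subscriptions/finack-sub";
  public static final String BIGTABLE_CLOUD_INSTANCE_ID = "finack-instance";
  public static final String CFAMILY = "cf1"; // column family for the table
  public static final String TABLE_ID = "finack-table";

  private Constants() {} // constants holder, not instantiable
}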
Create the Bigtable table and the Pub/Sub subscription matching the fields above.
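For example, with the gcloud and cbt command-line tools (the resource names here are placeholders matching the Constants sketch above):

gcloud pubsub topics create finack-topic
gcloud pubsub subscriptions create finack-sub --topic=finack-topic
cbt -project my-gcp-project -instance finack-instance createtable finack-table
cbt -project my-gcp-project -instance finack-instance createfamily finack-table cf1

Then execute the following to run locally: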
mvn exec:java -Dexec.mainClass=com.finack.app.FinackPipeline -Dexec.args=" --project=mwpfin --runner=DirectRunner --mode=debug "

The --mode=debug flag is optional; without it, console output is less verbose.
If the local runner executes without error, publish to the topic attached to the subscription and verify that messages land in Bigtable (a Java equivalent of the publisher is sketched after this list):
- Download the publish.py module.
- Edit the first two lines of publish.py to set the correct project ID and topic.
- Each run publishes a Pub/Sub message containing the current timestamp.
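If you would rather not use the Python module, the equivalent publish in Java looks roughly like this (google-cloud-pubsub client; the project and topic names are placeholders):

import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import java.time.Instant;

public class PublishOnce {
  public static void main(String[] args) throws Exception {
    // Same project ID and topic as the first two lines of publish.py.
    Publisher publisher =
        Publisher.newBuilder(TopicName.of("my-gcp-project", "finack-topic")).build();
    PubsubMessage msg = PubsubMessage.newBuilder()
        .setData(ByteString.copyFromUtf8(Instant.now().toString())) // current timestamp
        .build();
    publisher.publish(msg).get(); // block until the message is accepted
    publisher.shutdown();
  }
}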
If all is well, the messages will be written to Bigtable. Kill the local runner and execute the following command to run with the DataflowRunner, substituting your own Cloud Storage path for --stagingLocation=:
mvn exec:java -Dexec.mainClass=com.finack.app.FinackPipeline -Dexec.args="--project=mwpfin --stagingLocation=gs://tmp/staging/ --runner=DataflowRunner --mode=dataflow "
Observe the same "Error syncing pod" error (quoted above) in StackDriver logging.
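One way to pull the recent worker errors from the command line is gcloud logging (the filter below is a generic Dataflow error filter, not specific to this job):

gcloud logging read 'resource.type="dataflow_step" AND severity>=ERROR' --limit=20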
The job appears to be running but never writes to Bigtable. I confirmed that the Bigtable dependency is the cause: the same code without that dependency, writing via System.out.println instead of to Bigtable, does not produce the "Error syncing pod" errors.
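A minimal sketch of the two variants I compared, assuming the pipeline shape described above; MessageToPutFn, the row-key scheme, and the exact transform chain are assumptions, not necessarily the project's actual code:

import com.finack.models.Constants;
import com.google.cloud.bigtable.beam.CloudBigtableIO;
import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical DoFn: turn each Pub/Sub message into an HBase Put keyed by
// the element timestamp, in the column family from Constants.
static class MessageToPutFn extends DoFn<String, Mutation> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    Put put = new Put(Bytes.toBytes(String.valueOf(c.timestamp().getMillis())));
    put.addColumn(Bytes.toBytes(Constants.CFAMILY), Bytes.toBytes("msg"),
        Bytes.toBytes(c.element()));
    c.output(put);
  }
}

CloudBigtableTableConfiguration btConfig =
    new CloudBigtableTableConfiguration.Builder()
        .withProjectId(Constants.PROJECT_ID)
        .withInstanceId(Constants.BIGTABLE_CLOUD_INSTANCE_ID)
        .withTableId(Constants.TABLE_ID)
        .build();

// Variant A -- triggers "Error syncing pod" on Dataflow workers:
pipeline
    .apply(PubsubIO.readStrings()
        .fromSubscription(Constants.PUBSUB_SUBSCRIPTION_TO_PIPELINE))
    .apply(ParDo.of(new MessageToPutFn()))
    .apply(CloudBigtableIO.writeToTable(btConfig));

// Variant B -- same read, but print instead of writing to Bigtable; with the
// bigtable-hbase-beam dependency removed, this variant runs cleanly.
pipeline
    .apply(PubsubIO.readStrings()
        .fromSubscription(Constants.PUBSUB_SUBSCRIPTION_TO_PIPELINE))
    .apply(ParDo.of(new DoFn<String, Void>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        System.out.println(c.element());
      }
    }));

(A real pipeline contains one variant or the other, not both.)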
I have not been able to access the Compute Engine node or the failed pod to debug further, and I can't find documentation that shows how; maybe it isn't possible. Are you able to reproduce?
Top GitHub Comments
Try adding this to your bigtable-hbase-beam dependency: … That worked for us, and it ought to work for you as well, even with the 1.5.0 release.
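The snippet itself was not preserved above, so purely as an illustration of the shape such a fix usually takes, here is a Maven exclusion on the dependency; the excluded artifact is an assumption, not the commenter's actual fix (run mvn dependency:tree to find the real conflict):

<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-hbase-beam</artifactId>
  <version>1.5.0</version>
  <exclusions>
    <!-- Hypothetical: exclude a transitive artifact that clashes with a copy
         already present on the Dataflow worker. -->
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty-tcnative-boringssl-static</artifactId>
    </exclusion>
  </exclusions>
</dependency>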
I found the fix to enable StackDriver logging: replace … with …
On behalf of Maven Wave Partners (Google Cloud's 2018 North America Services Partner of the Year), we thank @sduskis.