
bigtable-hbase-beam causes Dataflow job worker to fail

See original GitHub issue

Problem statement: with the following Maven pom.xml dependency, the Cloud Dataflow job worker fails to start.

<dependency>
    <groupId>com.google.cloud.bigtable</groupId>
    <artifactId>bigtable-hbase-beam</artifactId>
    <version>1.5.0</version>
</dependency>

The workaround is to delete the above section from pom.xml along with the related Java code. Below is the error from Dataflow’s Stackdriver logging:

Error syncing pod 414461ba254099a817d81f7d60657139 (“dataflow-debugme-worker-fails-04181156-lbcm-harness-g5rz_default(414461ba254099a817d81f7d60657139)”), skipping: failed to “StartContainer” for “java-streaming” with CrashLoopBackOff: “Back-off 2m40s restarting failed container=java-streaming pod=dataflow-debugme-worker-fails-04181156-lbcm-harness-g5rz_default(414461ba254099a817d81f7d60657139)”

The source is bigtable-beam-dataflow-worker-fail.zip.

To reproduce the bug, modify com/finack/models/Constants.java with the correct Pub/Sub subscription, project ID, and Bigtable instance (a hypothetical sketch of the class follows the list):

  • PROJECT_ID
  • PUBSUB_SUBSCRIPTION_TO_PIPELINE
  • BIGTABLE_CLOUD_INSTANCE_ID
  • CFAMILY # column family for the table
  • TABLE_ID
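For readers without the source zip, this is only a hypothetical sketch of what com/finack/models/Constants.java might look like, based on the field list above; every value is a placeholder, not taken from the original project.

package com.finack.models;

public final class Constants {
  // GCP project that owns the Pub/Sub subscription and the Bigtable instance.
  public static final String PROJECT_ID = "my-gcp-project";
  // Full subscription path as expected by PubsubIO.readStrings().fromSubscription(...).
  public static final String PUBSUB_SUBSCRIPTION_TO_PIPELINE =
      "projects/my-gcp-project/subscriptions/my-subscription";
  public static final String BIGTABLE_CLOUD_INSTANCE_ID = "my-bigtable-instance";
  // Column family for the table.
  public static final String CFAMILY = "cf";
  public static final String TABLE_ID = "my-table";

  private Constants() {}
}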

Create a Bigtable table and a Pub/Sub subscription matching the fields above. Execute the following to run locally:

mvn exec:java -Dexec.mainClass=com.finack.app.FinackPipeline -Dexec.args=" --project=mwpfin --runner=DirectRunner --mode=debug " # --mode=debug is optional, without it, console output will be less verbose

If the local runner executes without error, publish to the topic linked to the subscription and verify that messages land in Bigtable:

  1. Download the publish.py module.
  2. Edit the first two lines of publish.py to set the correct project ID and topic.
  3. Each run publishes a Pub/Sub message carrying the current timestamp; a minimal Java equivalent is sketched after this list.
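publish.py is a Python script, so this is only a hedged Java illustration of what it is described to do: publish one message containing the current timestamp. It assumes the google-cloud-pubsub client library; the project and topic values are placeholders.

import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import java.time.Instant;
import java.util.concurrent.TimeUnit;

public class PublishTimestamp {
  public static void main(String[] args) throws Exception {
    // Edit these two values, just as the first two lines of publish.py are edited.
    String projectId = "my-gcp-project";   // placeholder
    String topicId = "my-topic";           // placeholder

    Publisher publisher = Publisher.newBuilder(TopicName.of(projectId, topicId)).build();
    try {
      PubsubMessage message = PubsubMessage.newBuilder()
          .setData(ByteString.copyFromUtf8("timestamp: " + Instant.now()))
          .build();
      // Block until the message ID comes back so the run fails loudly on errors.
      System.out.println("Published message " + publisher.publish(message).get());
    } finally {
      publisher.shutdown();
      publisher.awaitTermination(1, TimeUnit.MINUTES);
    }
  }
}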

Ideally, messages will show up in Bigtable. Then kill the local runner and execute the following command to run with DataflowRunner, supplying a Cloud Storage path for --stagingLocation:

mvn exec:java -Dexec.mainClass=com.finack.app.FinackPipeline -Dexec.args="--project=mwpfin --stagingLocation=gs://tmp/staging/ --runner=DataflowRunner --mode=dataflow "

Observe the error in Stackdriver logging (error screenshots omitted). The job appears to be running but does not write to Bigtable. I confirmed the Bigtable dependency was the cause: the same code without that dependency, writing with System.out.println instead of to Bigtable, did not produce the Error syncing pod errors.
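For context, here is a minimal sketch of the kind of Bigtable write step the dependency enables; it is not the code from bigtable-beam-dataflow-worker-fail.zip. Only CloudBigtableIO and CloudBigtableTableConfiguration come from bigtable-hbase-beam; the pipeline structure and the hypothetical Constants class are assumptions. Swapping the final write for a DoFn that calls System.out.println is the comparison described above.

import com.google.cloud.bigtable.beam.CloudBigtableIO;
import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BigtableWriteSketch {
  public static void main(String[] args) {
    // Connector configuration from the constants listed earlier.
    CloudBigtableTableConfiguration config =
        new CloudBigtableTableConfiguration.Builder()
            .withProjectId(Constants.PROJECT_ID)
            .withInstanceId(Constants.BIGTABLE_CLOUD_INSTANCE_ID)
            .withTableId(Constants.TABLE_ID)
            .build();

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadPubsub",
            PubsubIO.readStrings().fromSubscription(Constants.PUBSUB_SUBSCRIPTION_TO_PIPELINE))
     .apply("ToMutation", ParDo.of(new DoFn<String, Mutation>() {
       @ProcessElement
       public void processElement(ProcessContext c) {
         // Row key = element timestamp; one cell in the configured column family.
         Put put = new Put(Bytes.toBytes(String.valueOf(c.timestamp().getMillis())));
         put.addColumn(Bytes.toBytes(Constants.CFAMILY),
                       Bytes.toBytes("message"), Bytes.toBytes(c.element()));
         c.output(put);
       }
     }))
     // Removing this step (and the dependency) and printing instead is what made
     // the "Error syncing pod" messages disappear in the report above.
     .apply("WriteToBigtable", CloudBigtableIO.writeToTable(config));

    p.run();
  }
}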

I have not been able to access the Compute Engine node or the failed pod to debug further, and I can’t find documentation that shows how; maybe that’s not possible. Are you able to reproduce?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
sduskis commented, Apr 18, 2019

Try adding this to your bigtable-hbase-beam dependency:

<exclusions>
    <exclusion>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
</exclusions>

That worked for us, and it ought to work for you as well, even on the 1.5.0 release.

0 reactions
hilliao commented, Apr 22, 2019

I found the fix to enable StackDriver logging: replace

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-jdk14</artifactId>
    <version>${slf4j.version}</version>
    <!-- When loaded at runtime this will wire up slf4j to the JUL backend -->
    <scope>runtime</scope>
</dependency>

with

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>${slf4j.version}</version>
</dependency>

On behalf of Maven Wave Partners (Google Cloud North America Services Partner of 2018), we thank @sduskis.

