
Dataflow Flex Template Operator

See original GitHub issue

Apache Airflow version: 1.10.9 (Cloud Composer Airflow image)

Environment:

  • Cloud provider or hardware configuration: Cloud Composer

What happened: The error logs indicate that the operator does not recognize the job as a batch job.

```
[2020-12-22 16:28:53,445] {taskinstance.py:1135} ERROR - 'type'
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 972, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/providers/google/cloud/operators/dataflow.py", line 647, in execute
    on_new_job_id_callback=set_current_job_id,
  File "/usr/local/lib/airflow/airflow/providers/google/common/hooks/base_google.py", line 383, in inner_wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/airflow/airflow/providers/google/cloud/hooks/dataflow.py", line 804, in start_flex_template
    jobs_controller.wait_for_done()
  File "/usr/local/lib/airflow/airflow/providers/google/cloud/hooks/dataflow.py", line 348, in wait_for_done
    while self._jobs and not all(self._check_dataflow_job_state(job) for job in self._jobs)
  File "/usr/local/lib/airflow/airflow/providers/google/cloud/hooks/dataflow.py", line 348, in <genexpr>
    while self._jobs and not all(self._check_dataflow_job_state(job) for job in self._jobs)
  File "/usr/local/lib/airflow/airflow/providers/google/cloud/hooks/dataflow.py", line 321, in _check_dataflow_job_state
    wait_for_running = job['type'] == DataflowJobType.JOB_TYPE_STREAMING
KeyError: 'type'
```
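The failing line in the traceback reduces to a plain dict subscript. A minimal sketch of the failure mode, assuming (as the traceback suggests) that the Dataflow API response for this batch job simply omits the `type` key:

```python
# Minimal reproduction of the failure mode in _check_dataflow_job_state:
# a job payload without a 'type' key makes the bare subscript raise.
# The job dict below is a hypothetical example, not the actual API response.
job = {"id": "2020-12-22_08_28_00-123", "currentState": "JOB_STATE_RUNNING"}
try:
    wait_for_running = job["type"] == "JOB_TYPE_STREAMING"
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints: KeyError: 'type'
```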

I have specified:

```python
with models.DAG(
    dag_id="pdc-test",
    start_date=days_ago(1),
    schedule_interval=None,
) as dag_flex_template:
    start_flex_template = DataflowStartFlexTemplateOperator(
        task_id="pdc-test",
        body={
            "launchParameter": {
                "containerSpecGcsPath": GCS_FLEX_TEMPLATE_TEMPLATE_PATH,
                "jobName": DATAFLOW_FLEX_TEMPLATE_JOB_NAME,
                "parameters": {
                    "stage": STAGE,
                    "target": TARGET,
                    "path": PATH,
                    "filename": FILENAME,
                    "column": "geometry"
                },
                "environment": {
                    "network": NETWORK,
                    "subnetwork": SUBNETWORK,
                    "machineType": "n1-standard-1",
                    "numWorkers": "1",
                    "maxWorkers": "1",
                    "tempLocation": "gs://test-pipelines-work/batch",
                    "workerZone": "northamerica-northeast1",
                    "enableStreamingEngine": "false",
                    "serviceAccountEmail": "<number>-compute@developer.gserviceaccount.com",
                    "ipConfiguration": "WORKER_IP_PRIVATE"
                },
            }
        },
        location=LOCATION,
        project_id=GCP_PROJECT_ID
    )
```


**What you expected to happen**:

Expected the DAG to run successfully.

**What do you think went wrong?**

It appears the operator does not handle the input as a batch-type Flex Template: `_check_dataflow_job_state` assumes every job dict contains a `'type'` key and treats the job as streaming. The `DataflowJobType` should be `JOB_TYPE_BATCH`, not `JOB_TYPE_STREAMING`.
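A minimal sketch of the kind of defensive check that avoids the `KeyError` (this is an illustration of the idea, not the actual upstream fix; the constant names mirror the Dataflow job states but the function itself is hypothetical):

```python
# Hypothetical defensive version of the job-state check.
# Using .get() means a batch job whose payload omits 'type' is not
# misclassified and does not raise KeyError: 'type'.
JOB_TYPE_STREAMING = "JOB_TYPE_STREAMING"
JOB_STATE_RUNNING = "JOB_STATE_RUNNING"
JOB_STATE_DONE = "JOB_STATE_DONE"

def check_dataflow_job_state(job: dict) -> bool:
    """Return True once the job has reached its 'successful' state."""
    if job.get("type") == JOB_TYPE_STREAMING:
        # A streaming job never reaches JOB_STATE_DONE; RUNNING counts
        # as success.
        return job.get("currentState") == JOB_STATE_RUNNING
    # Batch jobs (including payloads with no 'type' key) must finish.
    return job.get("currentState") == JOB_STATE_DONE
```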

**How to reproduce it**:
1. Create a batch Flex Template as described at https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates
2. Point the code above at your registered template and trigger the DAG.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
SpicySyntax commented, Feb 25, 2021

@TobKed @matthieucham Taking `airflow/providers/google/cloud/hooks/dataflow.py` from the fix by @terekete and putting it next to my DAG has been working for me as a short-term workaround. (I had to make some small tweaks to stop it from throwing exceptions; see this gist.)

1 reaction
TobKed commented, Jan 5, 2021

I confirmed the issue by running the system tests and reviewed the fix.
