
DAG Run fails when chaining multiple empty mapped tasks


Apache Airflow version

2.3.3 (latest released)

What happened

On the Kubernetes Executor and the Local Executor (others not tested), a significant fraction of the DAG Runs of a DAG that has two consecutive mapped tasks, each being passed an empty list, are marked as failed even though all tasks either succeed or are skipped.
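With dynamic task mapping, `.expand()` creates one mapped task instance per input element, so expanding over an empty list produces zero instances for that task. A plain-Python analogy of that behavior (a toy sketch, not Airflow's actual scheduler logic):

```python
def expand_like(func, values):
    """Toy analogy of dynamic task mapping: one 'task instance' per element."""
    return [func(v) for v in values]

# A non-empty input yields one mapped instance per element.
print(expand_like(lambda x: x + 1, [1, 2, 3]))  # [2, 3, 4]

# An empty input yields zero instances -- there is nothing to run.
print(expand_like(lambda x: x + 1, []))  # []
```

The bug concerns how the scheduler settles the state of a downstream mapped task when its upstream mapped task expanded to zero instances.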


What you think should happen instead

The DAG Run should be marked success.

How to reproduce

Run the following DAG on Kubernetes Executor or Local Executor.

The real-world version of this DAG has several mapped tasks that all expand over the same list, and that list is frequently empty. Below is a minimal reproducible example.

from datetime import datetime

from airflow import DAG
from airflow.decorators import task


with DAG(dag_id="break_mapping", start_date=datetime(2022, 3, 4)) as dag:

    @task
    def add_one(x: int):
        return x + 1

    @task
    def say_hi():
        print("Hi")


    added_values = add_one.expand(x=[])
    added_more_values = add_one.expand(x=[])
    say_hi() >> added_values
    added_values >> added_more_values

Operating System

Debian Bullseye

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==1!4.0.0
apache-airflow-providers-cncf-kubernetes==1!4.1.0
apache-airflow-providers-elasticsearch==1!4.0.0
apache-airflow-providers-ftp==1!3.0.0
apache-airflow-providers-google==1!8.1.0
apache-airflow-providers-http==1!3.0.0
apache-airflow-providers-imap==1!3.0.0
apache-airflow-providers-microsoft-azure==1!4.0.0
apache-airflow-providers-mysql==1!3.0.0
apache-airflow-providers-postgres==1!5.0.0
apache-airflow-providers-redis==1!3.0.0
apache-airflow-providers-slack==1!5.0.0
apache-airflow-providers-sqlite==1!3.0.0
apache-airflow-providers-ssh==1!3.0.0

Deployment

Astronomer

Deployment details

Local was tested on docker compose (from astro-cli)

Anything else

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 6
  • Comments: 22 (15 by maintainers)

Top GitHub Comments

6 reactions · collinmcnulty commented, Aug 18, 2022

For anyone else experiencing this, there is a workaround: put a sleep task between your two sets of mapped tasks.

from datetime import datetime
from time import sleep

from airflow import DAG
from airflow.decorators import task

with DAG(dag_id="break_mapping", start_date=datetime(2022, 3, 4)) as dag:

    @task
    def add_one(x: int):
        return x + 1

    @task
    def say_hi():
        print("Hi")

    @task(trigger_rule="all_done")
    def sleep_task():
        sleep(5)

    added_values = add_one.expand(x=[])
    added_more_values = add_one.expand(x=[])
    say_hi() >> added_values
    added_values >> sleep_task() >> added_more_values
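The workaround hinges on `trigger_rule="all_done"`: the sleep task fires once every upstream instance has reached a terminal state, even when they were all skipped, which presumably gives the scheduler time to settle the second mapped task's state. A toy model of how the two relevant trigger rules evaluate (simplified states, not Airflow's implementation):

```python
def should_run(trigger_rule: str, upstream_states: list) -> bool:
    """Toy evaluation of two Airflow trigger rules over upstream task states."""
    terminal = {"success", "failed", "skipped"}
    if trigger_rule == "all_success":
        # Default rule: fires only if every upstream succeeded.
        return all(s == "success" for s in upstream_states)
    if trigger_rule == "all_done":
        # Fires once every upstream reached *any* terminal state.
        return all(s in terminal for s in upstream_states)
    raise ValueError(f"unhandled trigger rule: {trigger_rule}")

# With an empty .expand(), the mapped upstream instances end up skipped:
print(should_run("all_success", ["skipped"]))  # False
print(should_run("all_done", ["skipped"]))     # True -> the sleep task still runs
```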
1 reaction · ashb commented, Sep 1, 2022

@frankcash My workaround specifically needs an operator downstream of a mapped task (one that might get skipped), so in your example: added_values >> added_more_values >> sleep_task


