DAG Run fails when chaining multiple empty mapped tasks
Apache Airflow version
2.3.3 (latest released)
What happened
On the Kubernetes Executor and Local Executor (others not tested), a significant fraction of the DAG Runs of a DAG that has two consecutive mapped tasks, each passed an empty list, are marked as failed, even though every task either succeeds or is skipped.
What you think should happen instead
The DAG Run should be marked as successful.
How to reproduce
Run the following DAG on the Kubernetes Executor or Local Executor.
The real-world version of this DAG has several mapped tasks that all point to the same list, and that list is frequently empty. Below is a minimal reproducible example.
from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(dag_id="break_mapping", start_date=datetime(2022, 3, 4)) as dag:

    @task
    def add_one(x: int):
        return x + 1

    @task
    def say_hi():
        print("Hi")

    added_values = add_one.expand(x=[])
    added_more_values = add_one.expand(x=[])
    say_hi() >> added_values
    added_values >> added_more_values
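Because only a fraction of runs fail, a single run may not show the problem. A hypothetical way to trigger the DAG repeatedly (assuming the airflow CLI is on PATH, the scheduler is running, and the DAG is unpaused; the run count is an arbitrary choice, not from the original report):

import subprocess

# Trigger the repro DAG several times; the failure is intermittent, so
# individual runs may well succeed. 20 iterations is an arbitrary choice.
for _ in range(20):
    subprocess.run(["airflow", "dags", "trigger", "break_mapping"], check=True)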
Operating System
Debian Bullseye
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==1!4.0.0
apache-airflow-providers-cncf-kubernetes==1!4.1.0
apache-airflow-providers-elasticsearch==1!4.0.0
apache-airflow-providers-ftp==1!3.0.0
apache-airflow-providers-google==1!8.1.0
apache-airflow-providers-http==1!3.0.0
apache-airflow-providers-imap==1!3.0.0
apache-airflow-providers-microsoft-azure==1!4.0.0
apache-airflow-providers-mysql==1!3.0.0
apache-airflow-providers-postgres==1!5.0.0
apache-airflow-providers-redis==1!3.0.0
apache-airflow-providers-slack==1!5.0.0
apache-airflow-providers-sqlite==1!3.0.0
apache-airflow-providers-ssh==1!3.0.0
Deployment
Astronomer
Deployment details
The Local Executor was tested on Docker Compose (from astro-cli).
Anything else
No response
Are you willing to submit PR?
- Yes, I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
For anyone else experiencing this, there is a workaround: put a sleep between your two sets of mapped tasks.
@frankcash My workaround specifically needs any operator downstream of a mapped task (that might get skipped), so in your example:
added_values >> added_more_values >> sleep_task
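A minimal sketch of that workaround applied to the reproduction DAG above; the sleep_task name and the 10-second duration are illustrative assumptions, not taken from the original report:

import time
from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(dag_id="break_mapping_workaround", start_date=datetime(2022, 3, 4)) as dag:

    @task
    def add_one(x: int):
        return x + 1

    @task
    def say_hi():
        print("Hi")

    @task
    def sleep_task():
        # Illustrative delay: gives the scheduler time to settle after the
        # empty mapped tasks are expanded, per the workaround described above.
        time.sleep(10)

    added_values = add_one.expand(x=[])
    added_more_values = add_one.expand(x=[])
    say_hi() >> added_values
    added_values >> added_more_values >> sleep_task()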