
TaskFlow AirflowSkipException causes downstream step to fail


Apache Airflow version

2.3.2 (latest released)

What happened

I am using the TaskFlow API and have two tasks that lead to the same downstream task. These upstream tasks check for new data and, when it is found, set an XCom entry with the new filename for the downstream task to handle. If no data is found, the upstream tasks raise a skip exception. The downstream task has trigger_rule = none_failed_min_one_success.

The problem is that a skipped task does not set any XCom. When the downstream task starts, it raises the error: airflow.exceptions.AirflowException: XComArg result from task2 at airflow_2_3_xcomarg_render_error with key="return_value" is not found!

What you think should happen instead

Based on the "none_failed_min_one_success" trigger rule, the expectation is that an upstream task should be allowed to skip and the downstream task should still run. While the downstream task does try to start per the trigger rule, it never actually runs, because the error is raised while its arguments are being rendered.

How to reproduce

The example DAG below will generate the error when run.

from airflow.decorators import dag, task
from airflow.exceptions import AirflowSkipException

@task
def task1():
    return "example.csv"

@task
def task2():
    raise AirflowSkipException()

@task(trigger_rule="none_failed_min_one_success")
def downstream_task(t1, t2):
    print("task ran")

@dag(
    default_args={"owner": "Airflow", "start_date": "2022-06-07"},
    schedule_interval=None,
)
def airflow_2_3_xcomarg_render_error():
    t1 = task1()
    t2 = task2()
    downstream_task(t1, t2)

example_dag = airflow_2_3_xcomarg_render_error()
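A possible workaround (a sketch, not from the original report; the DAG name xcomarg_workaround is just for illustration): instead of passing the XComArgs as call arguments, pull the values inside the downstream task with ti.xcom_pull(), which returns None for a value that was never pushed. The dependencies then have to be wired up explicitly.

from airflow.decorators import dag, task
from airflow.exceptions import AirflowSkipException

@task
def task1():
    return "example.csv"

@task
def task2():
    raise AirflowSkipException()

@task(trigger_rule="none_failed_min_one_success")
def downstream_task(ti=None):
    # ti is injected by TaskFlow. xcom_pull returns None for an XCom
    # that was never set, so a skipped upstream no longer breaks
    # argument rendering.
    t1 = ti.xcom_pull(task_ids="task1")
    t2 = ti.xcom_pull(task_ids="task2")
    print("task ran with", t1, t2)

@dag(
    default_args={"owner": "Airflow", "start_date": "2022-06-07"},
    schedule_interval=None,
)
def xcomarg_workaround():
    t1 = task1()
    t2 = task2()
    d = downstream_task()
    # No XComArgs are passed, so the dependencies are set explicitly.
    [t1, t2] >> d

workaround_dag = xcomarg_workaround()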

Operating System

Ubuntu 20.04.4 LTS

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project’s Code of Conduct


Top GitHub Comments

markhatch commented, Jul 15, 2022 (4 reactions)

Unsure if this is helpful, but I’m tossing in my vote for this as well and thought I’d share my use case.

I expected the trigger_rule to be respected rather than downstream tasks automatically failing. I have a downstream task that picks a random choice from any successful upstream:

    @task(trigger_rule=TriggerRule.ALL_DONE)
    def choose_cluster_to_run_on(acceptable_systems):
        # pick a random cluster from whichever upstreams succeeded
        return random.choice(acceptable_systems)

[Screenshot of the DAG graph omitted]

In the example above, I would expect the choose_cluster task to pick/pass cluster_b. Instead, it throws the key="return_value" is not found error, as mentioned.

jordanjeremy commented, Jul 5, 2022 (2 reactions)

@ashb Previously this did return None. However, I do see that this could be indeterminate in cases where the XCom value may have been set to None or may not have been set at all. Would returning the NOTSET object, instead of raising an error, work better? If that were done, the difference between "the XCom is None" and "the XCom wasn’t set" could still be determined in the event that someone needed to tell those cases apart.

Edit: I think the most important runtime behavior is that the XComArg.resolve method returns a value instead of raising an error. When it raises an error, the task won’t run, even when the trigger rules meant that it should. The exact value returned, whether None or some sentinel value (NOTSET) that callers can check for, should work either way, in my opinion.
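For illustration only, a minimal sketch of how a task could tell those cases apart if resolve() handed back Airflow’s existing airflow.utils.types.NOTSET sentinel instead of raising (hypothetical behavior under this proposal, not what 2.3.2 does; describe_upstream_value is a made-up helper name):

from airflow.utils.types import NOTSET

def describe_upstream_value(value):
    # `value` stands in for whatever XComArg.resolve would return
    # under this proposal (hypothetical behavior).
    if value is NOTSET:
        return "no XCom was pushed (e.g. the upstream task skipped)"
    if value is None:
        return "the upstream task explicitly pushed None"
    return f"the upstream task pushed {value!r}"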
