Retry for TaskGroup

Discussed in https://github.com/apache/airflow/discussions/21333

Originally posted by sartyukhov, February 4, 2022

Description

Hello!

Previously, a SubDag was used to organize tasks into groups. Now you’ve introduced TaskGroups to the world. It’s nice and very clever. But it has one big disadvantage compared to the SubDag: a TaskGroup can’t be retried.

Use case/motivation

For example:

In a project I have two tasks (A >> B):

A - collects data (PythonOperator)
B - updates a materialized view in Postgres (PostgresOperator)

‘A’ may collect only part of the data and mark itself as failed (as far as I know, there is no “half-failed” status). But task ‘B’ should run regardless of A’s result (trigger_rule=“all_done”, for example) to update the matview with the partial data. In about an hour I would like to repeat that process (A >> B).

With SubDag I could do that:

initiate the SubDag with the parameter retries=10
add a dummy task ‘C’ with trigger_rule=“all_success”
change the flow to A >> B >> C and A >> C

And that’s it: C marks the SubDag as failed, which triggers it to retry.
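As a rough illustration of the flow above, here is a plain-Python model of it. This is not Airflow code: the function names, the state strings, and the `collect` callback are hypothetical stand-ins, and the retry delay (the "about an hour" between attempts) is not modelled. It only shows how the two trigger rules combine so that B always runs while C decides whether the group is retried.

```python
from typing import Callable, Dict, Tuple

SUCCESS = "success"
FAILED = "failed"
UPSTREAM_FAILED = "upstream_failed"


def run_group_once(collect: Callable[[], bool]) -> Dict[str, str]:
    """Run A >> B >> C plus A >> C once and return each task's final state."""
    states: Dict[str, str] = {}
    # A: collect data; the callback returns False when only part was collected.
    states["A"] = SUCCESS if collect() else FAILED
    # B: models trigger_rule="all_done", so it runs whatever state A ended in.
    states["B"] = SUCCESS
    # C: models trigger_rule="all_success", so it needs both A and B to succeed.
    upstream_ok = states["A"] == SUCCESS and states["B"] == SUCCESS
    states["C"] = SUCCESS if upstream_ok else UPSTREAM_FAILED
    return states


def run_group_with_retries(
    collect: Callable[[], bool], retries: int
) -> Tuple[int, Dict[str, str]]:
    """Re-run the whole group until C succeeds or the retries are used up."""
    attempts = 0
    while True:
        attempts += 1
        states = run_group_once(collect)
        # One initial attempt plus at most `retries` further attempts.
        if states["C"] == SUCCESS or attempts > retries:
            return attempts, states


# Example: the collector delivers partial data twice, then everything.
outcomes = iter([False, False, True])
attempts, states = run_group_with_retries(lambda: next(outcomes), retries=10)
```

In this model the materialized view is refreshed on every attempt, including the partial ones, which is the behaviour I want from B.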

But TaskGroup does not have a retries parameter. I also can’t retry the whole DAG, because it’s big. And I don’t want to update the materialized view inside task ‘A’, because then I can’t do [A0, A1…An] >> B (update the materialized view just once for several collects).

I hope it’s possible. Or maybe it could be done some other way. Thanks in advance.

Additional explanation on the use case (from #21333)

I have a specific use case where this feature would be useful. It is like:

There is a task that does one thing. There is a second task (which depends on the first one) that does another thing; if this one fails, I need to re-run the entire DAG. I can’t do both processes in the same task due to some limitations (I work with different Java drivers in each one), and retrying only the second task doesn’t solve the problem, because its result determines whether or not the first one needs a re-execution. Clearing the previous task(s) isn’t good either, because it causes an infinite loop until everything succeeds; I only need some 3-5 retries before the run stays in a failed state.

My workaround was to create a DAG that triggers this DAG, so that if the triggered DAG ends in a failed state it is re-executed up to the number of times I set. However, as you can see, this requires creating two DAGs to solve the problem.
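The control flow of that workaround can be sketched in plain Python. This is only a model under stated assumptions: `run_dag` is a hypothetical callback standing in for "trigger the other DAG and wait for its final state", and `run_until_success` is an illustrative name, not an Airflow function.

```python
from typing import Callable, List


def run_until_success(run_dag: Callable[[], str], max_attempts: int) -> List[str]:
    """Call run_dag until it reports "success" or max_attempts is reached.

    Returns the final state of every attempt, in order.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    history: List[str] = []
    for _ in range(max_attempts):
        state = run_dag()
        history.append(state)
        if state == "success":
            break  # stop as soon as one full run succeeds
    return history


# Example: the triggered DAG fails twice and then succeeds.
results = iter(["failed", "failed", "success", "success"])
history = run_until_success(lambda: next(results), max_attempts=5)
```

The bounded loop is what avoids the infinite clearing problem: after `max_attempts` runs the last state is left as failed.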

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project’s Code of Conduct

Top GitHub Comments

3 reactions

victorfuzaro commented, Apr 9, 2022

Hi,

Up to now I have considered TaskGroup mostly a “visual feature”. If we start adding additional “subdag” features to it, we are going to gravitate toward the “subdag” concept, which is about to be dropped.

If I needed more complex logic in a group of tasks, I would use the approach of DAGs triggering other DAGs via TriggerDagRunOperator.

Regards.

I completely understand your opinion, but at the same time I still think that this feature would be very useful. In my case I would need it in a lot of pipelines, so without it I’ll need to create a lot of DAGs just for triggering them (of course I can group some of them, but that would still require some “DAG trigger” DAGs). I know that it is a very specific use case, but I’m definitely not alone with this problem.

3 reactions

sartyukhov commented, Feb 28, 2022

What if a TaskGroup consists of 20 tasks and one of them is an HTTP task that just retried on some transient error? Do we really want to retry the whole task group because of it?

I think every developer can answer this question for themselves. It depends on the architecture of each DAG…

The main question is how to decide when a TaskGroup has failed:

the last task in the group has failed
any task in the group has failed
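The two options can be compared with a small plain-Python sketch. The function name, the policy strings "last" and "any", and the state strings are hypothetical; nothing here is an existing TaskGroup parameter. It assumes the states are listed in execution order and that later tasks can still run after an earlier failure.

```python
from typing import Sequence


def group_failed(task_states: Sequence[str], policy: str) -> bool:
    """Decide whether a group counts as failed under the given policy."""
    if policy not in ("last", "any"):
        raise ValueError(f"unknown policy: {policy!r}")
    if not task_states:
        return False  # an empty group has nothing that could fail
    if policy == "last":
        # Only the final task in the group decides the outcome.
        return task_states[-1] == "failed"
    # policy == "any": a single failed task fails the whole group.
    return any(state == "failed" for state in task_states)


# Example: 20 tasks, where only one task in the middle ended up failed.
states = ["success"] * 20
states[5] = "failed"
failed_if_any = group_failed(states, "any")
failed_if_last = group_failed(states, "last")
```

With the 20-task example, the "any" policy retries the whole group because of one task, while the "last" policy does not, so the choice really does depend on the DAG.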
