Retry for TaskGroup
Discussed in https://github.com/apache/airflow/discussions/21333
Originally posted by sartyukhov, February 4, 2022
Description
Hello!
Previously, a SubDag was used to organize tasks into groups. Now you’ve introduced TaskGroups to the world. They’re nice and very clever, but they have one big disadvantage compared to the SubDag: a TaskGroup can’t be retried.
Use case/motivation
For example:
In a project I have two tasks (A >> B): A collects data (PythonOperator), and B updates a materialized view in Postgres (PostgresOperator).
‘A’ may collect only part of the data and mark itself as failed (as far as I know there is no “half-failed” status). But task ‘B’ should run regardless of A’s result (e.g. trigger_rule=“all_done”) so the materialized view is updated with the partial data. After roughly an hour I would like to repeat the whole process (A >> B).
With a SubDag I could do this:
- initiate the SubDag with retries=10
- add a dummy task ‘C’ with trigger_rule=“all_success”
- change the flow to A >> B >> C and A >> C
And that’s it: when A fails, C fails too, which marks the SubDag run as failed and triggers it to retry.
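The control flow of that pattern can be modeled in plain Python (this is an illustrative sketch, not Airflow code; all function names here are hypothetical):

```python
# Plain-Python model of the SubDag retry pattern described above.
# collect_data plays task A, update_matview plays task B.

def run_group(collect_data, update_matview, retries=10):
    """Run A then B, retrying the whole group like a SubDag with retries=10.

    B runs regardless of A's result (modeling trigger_rule="all_done");
    the gate task C (trigger_rule="all_success") is modeled by requiring
    that both A and B succeeded before the group counts as done.
    """
    for attempt in range(1, retries + 1):
        a_ok = collect_data()       # task A: may fail (e.g. partial data)
        b_ok = update_matview()     # task B: runs even when A failed
        if a_ok and b_ok:           # task C: succeeds only if A and B did
            return attempt          # group succeeded on this attempt
    raise RuntimeError(f"group still failing after {retries} retries")
```

Note that B is deliberately executed on every attempt, even when A fails, which is exactly the behavior `trigger_rule="all_done"` gives in the real DAG.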
But a TaskGroup has no retries parameter. I also can’t retry the whole DAG, because it’s big. And I don’t want to update the materialized view inside task ‘A’, because then I couldn’t do [A0, A1…An] >> B (i.e. update the materialized view just once for several collect tasks).
I hope it’s possible. Or maybe it could be done some other way. Thanks in advance.
Additional explanation on the use case (from #21333)
I have a specific use case where this feature would be useful. It is like:
There is a task that does one thing, and a second task (which depends on the first) that does another thing; if the second one fails, I need to re-run the entire DAG. I can’t do both processes in the same task due to some limitations (I work with different Java drivers in each one), and retrying the same task doesn’t solve the problem, because the result of this task determines whether or not the first one needs a re-execution. Clearing the previous task(s) also isn’t good, because it would cause an infinite loop until everything succeeds, which is not acceptable, at least for me; I only need some 3-5 retries before it keeps a failed state.
My workaround for this was creating a DAG that triggers this DAG: if the triggered DAG’s state is failed, it re-executes it up to the number of times I set. However, as you can see, this makes it necessary to create two DAGs to solve the problem.
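The two-DAG workaround amounts to an outer controller loop around the inner pipeline. A minimal plain-Python model (illustrative only; in real Airflow this would involve something like TriggerDagRunOperator, and `trigger_dag`/`max_triggers` are hypothetical names):

```python
# Plain-Python model of the "controller DAG" workaround: an outer loop
# that re-triggers an inner pipeline while it keeps failing.

def controller(trigger_dag, max_triggers=5):
    """Re-trigger the inner DAG until it succeeds or the budget runs out.

    trigger_dag() stands for one full run of the triggered DAG and
    returns its final state as a string.
    """
    for _ in range(max_triggers):
        state = trigger_dag()        # fire one run of the inner DAG
        if state == "success":
            return "success"
    return "failed"                  # keep the failed state after N tries
```

This makes the cost of the workaround visible: the retry logic lives in a second, otherwise empty DAG whose only job is to watch and re-fire the first one.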
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Issue Analytics
- Created 2 years ago
- Reactions:7
- Comments:10 (3 by maintainers)
I completely understand your opinion, but at the same time I still think this feature would be very useful. In my case I would need it in a lot of pipelines, so without it I’d have to create a lot of DAGs just for triggering them (of course I can group some of them, but it would still require some “DAG trigger” DAGs). I know it’s a rather specific use case/situation, but I’m definitely not alone with this problem.
I think every developer can answer this question for themselves; it depends on the architecture of each DAG…
The main question is: how do we decide when a TaskGroup has failed?
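One possible convention (an assumption for discussion, not anything Airflow defines): a TaskGroup counts as failed when any of its leaf tasks ends in a failed-like state. Sketched in plain Python:

```python
# Hypothetical rule for deciding TaskGroup failure:
# the group fails iff any leaf task ended in a failed-like state.
# (An assumption for discussion; Airflow defines no such rule today.)

FAILED_STATES = {"failed", "upstream_failed"}

def group_failed(leaf_states):
    """leaf_states: final states (strings) of the group's leaf tasks."""
    return any(state in FAILED_STATES for state in leaf_states)
```

Leaf tasks are used because intermediate failures already propagate to them via trigger rules, mirroring how a DagRun's state is derived from its leaves.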