DFE for downstream tasks
Current behavior
I’ve just had a quick look at DFE in master #2646 and it works really well!
That said, I’ve noticed something a bit odd: if a mapped task is upstream of a task that doesn’t take data from it, the downstream task is still run with DFE.
Is there / will there be a way to disable DFE or run a reduce task explicitly?
So in a situation like:
with Flow("dummy_flow") as flow:
    list_data = task_a()
    task_b_slug = task_b.map(data=list_data).slug
    task_run_me_once_after_task_b(upstream_tasks=[flow.get_tasks(slug=task_b_slug)])
Here task_run_me_once_after_task_b is run many times. This of course makes sense if it takes data from task_b, but if it just sets a status after task_b has finished running, or does clean-up, then you’d want to be able to disable DFE.
Proposed behavior
If a task downstream of a DFE mapped task takes no output from the mapped task, the downstream task should default to not being “DFE” mapped itself.
Example
In our use cases we have clean-up functions that run after mapped tasks. They don’t need the mapped output to work, and they must run only once.
I’ve rewritten the flow using the imperative API as in #2752, and with the DaskExecutor it all works as expected. Thanks for all the help here, can’t wait for DFE to land!
Hi @jacques- yeah, see my comment https://github.com/PrefectHQ/prefect/issues/2752#issuecomment-642843760 on your use of upstream_tasks. The LocalDaskExecutor is known to rerun tasks when using mapping.