DFE for downstream tasks

Current behavior

I’ve just had a quick look at DFE (depth-first execution) in master (#2646) and it works really well!

That said, I’ve noticed something a bit odd: if a mapped task is upstream of a task that doesn’t take data from it, DFE is still applied to the downstream task.

Is there / will there be a way to disable DFE or run a reduce task explicitly?

So in a situation like:

with Flow("dummy_flow") as flow:
    list_data = task_a()
    task_b_slug = task_b.map(data=list_data).slug
    task_run_me_once_after_task_b(upstream_tasks=[flow.get_tasks(slug=task_b_slug)])

task_run_me_once_after_task_b is run many times. This of course makes sense if it takes data from task_b, but if it just sets a status after task_b has finished running, or does clean-up, then you’d want to be able to disable DFE.

Proposed behavior

If a task downstream of a DFE-mapped task takes no output from the mapped task, the downstream task should default to not being DFE-mapped itself.

Example

In our use cases, we have clean-up functions that run after mapped tasks finish. They don’t need the mapped output to work, and they must run only once.
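
To make the desired behavior concrete, here is a toy model in plain Python. It is not Prefect's scheduler and uses none of its API; run_depth_first, cleanup and run_pipeline are hypothetical names chosen for illustration. Each item passes through every mapped stage before the next item starts (depth-first), and the clean-up step, which depends only on the mapped work having finished, runs exactly once.

```python
from typing import Callable, List

# Records each clean-up invocation so the call count can be inspected.
cleanup_calls: List[str] = []


def run_depth_first(items: List[int], stages: List[Callable[[int], int]]) -> List[int]:
    """Push each item through all stages before moving on to the next item."""
    results = []
    for item in items:
        value = item
        for stage in stages:
            value = stage(value)
        results.append(value)
    return results


def cleanup() -> None:
    # Hypothetical clean-up step: it takes no data from the mapped results.
    cleanup_calls.append("done")


def run_pipeline(items: List[int]) -> List[int]:
    stages = [lambda x: x + 1, lambda x: x * 2]
    results = run_depth_first(items, stages)
    # State-only dependency: runs once, after every mapped branch has finished.
    cleanup()
    return results


results = run_pipeline([1, 2, 3])
```

The point of the sketch is the shape of the dependency: the clean-up step waits on the whole mapped stage rather than being attached to each mapped branch.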

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

jacques- commented, Jun 15, 2020 (1 reaction)

I’ve rewritten the flow using the imperative API as in #2752, and with the DaskExecutor it all works as expected. Thanks for all the help here, can’t wait for DFE to land!

cicdw commented, Jun 11, 2020 (0 reactions)

Hi @jacques- yea see my comment https://github.com/PrefectHQ/prefect/issues/2752#issuecomment-642843760 on your use of upstream_tasks.

The LocalDaskExecutor is known to rerun tasks when using mapping.
