First class Sub-flow concept

Archived from the Prefect Public Slack Community

walter_gillett: Hi - we are building bioinformatics pipelines related to infectious disease. Prefect looks interesting. I am wondering about task grouping (a.k.a. nesting or sub-dags). Each step in our pipeline reads inputs from GCS and writes outputs to GCS. Without task grouping, this will get messy. For example, suppose we have steps 1, 2, and 3, each of which reads one GCS input and writes a GCS output. That yields 9 tasks (3 GCS download, 3 compute, and 3 upload), but we would like to group them into pipeline steps because that’s the essential unit of work. Is there a way to model this in Prefect?

chris: Hi @walter_gillett! Apologies if I’m misunderstanding the use case, but it sounds like you only need 3 Prefect Tasks? What is the benefit you hope to achieve by “grouping” tasks without them being realized as true Prefect Tasks?

walter_gillett: Hi @chris - likely I am misunderstanding how Prefect works. Yes, I want only 3 Prefect Tasks. But if I want to use Prefect machinery to conveniently download from GCS, that’s a task (prefect.tasks.google.storage.GCSDownload), same for upload, so I get 9 Tasks, yes? Conceptually there are 3 pipeline steps so I would like the workflow structure to reflect that. I am thinking of this as being like SubDAGs in Airflow (https://www.astronomer.io/guides/subdags/), where aggregating low-level details makes it possible to have a workflow with a higher level of granularity.
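
To make the grouping concrete, here is a minimal, library-free sketch of a pipeline step that bundles download, compute, and upload into one unit of work. It does not use Prefect or GCS: the in-memory `BUCKET` dict stands in for a bucket, and `download`, `upload`, and `make_step` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

# Hypothetical in-memory stand-in for a GCS bucket (object name -> contents).
BUCKET: Dict[str, str] = {"raw.txt": "acgt"}


def download(name: str) -> str:
    """Read one object from the fake bucket."""
    return BUCKET[name]


def upload(name: str, data: str) -> None:
    """Write one object to the fake bucket."""
    BUCKET[name] = data


def make_step(compute: Callable[[str], str], src: str, dst: str) -> Callable[[], str]:
    """Bundle download -> compute -> upload into one callable pipeline step."""
    def step() -> str:
        upload(dst, compute(download(src)))
        return dst
    return step


# Three pipeline steps, each hiding its own download and upload.
steps = [
    make_step(str.upper, "raw.txt", "step1.txt"),
    make_step(lambda s: s[::-1], "step1.txt", "step2.txt"),
    make_step(lambda s: s * 2, "step2.txt", "step3.txt"),
]

for run_step in steps:
    run_step()
```

If each step function like this were wrapped in a single task, the workflow would show three units instead of nine, presumably at the cost of not reusing the prebuilt GCS tasks.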

walter_gillett: I see related discussion here: https://docs.prefect.io/core/PINs/PIN-05-Combining-Tasks.html and https://github.com/PrefectHQ/prefect/issues/980 . But not sure what the recommendation coming out of that is.

chris: Yea, I think I understand better what you’re referring to now - thanks for that link; correct me if I’m wrong here, but the Airflow notion of SubDAG is an API convenience in the UI for seeing task groupings, which makes sense. I don’t think I see any functional difference in the way the DAG behaves between the fully expanded representation and the SubDAG representation.

In Prefect, you can certainly create multiple flows and then link them together using some combination of flow.update / flow.set_dependencies / flow.root_tasks() / flow.terminal_tasks(), but ultimately we haven’t yet exposed an analogous first-class “sub Flow” concept.
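
The linking idea can be modeled without Prefect: merge two graphs, then make every root of the second wait on every terminal of the first. The sketch below is a plain-Python model under that assumption, not Prefect's API; the `Dag` type and the `chain` helper are hypothetical, and `root_tasks` / `terminal_tasks` only borrow the names of the Prefect methods mentioned above.

```python
from typing import Dict, Set

# A DAG is modeled as: task name -> set of upstream task names.
Dag = Dict[str, Set[str]]


def root_tasks(dag: Dag) -> Set[str]:
    """Tasks with no upstream dependencies."""
    return {task for task, upstream in dag.items() if not upstream}


def terminal_tasks(dag: Dag) -> Set[str]:
    """Tasks that no other task depends on."""
    has_downstream = set().union(*dag.values())
    return set(dag) - has_downstream


def chain(first: Dag, second: Dag) -> Dag:
    """Merge two DAGs so every root of `second` waits on every terminal of `first`."""
    overlap = set(first) & set(second)
    if overlap:
        raise ValueError(f"duplicate task names: {sorted(overlap)}")
    # Copy the upstream sets so the input DAGs are left unchanged.
    merged: Dag = {task: set(up) for task, up in {**first, **second}.items()}
    for root in root_tasks(second):
        merged[root] |= terminal_tasks(first)
    return merged


step1: Dag = {"download_1": set(), "compute_1": {"download_1"}, "upload_1": {"compute_1"}}
step2: Dag = {"download_2": set(), "compute_2": {"download_2"}, "upload_2": {"compute_2"}}

pipeline = chain(step1, step2)
```

Here `pipeline` contains all six tasks, with `download_2` now depending on `upload_1`. The merged graph is still flat, which matches the point that this links flows together without giving a first-class nested unit.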

walter_gillett: Thanks @chris, good to know; rolling up flows could be the answer for now. Adding a first-class subflow concept to Prefect would be helpful, but nesting adds complexity, so it would have to be done carefully - more is not always better. As a side note re Airflow SubDAGs, from the article I linked to: “Astronomer highly recommends staying away from SubDags. Airflow 1.10 has changed the default SubDag execution method to use the Sequential Executor to work around deadlocks caused by SubDags”.

chris: very interesting; yea I agree this seems like a really convenient abstraction - we’ll definitely look into it! I’ll actually use our bot to archive this thread as a GitHub issue that we can use to track it

chris: <@ULVA73B9P> archive “First class Sub-flow concept”

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 9
  • Comments: 13 (4 by maintainers)

Top GitHub Comments

Abrosimov-a-a commented, Sep 30, 2020 (7 reactions)

What about a decorator concept like this?

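import functools
import prefect
from prefect import task, Flow, Parameter
from time import sleep

## DECORATOR
def flow(*flow_args, **flow_kwargs):
    """Flow and SubFlow decorator."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            if args or kwargs:
                # Called with arguments inside another flow: act as a sub-flow
                # and add the tasks to the enclosing flow context.
                return func(*args, **kwargs)
            else:
                # Called without arguments: build a standalone Flow using the
                # Parameter defaults from the function signature.
                with Flow(*flow_args, **flow_kwargs) as flow_instance:
                    func()
                return flow_instance
        return wrapper
    return decorator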


## EXAMPLE
@task
def get_sleep_list(start, stop, step):
    return list(range(start, stop, step))


@task
def sleeping(x):
    sleep(x)
    return x


@task
def log(lst):
    logger = prefect.context.get('logger')
    result = '\n'.join(['  Task sleeping(x={})'.format(i)
                        for i in lst])
    logger.info('Task results!\n%s', result)


@flow('List generator')
def flow_gen(start=Parameter('start', default=3),
             stop=Parameter('stop', default=12),
             step=Parameter('step', default=3)):
    sleep_list = get_sleep_list(start, stop, step)
    return sleep_list


@flow('Sleeping')
def flow_sleep(lst=Parameter('lst', default=[2, 4, 6])):
    result = sleeping.map(lst)
    return result


def flow_main():
    with Flow('List sleeping') as flow_inst:
        start = Parameter('start', default=2)
        stop = Parameter('stop', default=8)
        step = Parameter('step', default=2)

        sleep_list = flow_gen(start, stop, step)
        result = flow_sleep(sleep_list)

        log(result)
    return flow_inst


if __name__ == '__main__':
    flow_gen().register(project_name='tests')
    flow_sleep().register(project_name='tests')
    flow_main().register(project_name='tests')

peterth3 commented, Jun 2, 2020 (4 reactions)

Thanks for the response @lauralorenz and for sharing the miro board! Task design vs flow design pros/cons on page 3 is really interesting. I’d like to learn more about why some of these pros/cons are there. To address your question, the design I have in mind is a kind of mix where the flow object is callable like a task. Something like:

from prefect import Flow, Parameter

from .flows.common import a_flow, b_flow
from .tasks.specialized import special_task

with Flow('ab-flow') as ab_flow:
    a, b = Parameter('a'), Parameter('b')
    a_flow_task = a_flow(a)
    b_flow_task = b_flow(b)
    special_task(upstream_tasks=[a_flow_task, b_flow_task])

Where ab_flow(a, b) can also be called in another flow. I think this design would make it easier to modularize flows and keep tasks tiny. I feel like it would also make it more explicit what code is run, and when, compared with calling a task from another task. I’m not sure what the pros/cons are with this approach or whether it’s technically feasible. What do you think?
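
As a rough illustration of the "flow callable like a task" idea, here is a small library-free sketch. `MiniFlow` is a hypothetical stand-in, not a Prefect class, and it runs eagerly rather than building a deferred graph; it only shows how a callable flow could record the sub-flows invoked inside it.

```python
from typing import Any, Callable, List, Optional


class MiniFlow:
    """Hypothetical flow object that can be called from inside another flow."""

    _active: Optional["MiniFlow"] = None  # flow currently executing, if any

    def __init__(self, name: str, fn: Callable[..., Any]) -> None:
        self.name = name
        self.fn = fn
        self.subflows: List[str] = []  # names of flows called from this one

    def __call__(self, *args: Any) -> Any:
        parent = MiniFlow._active
        if parent is not None:
            # Record the nesting on the caller so the structure stays visible.
            parent.subflows.append(self.name)
        MiniFlow._active = self
        try:
            return self.fn(*args)
        finally:
            MiniFlow._active = parent  # restore the caller when done


def add_one(a: int) -> int:
    return a + 1


def double(b: int) -> int:
    return b * 2


a_flow = MiniFlow("a-flow", add_one)
b_flow = MiniFlow("b-flow", double)
ab_flow = MiniFlow("ab-flow", lambda a, b: a_flow(a) + b_flow(b))
outer_flow = MiniFlow("outer-flow", lambda: ab_flow(1, 2))

result = outer_flow()
```

After this runs, `ab_flow.subflows` lists the two inner flows and `outer_flow.subflows` lists `ab-flow`, so each level only sees its direct children.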
