
LocalDaskExecutor(scheduler='threads') not running concurrently

See original GitHub issue

Description

I would expect LocalDaskExecutor(scheduler='threads') to run independent tasks concurrently. However, that does not seem to be the case in my example below.

I'm not sure why this happens; perhaps each task only starts when compute is called inside wait? https://github.com/PrefectHQ/prefect/blob/master/src/prefect/engine/executors/dask.py#L287
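
For context, here is a minimal Dask-only sketch of the suspected behavior (my own illustration, not code from the issue or from Prefect): calling compute() on each delayed result in turn serializes the work, while a single dask.compute call over both results runs them concurrently on the threaded scheduler.

import time
import dask

@dask.delayed
def sleep_task(seconds):
    time.sleep(seconds)
    return seconds

a = sleep_task(10)
b = sleep_task(12)

# Sequential: each compute() blocks until its own task finishes (~22 s total).
a.compute(scheduler="threads")
b.compute(scheduler="threads")

# Batched: one compute() submits both tasks at once (~12 s total).
dask.compute(a, b, scheduler="threads")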

Reproduction

import prefect
from prefect import task, Flow
from prefect.engine.executors import LocalDaskExecutor

@task
def task_sleep(seconds):
    import time
    print("start sleeping {}".format(seconds))
    time.sleep(seconds)
    print("end sleeping {}".format(seconds))
    return seconds

with Flow("dummy_sleep") as flow:
    task_sleep(seconds=10)
    task_sleep(seconds=12)

state = flow.run(executor=LocalDaskExecutor(scheduler='threads'))

which produces

[2020-03-07 05:34:35,250] INFO - prefect.FlowRunner | Beginning Flow run for 'dummy_sleep'
[2020-03-07 05:34:35,253] INFO - prefect.FlowRunner | Starting flow run.
[2020-03-07 05:34:35,301] INFO - prefect.TaskRunner | Task 'task_sleep': Starting task run...
start sleeping 10
end sleeping 10
[2020-03-07 05:34:45,316] INFO - prefect.TaskRunner | Task 'task_sleep': finished task run for task with final state: 'Success'
[2020-03-07 05:34:45,325] INFO - prefect.TaskRunner | Task 'task_sleep': Starting task run...
start sleeping 12
end sleeping 12
[2020-03-07 05:34:57,342] INFO - prefect.TaskRunner | Task 'task_sleep': finished task run for task with final state: 'Success'
[2020-03-07 05:34:57,343] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded

Environment

prefect diagnostics
{
  "config_overrides": {},
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.3.0-1011-gcp-x86_64-with-Ubuntu-19.10-eoan",
    "prefect_version": "0.9.7",
    "python_version": "3.7.5"
  }
}

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

Marlin-Na commented on Apr 16, 2020 (1 reaction)

@jcrist Thanks! This looks great.

jlowin commented on Mar 7, 2020 (1 reaction)

This is a great callout, thank you. I believe what’s happening is that we implicitly wait on each terminal task in sequence, evaluating all of its upstream tasks concurrently. However, a task only gets evaluated when a terminal task downstream of it is computed. Therefore, in your flow, the two terminal tasks are evaluated in sequence.

Below, I can get concurrency as expected by adding a single (dummy) terminal task that comes after both tasks.

We should think of ways to ensure the entire graph is computed at once (possibly automatically adding dummy nodes like the one I’m adding here at runtime) - cc @cicdw


import prefect
from prefect import task, Flow
from prefect.engine.executors import LocalDaskExecutor

@task
def task_sleep(seconds):
    import time
    print("start sleeping {}".format(seconds))
    time.sleep(seconds)
    print("end sleeping {}".format(seconds))
    return seconds

with Flow("dummy_sleep") as flow:
    t1 = task_sleep(seconds=10)
    t2 = task_sleep(seconds=12)

    # Dummy terminal task downstream of both sleeps, so the whole graph is
    # computed in a single call and the two sleeps run concurrently.
    t3 = prefect.Task()
    t3.set_upstream(t1)
    t3.set_upstream(t2)

state = flow.run(executor=LocalDaskExecutor(scheduler='threads'))

Logs:


[2020-03-07 19:46:15,142] INFO - prefect.FlowRunner | Beginning Flow run for 'dummy_sleep'
[2020-03-07 19:46:15,144] INFO - prefect.FlowRunner | Starting flow run.
[2020-03-07 19:46:15,235] INFO - prefect.TaskRunner | Task 'task_sleep': Starting task run...
start sleeping 10
[2020-03-07 19:46:15,240] INFO - prefect.TaskRunner | Task 'task_sleep': Starting task run...
start sleeping 12
end sleeping 10
[2020-03-07 19:46:25,254] INFO - prefect.TaskRunner | Task 'task_sleep': finished task run for task with final state: 'Success'
end sleeping 12
[2020-03-07 19:46:27,248] INFO - prefect.TaskRunner | Task 'task_sleep': finished task run for task with final state: 'Success'
[2020-03-07 19:46:27,254] INFO - prefect.TaskRunner | Task 'Task': Starting task run...
[2020-03-07 19:46:27,257] INFO - prefect.TaskRunner | Task 'Task': finished task run for task with final state: 'Success'
[2020-03-07 19:46:27,258] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
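
Following up on the suggestion above about automatically adding dummy nodes, here is a rough sketch (my own illustration, not code from the issue or from Prefect) of how a flow could be patched with a single joining terminal task before running it, assuming the Prefect 0.9.x APIs flow.terminal_tasks(), flow.add_task(), and flow.add_edge():

import prefect

def add_join_task(flow):
    # Snapshot the current terminal tasks before mutating the graph.
    terminals = flow.terminal_tasks()
    join = prefect.Task(name="join")
    flow.add_task(join)
    for t in terminals:
        # Make the join task downstream of every existing terminal task,
        # so a single terminal node covers the whole graph.
        flow.add_edge(upstream_task=t, downstream_task=join)
    return flow

Usage would be add_join_task(flow) just before flow.run(...); the helper name add_join_task and the task name "join" are mine, not part of Prefect.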