LocalDaskExecutor(scheduler='threads') not running concurrently
Description
I expected LocalDaskExecutor(scheduler='threads') to run tasks concurrently. However, according to my example below, it does not.
I'm not sure why this is the case. Perhaps each task only starts when compute is called in wait? https://github.com/PrefectHQ/prefect/blob/master/src/prefect/engine/executors/dask.py#L287
Reproduction
import time

import prefect
from prefect import task, Flow
from prefect.engine.executors import LocalDaskExecutor


@task
def task_sleep(seconds):
    # Print before and after sleeping so any overlap shows up in the output.
    print("start sleeping {}".format(seconds))
    time.sleep(seconds)
    print("end sleeping {}".format(seconds))
    return seconds


# Two independent tasks: neither depends on the other.
with Flow("dummy_sleep") as flow:
    task_sleep(seconds=10)
    task_sleep(seconds=12)

state = flow.run(executor=LocalDaskExecutor(scheduler='threads'))
which produces
[2020-03-07 05:34:35,250] INFO - prefect.FlowRunner | Beginning Flow run for 'dummy_sleep'
[2020-03-07 05:34:35,253] INFO - prefect.FlowRunner | Starting flow run.
[2020-03-07 05:34:35,301] INFO - prefect.TaskRunner | Task 'task_sleep': Starting task run...
start sleeping 10
end sleeping 10
[2020-03-07 05:34:45,316] INFO - prefect.TaskRunner | Task 'task_sleep': finished task run for task with final state: 'Success'
[2020-03-07 05:34:45,325] INFO - prefect.TaskRunner | Task 'task_sleep': Starting task run...
start sleeping 12
end sleeping 12
[2020-03-07 05:34:57,342] INFO - prefect.TaskRunner | Task 'task_sleep': finished task run for task with final state: 'Success'
[2020-03-07 05:34:57,343] INFO - prefect.FlowRunner | Flow run SUCCESS: all reference tasks succeeded
Environment
prefect diagnostics
{
  "config_overrides": {},
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.3.0-1011-gcp-x86_64-with-Ubuntu-19.10-eoan",
    "prefect_version": "0.9.7",
    "python_version": "3.7.5"
  }
}
@jcrist Thanks! This looks great.
This is a great callout, thank you. I believe what’s happening is that we implicitly wait on each terminal task in sequence, evaluating all of its upstream tasks concurrently. However, a task only gets evaluated when a terminal task downstream of it is computed. Therefore, in your flow, the two terminal tasks are evaluated in sequence. Below, I can get concurrency as expected by adding a single (dummy) terminal task that comes after both tasks.
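To illustrate the behaviour described above without depending on Prefect or Dask, here is a rough standard-library model. It is only a sketch under stated assumptions: `compute`, `run_sequential`, `run_with_sink`, the task names, and the shortened sleep times are all hypothetical stand-ins, not Prefect's actual executor code. Each call to `compute` evaluates a node's upstream tasks concurrently in a thread pool, which is the behaviour the comment attributes to waiting on a terminal task.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_sleeper(seconds):
    """Build a task that ignores upstream results, sleeps, and returns `seconds`."""
    def sleeper(*_upstream_results):
        time.sleep(seconds)
        return seconds
    return sleeper


# Hypothetical graph: task name -> (callable, list of upstream task names).
# The sleeps are scaled down from the 10 s and 12 s in the report.
GRAPH = {
    "sleep_a": (make_sleeper(0.2), []),
    "sleep_b": (make_sleeper(0.3), []),
}


def compute(graph, key, pool):
    """Run the upstream tasks of `key` concurrently, then run `key` itself."""
    func, upstream = graph[key]
    futures = [pool.submit(compute, graph, dep, pool) for dep in upstream]
    return func(*[f.result() for f in futures])


def run_sequential(graph, terminals):
    """Wait on each terminal task in turn, as the comment describes."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = [compute(graph, key, pool) for key in terminals]
    return results, time.perf_counter() - start


def run_with_sink(graph, terminals):
    """Add one dummy task downstream of every terminal task and wait on that."""
    sink_graph = dict(graph)
    sink_graph["sink"] = (lambda *results: list(results), list(terminals))
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = compute(sink_graph, "sink", pool)
    return results, time.perf_counter() - start


seq_results, seq_elapsed = run_sequential(GRAPH, ["sleep_a", "sleep_b"])
sink_results, sink_elapsed = run_with_sink(GRAPH, ["sleep_a", "sleep_b"])
print("sequential terminals: {:.2f}s".format(seq_elapsed))
print("single dummy sink:    {:.2f}s".format(sink_elapsed))
```

In this model the sequential run takes roughly the sum of the two sleeps, while the run with a single dummy sink takes roughly the longer of the two, because both sleeps become upstream tasks of one terminal node.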
We should think of ways to ensure the entire graph is computed at once (possibly automatically adding dummy nodes like the one I’m adding here at runtime) - cc @cicdw
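The idea of automatically adding a dummy node could look something like the following. This is a minimal sketch on a plain dictionary graph, not on Prefect's Flow object; `terminal_tasks`, `with_sink`, the `__sink__` name, and the example task names are hypothetical and chosen only for illustration.

```python
def terminal_tasks(graph):
    """Return the tasks that no other task lists as an upstream dependency."""
    has_downstream = {dep for deps in graph.values() for dep in deps}
    return sorted(set(graph) - has_downstream)


def with_sink(graph, sink_name="__sink__"):
    """Return a copy of `graph` with one dummy node downstream of every terminal task."""
    if sink_name in graph:
        raise ValueError("sink name already used: {}".format(sink_name))
    new_graph = {name: list(deps) for name, deps in graph.items()}
    new_graph[sink_name] = terminal_tasks(graph)
    return new_graph


# Hypothetical graph: task name -> list of upstream task names.
example = {
    "setup": [],
    "sleep_10": ["setup"],
    "sleep_12": ["setup"],
}

augmented = with_sink(example)
print(terminal_tasks(example))
print(terminal_tasks(augmented))
```

After the transformation the graph has exactly one terminal node, so a scheduler that waits on terminal tasks one at a time would compute the whole graph in a single call.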
Logs: