Different behaviour in distributed vs single-machine scheduler: missing task
I’m adding extra tasks during the optimization step to save some task results to disk. The extra tasks run on the dask single-machine scheduler, but they do not run on the distributed scheduler. This might not be a bug but the intended behavior. In that case, how should I have done it?
Thanks!
Minimal Complete Verifiable Example:
Example 1:
import dask

def optimize(dsk, keys):
    print("Optimizing graph")
    dsk = dask.utils.ensure_dict(dsk)
    # Adding an extra task to the graph
    dsk["extra_task"] = (print, "Running extra task")
    return dsk

dask.config.set(delayed_optimize=optimize)
dask.delayed(print)("Running task").compute()
Running example 1 prints:
Optimizing graph
Running extra task
Running task
Example 2:
Start dask-scheduler and a dask-worker:
dask-scheduler & dask-worker localhost:8786 &
Add to example 1 the following:
import dask
import distributed
client = distributed.Client("localhost:8786")
def optimize(dsk, keys):
    ...
Example 2 prints “Optimizing graph” in the python process running the file, and “Running task” in the dask-worker process, but it doesn’t print “Running extra task”.
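A likely explanation, consistent with the maintainer reply further down, is culling: the graph is reduced to the tasks that the requested keys depend on, and "extra_task" is not a dependency of anything requested. The sketch below illustrates the idea in plain Python on a dask-style dict graph. It is not dask's implementation; the `cull` helper and the "data" task are hypothetical, and keys are assumed to be strings.

```python
def cull(dsk, keys):
    """Keep only the tasks that the requested keys depend on (illustrative)."""
    kept = {}
    stack = list(keys)
    while stack:
        key = stack.pop()
        if key in kept:
            continue
        task = dsk[key]
        kept[key] = task
        # Arguments that are themselves graph keys count as dependencies.
        if isinstance(task, tuple):
            stack.extend(
                arg for arg in task[1:] if isinstance(arg, str) and arg in dsk
            )
    return kept


dsk = {
    "data": (len, "abc"),                         # hypothetical upstream task
    "task": (print, "data"),                      # requested task, depends on "data"
    "extra_task": (print, "Running extra task"),  # nothing depends on this
}

culled = cull(dsk, ["task"])
print(sorted(culled))  # prints ['data', 'task']
```

A scheduler that runs every task in the graph it receives would still execute "extra_task", while one that culls first would drop it, which matches the difference observed between the two examples.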
Environment:
- Dask and distributed versions: 2021.04.0
- Python version: 3.8.8
- Operating System: Ubuntu 20.04
- Install method (conda, pip, source): conda
New conda environment from environment-3.8.yaml. Cloned dask and distributed at tag 2021.04.0 and installed both with “pip install -e”.
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
Yes, if possible, optimizations such as culling and blockwise fusion will be done by the scheduler. Currently, this might not happen if low-level fusion is enabled, but we are working on moving most, if not all, optimizations to the scheduler.
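If culling is what drops the extra task, one way to make a save step survive is to put it on the dependency chain of the requested key rather than adding it as a detached task. The following is a minimal sketch of that idea on a plain dict graph, not dask's API: `inject_save`, `save_and_return`, the "-unsaved" key suffix, and the tiny `get` evaluator are all hypothetical, and an in-memory dict stands in for the disk.

```python
from functools import partial


def save_and_return(store, name, value):
    """Hypothetical stand-in for writing a result to disk."""
    store[name] = value
    return value


def inject_save(dsk, key, store):
    """Return a new graph in which computing `key` also saves its result."""
    new = dict(dsk)
    inner = key + "-unsaved"  # hypothetical naming scheme for the original task
    new[inner] = dsk[key]
    # Bind `key` via partial so it is not mistaken for a graph key argument.
    new[key] = (partial(save_and_return, store, key), inner)
    return new


def get(dsk, key):
    """Tiny recursive evaluator, only here to exercise the sketch."""
    func, *args = dsk[key]
    resolved = [
        get(dsk, a) if isinstance(a, str) and a in dsk else a for a in args
    ]
    return func(*resolved)


store = {}
dsk = {"x": (sum, [1, 2, 3])}
dsk_with_save = inject_save(dsk, "x", store)

result = get(dsk_with_save, "x")
print(result, store)  # prints 6 {'x': 6}
```

Because the requested key now depends on the original computation through the save wrapper, culling to the requested keys keeps both tasks.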
I don’t think plugging into the optimizers is the best choice here. Maybe introduce a function that parses
x.__dask_graph__()
and injects load/save instructions instead of relying on the optimize infrastructure?
Do you mean something like:
I wanted it to be hidden from the user, as it could be misused as follows: