
Scheduler behaves badly when adaptively adding workers to meet resource demand

See original GitHub issue

There is an issue with how the scheduler assigns tasks from the unrunnable queue to workers that meet the resource requirements as they join the cluster.

The use case is a long-running, complex computation where some tasks require an expensive resource, say GPUs, but those resources are only provisioned (through Adaptive) once the tasks requiring them are ready to run. Say we reach a point in the computation where 5 tasks could run if GPUs were available. Managing the scale_up behaviour through Adaptive is fairly straightforward and allows adding new compute nodes (for instance on AWS) with the required resources. The problem appears when the first of the new workers connects.

Scheduler.add_worker will go through the list of unrunnable tasks and check whether any workers meet their requirements. Since only the first worker has connected so far, there is only one worker that meets the requirements (some number of additional workers may be booting up and will join shortly, but that hasn’t happened yet).

        for ts in list(self.unrunnable):
            valid = self.valid_workers(ts)
            if valid is True or ws in valid:
                recommendations[ts.key] = 'waiting'

The task goes through released -> waiting and then waiting -> processing; transition_waiting_processing again calls valid_workers to get the set of workers where the task(s) can run (this set still contains just the single worker, because the other ones haven’t yet connected).
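
For orientation, that path can be paraphrased as the sketch below. It is not the verbatim distributed.Scheduler source; the least-loaded tie-break at the end stands in for the scheduler's real, more elaborate worker objective.

def on_worker_added(scheduler, new_ws):
    # Mirrors the snippet above: every unrunnable task whose restrictions the
    # newly connected worker satisfies is recommended back to 'waiting'.
    recommendations = {}
    for ts in list(scheduler.unrunnable):
        valid = scheduler.valid_workers(ts)   # set of satisfying workers, or True
        if valid is True or new_ws in valid:
            recommendations[ts.key] = 'waiting'
    return recommendations

def choose_worker(scheduler, ts):
    # waiting -> processing recomputes valid_workers, but only over workers
    # that have already connected; workers still booting are invisible here.
    valid = scheduler.valid_workers(ts)
    if valid is True:                         # task has no restrictions at all
        valid = set(scheduler.workers.values())
    # Stand-in for the scheduler's real worker objective: pick the least loaded.
    return min(valid, key=lambda ws: len(ws.processing))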

The end result of all of this is that the worker that happens to connect first and has the resource required by the tasks gets all of them dumped onto it, while all the other workers, which potentially connect just seconds later, get nothing and are shut down by the scheduler because they are idle.

In short, it appears that the purpose of the resource restrictions is to act as a hint about the required peak capacity (memory, GPU, whatever) of a worker, and not as a dynamically changing resource allocation. Is this the case, and is there any interest in changing that? The resources available and resources consumed are taken into account in transition_waiting_processing, but only on the WorkerState, not for the scheduler in general.
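
To make that distinction concrete, the per-worker bookkeeping amounts to roughly the following (a simplified sketch using WorkerState/TaskState-style attribute names such as resources, used_resources and resource_restrictions; not the actual scheduler source):

def has_free_resources(ts, ws):
    # True if worker `ws` currently has enough unconsumed resources for task `ts`.
    return all(
        ws.resources.get(name, 0) - ws.used_resources.get(name, 0) >= amount
        for name, amount in (ts.resource_restrictions or {}).items()
    )

def consume_resources(ts, ws):
    # Called when `ts` starts processing on `ws`; the accounting lives entirely
    # on the individual WorkerState, not on any cluster-wide view.
    for name, amount in (ts.resource_restrictions or {}).items():
        ws.used_resources[name] += amount

def release_resources(ts, ws):
    # Called when `ts` finishes or is released from `ws`.
    for name, amount in (ts.resource_restrictions or {}).items():
        ws.used_resources[name] -= amount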

If this is not the intended behaviour and should be fixed, I’m more than happy to work on this.

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 29 (24 by maintainers)

Top GitHub Comments

1 reaction
calebho commented, May 29, 2019

I’m seeing similar behavior with my workload: all tasks go to a single worker because the other workers haven’t come online yet. This thread is more than a year old, so I’m not sure what has changed since it was opened. Anyway, here are my findings w.r.t. work stealing not working properly with resources.

From what I can tell, stealing logic is implemented in stealing.py with a SchedulerPlugin. Every time a worker is added, state is updated to keep track of what tasks can be stolen from it. In particular, this is the stealable instance attribute. Every time a task transitions, this plugin checks to see whether the task is transitioning to "processing" and calls put_key_in_stealable passing in the task state.
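
Paraphrasing, the plugin's transition hook boils down to this (only the branch relevant here is shown; this is not the verbatim stealing.py source):

def transition(self, key, start, finish, *args, **kwargs):
    # Whenever a task moves into 'processing', register it as potentially
    # stealable from the worker it was assigned to.
    if finish == 'processing':
        ts = self.scheduler.tasks[key]
        self.put_key_in_stealable(ts)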

Inside of put_key_in_stealable, the cost of moving the task is computed using steal_time_ratio. The first check in steal_time_ratio is checking whether the task has hard restrictions and whether any of the restrictions are set:

if not ts.loose_restrictions and (
    ts.host_restrictions or ts.worker_restrictions or ts.resource_restrictions
):
    return None, None  # don't steal

Back in put_key_in_stealable, nothing happens if the returned cost is None:

def put_key_in_stealable(self, ts):
    ws = ts.processing_on
    worker = ws.address
    cost_multiplier, level = self.steal_time_ratio(ts)
    self.log.append(("add-stealable", ts.key, worker, level))
    if cost_multiplier is not None:
        self.stealable_all[level].add(ts)
        self.stealable[worker][level].add(ts)
        self.key_stealable[ts] = (worker, level)

This would seem to explain why no work stealing occurs when tasks are marked with resources. Any comments on the analysis here?

Moving on to a possible fix, I think it makes sense to remove the restrictions check mentioned above and add some checks before the call to maybe_move_task in balance to ensure that the thief has the resources required to steal the task. Thoughts?
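
As a sketch of what that could look like (the helper name and its placement before maybe_move_task are my own suggestion, not existing distributed code), the thief-side check only needs the task's resource_restrictions and the candidate worker's declared vs. consumed resources:

def thief_has_resources(ts, thief):
    # Hypothetical helper: would be called in WorkStealing.balance before
    # maybe_move_task to make sure the candidate thief can actually satisfy
    # the task's resource restrictions.
    required = ts.resource_restrictions or {}
    return all(
        thief.resources.get(name, 0) - thief.used_resources.get(name, 0) >= amount
        for name, amount in required.items()
    )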

1 reaction
lesteve commented, Feb 11, 2019

Sorry, I missed the .scale in @leej3’s post, so I guess this is the same problem as with .adapt. Here is a slightly simpler snippet to reproduce the problem:

import time
import os
import threading
import pprint
import webbrowser

from dask.distributed import Client, LocalCluster


def do_work(task_time=0.5):
    time.sleep(task_time)
    return (os.getpid(), threading.current_thread().ident)


cluster = LocalCluster(n_workers=1, threads_per_worker=3, resources={'foo': 1})
client = Client(cluster)
dashboard_port = client.scheduler_info()['services']['bokeh']
print('dashboard:', dashboard_port)
# uncomment next two lines if you want to open the dashboard. sleep 5s to give
# time to the tab to load
# webbrowser.open(f'http://localhost:{dashboard_port}/status')
# time.sleep(5)
t0 = time.time()
futures = [client.submit(do_work, pure=False, resources={'foo': 1})
           for i in range(20)]
cluster.scale(2)
output = client.gather(futures)
pprint.pprint(output)
print(time.time() - t0)

Output:

[(11660, 139795962357504),
 (11660, 139795962357504),
 (11660, 139795481229056),
 (11660, 139795962357504),
 (11660, 139795481229056),
 (11660, 139795472836352),
 (11660, 139795481229056),
 (11660, 139795962357504),
 (11660, 139795481229056),
 (11660, 139795472836352),
 (11660, 139795481229056),
 (11660, 139795962357504),
 (11660, 139795481229056),
 (11660, 139795472836352),
 (11660, 139795481229056),
 (11660, 139795962357504),
 (11660, 139795481229056),
 (11660, 139795472836352),
 (11660, 139795481229056),
 (11660, 139795962357504)]
10.067777633666992

Summary of the issue:

  • submit some tasks using resources
  • scale up your cluster
  • workers that are added after the task submissions are never used to run any tasks (the bug only occurs when using resources; a workaround sketch follows below)
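
Until this is fixed in the scheduler, one way to sidestep it in user code (assuming a distributed release that provides Client.wait_for_workers) is to make sure the additional workers have connected before the resource-restricted tasks are submitted, e.g. reusing the snippet above:

# Workaround sketch, reusing `cluster`, `client` and `do_work` from the
# snippet above: scale first and wait for the workers to connect, then submit.
cluster.scale(2)
client.wait_for_workers(2)
futures = [client.submit(do_work, pure=False, resources={'foo': 1})
           for i in range(20)]
output = client.gather(futures)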