Adaptive.needs_cpu does not depend on number of tasks remaining
Issue description
We’re using distributed (with KubeCluster) with client.map to schedule a lot of long-running tasks (right now we’re running a Fortran-based hydrological model).
We noticed that clusters don’t scale down when the number of remaining tasks falls below the number of workers: excess workers are only released once all tasks have completed.
I isolated the problem to Adaptive.needs_cpu(). The current method does not check whether there are any pending tasks on the scheduler:
def needs_cpu(self):
    """
    Check if the cluster is CPU constrained (too many tasks per core)

    Notes
    -----
    Returns ``True`` if the occupancy per core is some factor larger
    than ``startup_cost``.
    """
    total_occupancy = self.scheduler.total_occupancy
    total_cores = sum([ws.ncores for ws in self.scheduler.workers.values()])

    if total_occupancy / (total_cores + 1e-9) > self.startup_cost * 2:
        logger.info("CPU limit exceeded [%d occupancy / %d cores]",
                    total_occupancy, total_cores)
        return True
    else:
        return False
This results in Adaptive.recommendations() returning the error message “Trying to scale up and down simultaneously” whenever there are fewer pending tasks than there are workers, as long as the average task time suggests that more cores are needed (independent of the number of pending tasks).
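For context, the conflict arises roughly as in the sketch below, which paraphrases the scale-up/scale-down decision in recommendations() (not the verbatim source; method names are those of the Adaptive class in that era of distributed):

def sketch_recommendations(adaptive):
    """Paraphrased sketch of the conflicting branches in
    Adaptive.recommendations() (not the verbatim source)."""
    should_scale_up = adaptive.should_scale_up()  # consults needs_cpu()
    to_close = adaptive.workers_to_close()        # idle workers
    if should_scale_up and to_close:
        # With only a few pending tasks but a long average task time,
        # needs_cpu() still says "scale up" while the now-idle workers
        # are also candidates to close, so both branches fire at once.
        return {'status': 'error',
                'msg': 'Trying to scale up and down simultaneously'}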
Proposed solution
I implemented a quick fix: find the total number of tasks currently processing and only recommend a scale-up if that number exceeds the number of existing workers, in addition to the current occupancy criterion:
def needs_cpu(self):
    """
    Check if the cluster is CPU constrained (too many tasks per core)

    Notes
    -----
    Returns ``True`` if the occupancy per core is some factor larger
    than ``startup_cost``.
    """
    total_occupancy = self.scheduler.total_occupancy
    total_cores = sum([ws.ncores for ws in self.scheduler.workers.values()])

    if total_occupancy / (total_cores + 1e-9) > self.startup_cost * 2:
        logger.info("CPU limit exceeded [%d occupancy / %d cores]",
                    total_occupancy, total_cores)

        # New: only report CPU pressure if the tasks currently processing
        # outnumber the existing workers.
        tasks_processing = sum(len(w.processing) for w in self.scheduler.workers.values())
        num_workers = len(self.scheduler.workers)
        if tasks_processing > num_workers:
            logger.info("pending tasks exceed number of workers [%d tasks / %d workers]",
                        tasks_processing, num_workers)
            return True

    return False
Pros
- Exhibits the desired behavior (we’re using this fix now by subclassing KubeCluster; see the sketch after this list)
Cons
- May be a limited use case
- Increases the overhead of needs_cpu. I tested this on limited cases with between 800 and 100,000 tasks and found the current implementation usually takes ~30–40 µs, while the proposed implementation roughly doubles that. There may be faster ways of doing this, but I imagine the overhead could be a critical problem with this implementation, so help would be appreciated in estimating the tasks remaining more quickly!
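For reference, a minimal sketch of how the fix can be carried in a subclass (assuming the Adaptive class from distributed.deploy.adaptive; this is not the exact code we run):

from distributed.deploy.adaptive import Adaptive

class PendingAwareAdaptive(Adaptive):
    """Adaptive variant that only reports CPU pressure when the tasks
    currently processing outnumber the workers (sketch of the fix above)."""

    def needs_cpu(self):
        # Keep the existing occupancy-based check...
        if not super().needs_cpu():
            return False
        # ...and additionally require more processing tasks than workers.
        tasks_processing = sum(
            len(ws.processing) for ws in self.scheduler.workers.values()
        )
        return tasks_processing > len(self.scheduler.workers)

How this subclass gets wired into KubeCluster depends on the dask_kubernetes and distributed versions in use; one option is to override the cluster’s adapt() method so it instantiates PendingAwareAdaptive instead of Adaptive (the exact constructor arguments vary between releases).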
Testable example
Requires some interactivity, but reliably reproduces the problem:
In [1]: import dask.distributed as dd
In [2]: cluster = dd.LocalCluster()
In [3]: adaptive = cluster.adapt(minimum=0, maximum=10)
In [5]: adaptive
Out[5]: <distributed.deploy.adaptive.Adaptive at 0x1153b3668>
In [6]: def wait_a_while(i):
   ...:     import time
   ...:     import random
   ...:     s = (random.random()) ** 6 * 60
   ...:     time.sleep(s)
   ...:
   ...:     return s
In [8]: client = dd.Client(cluster)
In [9]: f = client.map(wait_a_while, range(10))
In [10]: # wait for most futures to finish
In [17]: f
Out[17]:
[<Future: status: finished, type: float, key: wait_a_while-fdc644303e9be2c85edd9201261409af>,
<Future: status: finished, type: float, key: wait_a_while-97098da3920c7582be062b54ee78efe1>,
<Future: status: finished, type: float, key: wait_a_while-630e0e1fb8a0f8ede1140368de97ffce>,
<Future: status: pending, key: wait_a_while-09f09368b6e9555668ab3f82efad91dd>,
<Future: status: finished, type: float, key: wait_a_while-65d1c81d072269ab477d806d017302e2>,
<Future: status: finished, type: float, key: wait_a_while-ca96a3b8db585962fc8638066458a815>,
<Future: status: finished, type: float, key: wait_a_while-0a13c1a4f503a08e1edaf79dba3c94c5>,
<Future: status: finished, type: float, key: wait_a_while-549f788086c75f350390b4a6131ae6cb>,
<Future: status: pending, key: wait_a_while-17133623fc213adcb83f3b45e53839c9>,
<Future: status: pending, key: wait_a_while-41e284f91b0a2bb1c3a33394e51c97fc>]
In [18]: cluster._adaptive.recommendations()
Out[18]: {'status': 'error', 'msg': 'Trying to scale up and down simultaneously'}
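A non-interactive variant of the same reproduction (a sketch; cluster._adaptive and the adaptive internals may differ across distributed versions, and the random sleeps make the exact timing vary):

import time
import dask.distributed as dd

def wait_a_while(i):
    import random
    s = random.random() ** 6 * 60
    time.sleep(s)
    return s

if __name__ == "__main__":
    cluster = dd.LocalCluster()
    cluster.adapt(minimum=0, maximum=10)
    client = dd.Client(cluster)
    futures = client.map(wait_a_while, range(10))

    # Wait until only a few futures remain pending, i.e. fewer pending
    # tasks than workers, then ask the adaptive for its recommendations.
    while sum(f.status == "pending" for f in futures) > 2:
        time.sleep(1)
    print(cluster._adaptive.recommendations())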
Top GitHub Comments
I personally don’t know. If someone wants to look, though, I would recommend starting here:
https://github.com/dask/distributed/blob/2acffc3172ec32e173547ee4c39a01b6c94e74a1/distributed/scheduler.py#L5209-L5260
Indeed, thank you for following up here @guillaumeeb!