Adaptive.needs_cpu does not depend on number of tasks remaining
Issue description
We’re using distributed (with KubeCluster) with client.map to schedule a lot of long-running tasks (right now we’re running a Fortran-based hydrological model).
We noticed that clusters don’t scale down when the number of remaining tasks falls below the number of workers: excess workers are only released once all tasks have completed.
I isolated the problem to Adaptive.needs_cpu(). The current method does not check whether there are any pending tasks on the scheduler:
def needs_cpu(self):
    """
    Check if the cluster is CPU constrained (too many tasks per core)

    Notes
    -----
    Returns ``True`` if the occupancy per core is some factor larger
    than ``startup_cost``.
    """
    total_occupancy = self.scheduler.total_occupancy
    total_cores = sum([ws.ncores for ws in self.scheduler.workers.values()])

    if total_occupancy / (total_cores + 1e-9) > self.startup_cost * 2:
        logger.info("CPU limit exceeded [%d occupancy / %d cores]",
                    total_occupancy, total_cores)
        return True
    else:
        return False
This results in Adaptive.recommendations() returning the error message “Trying to scale up and down simultaneously” whenever there are fewer pending tasks than there are workers, as long as the average task time suggests that more cores are needed (independent of the number of pending tasks).
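For context, the conflict arises roughly as in the sketch below, which paraphrases the scale-up/scale-down decision in recommendations() (not the verbatim source; method names are those of the Adaptive class in that era of distributed):

def sketch_recommendations(adaptive):
    """Paraphrased sketch of the conflicting branches in
    Adaptive.recommendations() (not the verbatim source)."""
    should_scale_up = adaptive.should_scale_up()  # consults needs_cpu()
    to_close = adaptive.workers_to_close()        # idle workers
    if should_scale_up and to_close:
        # With only a few pending tasks but a long average task time,
        # needs_cpu() still says "scale up" while the now-idle workers
        # are also candidates to close, so both branches fire at once.
        return {'status': 'error',
                'msg': 'Trying to scale up and down simultaneously'}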
Proposed solution
I implemented a quick fix: find the total number of tasks currently processing and only recommend a scale-up if that number exceeds the number of existing workers, in addition to the current occupancy criterion:
def needs_cpu(self):
    """
    Check if the cluster is CPU constrained (too many tasks per core)

    Notes
    -----
    Returns ``True`` if the occupancy per core is some factor larger
    than ``startup_cost``.
    """
    total_occupancy = self.scheduler.total_occupancy
    total_cores = sum([ws.ncores for ws in self.scheduler.workers.values()])

    if total_occupancy / (total_cores + 1e-9) > self.startup_cost * 2:
        logger.info("CPU limit exceeded [%d occupancy / %d cores]",
                    total_occupancy, total_cores)

        # New: only report CPU pressure if the tasks currently processing
        # outnumber the existing workers.
        tasks_processing = sum(len(w.processing) for w in self.scheduler.workers.values())
        num_workers = len(self.scheduler.workers)
        if tasks_processing > num_workers:
            logger.info("pending tasks exceed number of workers [%d tasks / %d workers]",
                        tasks_processing, num_workers)
            return True

    return False
Pros
- Exhibits the desired behavior (we’re using this fix now by subclassing KubeCluster; see the sketch after this list)
Cons
- May be a limited use case
- Increases the overhead of needs_cpu. I tested this on limited cases with between 800 and 100,000 tasks and found the current implementation usually takes ~30–40 µs, while the proposed implementation roughly doubles that. There may be faster ways of doing this, but I imagine the overhead could be a critical problem with this implementation, so help would be appreciated in estimating the tasks remaining more quickly!
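For reference, a minimal sketch of how the fix can be carried in a subclass (assuming the Adaptive class from distributed.deploy.adaptive; this is not the exact code we run):

from distributed.deploy.adaptive import Adaptive

class PendingAwareAdaptive(Adaptive):
    """Adaptive variant that only reports CPU pressure when the tasks
    currently processing outnumber the workers (sketch of the fix above)."""

    def needs_cpu(self):
        # Keep the existing occupancy-based check...
        if not super().needs_cpu():
            return False
        # ...and additionally require more processing tasks than workers.
        tasks_processing = sum(
            len(ws.processing) for ws in self.scheduler.workers.values()
        )
        return tasks_processing > len(self.scheduler.workers)

How this subclass gets wired into KubeCluster depends on the dask_kubernetes and distributed versions in use; one option is to override the cluster’s adapt() method so it instantiates PendingAwareAdaptive instead of Adaptive (the exact constructor arguments vary between releases).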
Testable example
Requires some interactivity, but reliably reproduces the problem:
In [1]: import dask.distributed as dd
In [2]: cluster = dd.LocalCluster()
In [3]: adaptive = cluster.adapt(minimum=0, maximum=10)
In [5]: adaptive
Out[5]: <distributed.deploy.adaptive.Adaptive at 0x1153b3668>
In [6]: def wait_a_while(i):
   ...:     import time
   ...:     import random
   ...:     s = (random.random()) ** 6 * 60
   ...:     time.sleep(s)
   ...:
   ...:     return s
In [8]: client = dd.Client(cluster)
In [9]: f = client.map(wait_a_while, range(10))
In [10]: # wait for most futures to finish
In [17]: f
Out[17]:
[<Future: status: finished, type: float, key: wait_a_while-fdc644303e9be2c85edd9201261409af>,
<Future: status: finished, type: float, key: wait_a_while-97098da3920c7582be062b54ee78efe1>,
<Future: status: finished, type: float, key: wait_a_while-630e0e1fb8a0f8ede1140368de97ffce>,
<Future: status: pending, key: wait_a_while-09f09368b6e9555668ab3f82efad91dd>,
<Future: status: finished, type: float, key: wait_a_while-65d1c81d072269ab477d806d017302e2>,
<Future: status: finished, type: float, key: wait_a_while-ca96a3b8db585962fc8638066458a815>,
<Future: status: finished, type: float, key: wait_a_while-0a13c1a4f503a08e1edaf79dba3c94c5>,
<Future: status: finished, type: float, key: wait_a_while-549f788086c75f350390b4a6131ae6cb>,
<Future: status: pending, key: wait_a_while-17133623fc213adcb83f3b45e53839c9>,
<Future: status: pending, key: wait_a_while-41e284f91b0a2bb1c3a33394e51c97fc>]
In [18]: cluster._adaptive.recommendations()
Out[18]: {'status': 'error', 'msg': 'Trying to scale up and down simultaneously'}
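A non-interactive variant of the same reproduction (a sketch; cluster._adaptive and the adaptive internals may differ across distributed versions, and the random sleeps make the exact timing vary):

import time
import dask.distributed as dd

def wait_a_while(i):
    import random
    s = random.random() ** 6 * 60
    time.sleep(s)
    return s

if __name__ == "__main__":
    cluster = dd.LocalCluster()
    cluster.adapt(minimum=0, maximum=10)
    client = dd.Client(cluster)
    futures = client.map(wait_a_while, range(10))

    # Wait until only a few futures remain pending, i.e. fewer pending
    # tasks than workers, then ask the adaptive for its recommendations.
    while sum(f.status == "pending" for f in futures) > 2:
        time.sleep(1)
    print(cluster._adaptive.recommendations())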
Top GitHub Comments
I personally don’t know. If someone wants to look, though, I would recommend starting here:
https://github.com/dask/distributed/blob/2acffc3172ec32e173547ee4c39a01b6c94e74a1/distributed/scheduler.py#L5209-L5260
Indeed, thank you for following up here @guillaumeeb!