[tune] The actor or task cannot be scheduled right now
I have enough resources, but the following warning is still reported:
The actor or task with ID 124a2b0fc855a8f8ffffffff01000000 cannot be scheduled right now. It requires {CPU: 1.000000} for placement, but this node only has remaining {accelerator_type:P5000: 1.000000}, {node:172.31.226.37: 1.000000}, {memory: 71.142578 GiB}, {object_store_memory: 23.779297 GiB}, {GPU: 0.250000}. In total there are 7 pending tasks and 0 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.
How should I deal with this problem? Thanks.
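For reference, this is roughly how the cluster's resources can be inspected to see what the running actors have already claimed (a minimal sketch; `address="auto"` assumes an already running cluster):

```python
import ray

# Connect to the running cluster; drop address="auto" for a fresh local Ray.
ray.init(address="auto")

# Compare what the cluster has in total with what is still unclaimed.
print("Total:    ", ray.cluster_resources())
print("Available:", ray.available_resources())
```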
I reduced the number of GPUs per trial and did not specify the number of CPUs per trial. It works.
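In code, that change looked roughly like this (a sketch with a hypothetical trainable `train_fn`; when the CPU count is left out, Tune falls back to its default of 1 CPU per trial):

```python
from ray import tune

# Ask only for a fraction of a GPU per trial and leave the CPU count
# unspecified so Tune uses its default.
analysis = tune.run(
    train_fn,
    num_samples=4,
    resources_per_trial={"gpu": 0.25},
)
```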
Not exactly - the 10 CPUs are reserved just for the main function of the trainable. If this main function requests more resources, you need to use the `extra_*` variables. E.g.:
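A minimal sketch of such a configuration, assuming a hypothetical trainable `my_trainable` that launches its own remote tasks or actors:

```python
from ray import tune

tune.run(
    my_trainable,
    resources_per_trial={
        "cpu": 1,          # allocated to the trainable's main function
        "gpu": 0,
        "extra_cpu": 9,    # reserved for tasks/actors the trainable starts itself
        "extra_gpu": 0.25,
    },
)
```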
This would reserve 10 CPUs and 0.25 GPUs in total. The `main` function will be allocated 1 CPU, and then 9 CPUs and 0.25 GPUs would be left for the main function to schedule itself. See also here: https://docs.ray.io/en/latest/tune/tutorials/overview.html#how-do-i-set-resources
Please note that in the future we will deprecate support for the `extra_*` arguments in favor of placement groups. This will take another couple of weeks though, so you should be safe to use them as is.
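Once that lands, the placement-group based equivalent of the example above would look roughly like this (again a sketch, assuming the same hypothetical `my_trainable` and a Tune version that exposes `tune.PlacementGroupFactory`):

```python
from ray import tune
from ray.tune import PlacementGroupFactory

tune.run(
    my_trainable,
    # First bundle is for the trainable itself, the second for the
    # tasks/actors it spawns.
    resources_per_trial=PlacementGroupFactory(
        [{"CPU": 1}, {"CPU": 9, "GPU": 0.25}]
    ),
)
```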