[tune] The actor or task cannot be scheduled right now
I have enough resources, but the following warning is still reported:
The actor or task with ID 124a2b0fc855a8f8ffffffff01000000 cannot be scheduled right now. It requires {CPU: 1.000000} for placement, but this node only has remaining {accelerator_type:P5000: 1.000000}, {node:172.31.226.37: 1.000000}, {memory: 71.142578 GiB}, {object_store_memory: 23.779297 GiB}, {GPU: 0.250000}. In total there are 7 pending tasks and 0 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.
How should I deal with this problem? Thanks.
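For reference, this is roughly how the cluster's resources can be inspected to see what the running actors have already claimed (a minimal sketch; `address="auto"` assumes an already running cluster):

```python
import ray

# Connect to the running cluster; drop address="auto" for a fresh local Ray.
ray.init(address="auto")

# Compare what the cluster has in total with what is still unclaimed.
print("Total:    ", ray.cluster_resources())
print("Available:", ray.available_resources())
```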
I reduced the number of GPUs per trial and did not specify the number of CPUs per trial. It works.
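In code, that change looked roughly like this (a sketch with a hypothetical trainable `train_fn`; when the CPU count is left out, Tune falls back to its default of 1 CPU per trial):

```python
from ray import tune

# Ask only for a fraction of a GPU per trial and leave the CPU count
# unspecified so Tune uses its default.
analysis = tune.run(
    train_fn,
    num_samples=4,
    resources_per_trial={"gpu": 0.25},
)
```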
Not exactly - the 10 CPUs are reserved just for the main function of the trainable. If this main function requests more resources, you need to use the `extra_*` variables. E.g.:
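A minimal sketch of such a configuration, assuming a hypothetical trainable `my_trainable` that launches its own remote tasks or actors:

```python
from ray import tune

tune.run(
    my_trainable,
    resources_per_trial={
        "cpu": 1,          # allocated to the trainable's main function
        "gpu": 0,
        "extra_cpu": 9,    # reserved for tasks/actors the trainable starts itself
        "extra_gpu": 0.25,
    },
)
```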
This would reserve 10 CPUs and 0.25 GPUs in total. The `main` function will be allocated 1 CPU, and then 9 CPUs and 0.25 GPUs would be left for the main function to schedule itself. See also here: https://docs.ray.io/en/latest/tune/tutorials/overview.html#how-do-i-set-resources
Please note that in the future we will deprecate support for the `extra_*` arguments in favor of placement groups. This will take another couple of weeks though, so you should be safe to use them as is.
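Once that lands, the placement-group based equivalent of the example above would look roughly like this (again a sketch, assuming the same hypothetical `my_trainable` and a Tune version that exposes `tune.PlacementGroupFactory`):

```python
from ray import tune
from ray.tune import PlacementGroupFactory

tune.run(
    my_trainable,
    # First bundle is for the trainable itself, the second for the
    # tasks/actors it spawns.
    resources_per_trial=PlacementGroupFactory(
        [{"CPU": 1}, {"CPU": 9, "GPU": 0.25}]
    ),
)
```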