
[tune] The actor or task cannot be scheduled right now

See original GitHub issue

I have enough resources, but Ray still reports a warning:

The actor or task with ID 124a2b0fc855a8f8ffffffff01000000 cannot be scheduled right now. It requires {CPU: 1.000000} for placement, but this node only has remaining {accelerator_type:P5000: 1.000000}, {node:172.31.226.37: 1.000000}, {memory: 71.142578 GiB}, {object_store_memory: 23.779297 GiB}, {GPU: 0.250000}. In total there are 7 pending tasks and 0 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.

How should I deal with this problem? Thanks.
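The warning is plain resource accounting: the pending task asks for {CPU: 1}, but the node's remaining resources contain no CPU entry at all; everything left over is a GPU fraction, memory, and custom resources. A minimal sketch of that check (illustrative only, not Ray's actual scheduler code), using the numbers from the warning:

```python
# Sketch of the feasibility check behind the warning. Resources are
# modeled as simple name -> amount maps, as they appear in the log.

def fits(request: dict, remaining: dict) -> bool:
    """A task can be scheduled only if every requested resource
    is available on the node in at least the requested amount."""
    return all(remaining.get(name, 0.0) >= amount
               for name, amount in request.items())

# Numbers taken from the warning message above (memory in GiB):
request = {"CPU": 1.0}
remaining = {
    "accelerator_type:P5000": 1.0,
    "node:172.31.226.37": 1.0,
    "memory_gib": 71.142578,
    "object_store_memory_gib": 23.779297,
    "GPU": 0.25,
}

print(fits(request, remaining))  # False: no "CPU" left, so the task stays pending
```

Because all CPUs are already claimed by running actors, the task pends even though plenty of memory and a GPU fraction remain.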

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:15 (7 by maintainers)

Top GitHub Comments

5 reactions
BBDrive commented, Feb 4, 2021

I reduced the number of GPUs per trial and did not specify the number of CPUs per trial. It works:

resources_per_trial={"gpu": 0.1}
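With a fractional GPU request like this, the number of trials that can hold a GPU share at the same time is bounded by how many times the fraction fits into the cluster's GPU count. A small sketch of that arithmetic (plain Python; the numbers are illustrative):

```python
import math

def max_concurrent_trials(total_gpus: float, gpu_per_trial: float) -> int:
    """How many trials can hold their GPU share simultaneously."""
    # Small epsilon guards against floating-point error in the division.
    return math.floor(total_gpus / gpu_per_trial + 1e-9)

# One physical GPU shared at 0.1 GPU per trial allows 10 concurrent trials;
# at 0.25 GPU per trial, only 4.
print(max_concurrent_trials(1.0, 0.1))   # 10
print(max_concurrent_trials(1.0, 0.25))  # 4
```

This is why shrinking the per-trial request resolved the pending-task warning: each trial claims a smaller slice, so requests stop exceeding what the node has left.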
1 reaction
krfricke commented, Feb 4, 2021

Not exactly - the 10 CPUs are reserved just for the main function of the trainable. If this main function requests more resources (for example, by launching remote tasks or actors of its own), you need to use the extra_* variables.

E.g.:

resources_per_trial={
    "cpu": 1,
    "extra_cpu": 9,
    "extra_gpu": 0.25
}

This would reserve 10 CPUs and 0.25 GPUs in total. The main function is allocated 1 CPU, and the remaining 9 CPUs and 0.25 GPUs are left for the tasks and actors that the main function schedules itself.

See also here: https://docs.ray.io/en/latest/tune/tutorials/overview.html#how-do-i-set-resources

Please note that in the future we will deprecate support for the extra_* arguments in favor of placement groups. This will take another couple of weeks though, so you should be safe to use them as is.
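To make the bookkeeping concrete: the total reservation for one trial is the main function's request plus the extra_* amounts. A small sketch of that sum (plain Python, not Ray's internals):

```python
def total_reservation(spec: dict) -> dict:
    """Total resources one trial reserves: the main function's request
    plus the extra_* amounts set aside for tasks it schedules itself."""
    return {
        "CPU": spec.get("cpu", 0) + spec.get("extra_cpu", 0),
        "GPU": spec.get("gpu", 0) + spec.get("extra_gpu", 0),
    }

# The spec from the comment above:
spec = {"cpu": 1, "extra_cpu": 9, "extra_gpu": 0.25}
print(total_reservation(spec))  # {'CPU': 10, 'GPU': 0.25}

# The later placement-group equivalent would look roughly like this
# (an assumption; check the Ray Tune docs for your version):
#   tune.PlacementGroupFactory([{"CPU": 1}, {"CPU": 9, "GPU": 0.25}])
```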


Top Results From Across the Web

  • Creating actors when their amount is more than `num_cpus`: Hi guys, I would like to find out the behavior of two cases. Are those correct? The first: import ray ray.init(num_cpus=2) @ray.remote(num_cpus=1) class ...
  • Ray: Application-level scheduling with custom resources: New to Ray? Start Here! Ray intends to be a universal framework for a wide range of ...
  • Ray Documentation: Ray Tune: Hyperparameter Optimization Framework ... We can schedule tasks on the actor by calling its methods.
  • How to prevent trials execution on the head - ray: tune.run(). WARNING worker.py:1047 -- The actor or task with ID ffffffffffffffff128bce290200 is pending and cannot currently be scheduled. It ...
  • Ray's scheduling strategy: What's the global scheduler's strategy for assigning tasks to workers? BTW, it might greatly help debug or performance tuning if Ray let each...
