Unable to schedule actor even with available resources
What is the problem?
Hi,
I cannot figure out why I cannot use Ray with any example, such as the ones on the main page of Tune. This seems related to #6007, but no workaround is provided there.
Here is the type of error I get:
== Status ==
Memory usage on this node: 4.8/31.3 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/2 CPUs, 0/1 GPUs, 0.0/16.11 GiB heap, 0.0/5.57 GiB objects
Result logdir: /home/ccastera/ray_results/train_mnist
Number of trials: 3 (1 RUNNING, 2 PENDING)
+----------------------+----------+-------+------+
| Trial name | status | loc | lr |
|----------------------+----------+-------+------|
| train_mnist_3d3297d6 | RUNNING | | |
| train_mnist_3d3297d7 | PENDING | | |
| train_mnist_3d3297d8 | PENDING | | |
+----------------------+----------+-------+------+
2020-01-13 18:15:09,447 WARNING worker.py:1062 -- The actor or task with ID ffffffffffffffff45b95b1c0100 is pending and cannot currently be scheduled. It requires {CPU: 1.000000} for execution and {CPU: 1.000000} for placement, but this node only has remaining {node:172.22.225.108: 1.000000}, {CPU: 2.000000}, {memory: 16.113281 GiB}, {GPU: 1.000000}, {object_store_memory: 5.566406 GiB}. In total there are 0 pending tasks and 2 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.
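On its face the warning contradicts itself: the actor needs {CPU: 1.000000}, and the node reports {CPU: 2.000000} remaining. The snippet below restates that arithmetic as a naive "every required resource is available" check. It is a sketch with hypothetical helper names. It assumes the {name: value} format shown in the log and is not how Ray's scheduler actually decides placement.

```python
import re

# Matches "{NAME: VALUE}" or "{NAME: VALUE GiB}" pairs as printed in the warning.
# Assumption: values are plain decimals; the "GiB" suffix is simply dropped.
_PAIR = re.compile(r"\{([^{}]+):\s*([0-9.]+)(?:\s*GiB)?\}")


def parse_resources(text):
    """Return a dict mapping resource name -> amount for every {k: v} pair."""
    return {name.strip(): float(value) for name, value in _PAIR.findall(text)}


def fits(required, available):
    """Naive check: each required resource is available in a sufficient amount."""
    return all(available.get(name, 0.0) >= amount for name, amount in required.items())


required = parse_resources("{CPU: 1.000000}")
remaining = parse_resources(
    "{node:172.22.225.108: 1.000000}, {CPU: 2.000000}, "
    "{memory: 16.113281 GiB}, {GPU: 1.000000}, "
    "{object_store_memory: 5.566406 GiB}"
)
print(required)                   # {'CPU': 1.0}
print(fits(required, remaining))  # True
```

Because the naive check passes, the message alone does not indicate a real resource shortage. That is consistent with the environment problem identified in the comments.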
Versions: Python 3.6, Ray 0.9.0.dev0 (same behavior with 0.8.0), torch 1.3.1.
Reproduction
The error can be reproduced with the Tune MNIST example:
```python
import torch.optim as optim
from ray import tune
from ray.tune.examples.mnist_pytorch import get_data_loaders, ConvNet, train, test


def train_mnist(config):
    train_loader, test_loader = get_data_loaders()
    model = ConvNet()
    optimizer = optim.SGD(model.parameters(), lr=config["lr"])
    # Train for 10 epochs and report test accuracy to Tune after each one.
    for i in range(10):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)
        tune.track.log(mean_accuracy=acc)


# Grid search over three learning rates: one trial per value.
analysis = tune.run(
    train_mnist, config={"lr": tune.grid_search([0.001, 0.01, 0.1])})

print("Best config: ", analysis.get_best_config(metric="mean_accuracy"))

# Get a dataframe for analyzing trial results.
df = analysis.dataframe()
```
Thank you very much in advance.
Top GitHub Comments
I found the solution.
The issue came from the fact that I was installing Ray inside a virtual environment. That part worked, but even with the environment activated, IPython was still the system installation.
Reinstalling IPython inside the virtual environment fixed it for me.
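To spot the mismatch described above, you can print which interpreter is active and where `ray` and `IPython` would be imported from. Run this from the same shell or notebook that runs Tune. The snippet is a minimal sketch using only the standard library, and `in_virtualenv` and `module_origin` are hypothetical helper names. If `sys.executable` or either module path points outside the virtual environment, those tools are not coming from the environment you installed into.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when running inside a venv/virtualenv.

    venv sets sys.prefix to the environment directory while sys.base_prefix
    still points at the base installation; older virtualenv releases set
    sys.real_prefix instead.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
    for mod in ("ray", "IPython"):
        print(f"{mod}:", module_origin(mod))
```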
Thanks for your help.
How is this related to IPython? Are you running it from a notebook?
I have the same problem, but I'm not running RLlib from a notebook. I call it inside my code, which I run from the command line.
Solved: it does seem to have something to do with the virtual environment. I completely deleted my virtualenv, created a new one, and reinstalled everything from scratch, which solved the problem. Just uninstalling and reinstalling Ray within the old virtualenv didn't help.
Not sure why.
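You can use the standard-library `venv` module to recreate the environment from scratch instead of reinstalling Ray into the old one. This is a minimal sketch, and `recreate_env` is a hypothetical helper. It assumes the environment lives at a path of your choosing. With `clear=True`, everything already in the target directory is deleted before the new environment is built.

```python
import venv
from pathlib import Path


def recreate_env(env_dir, with_pip=True):
    """Delete the contents of env_dir (if any) and build a fresh environment there.

    clear=True wipes the existing directory contents first, mirroring
    "delete the virtualenv and create a new one" rather than reinstalling
    packages into the old environment.
    """
    builder = venv.EnvBuilder(clear=True, with_pip=with_pip)
    builder.create(env_dir)
    return Path(env_dir) / "pyvenv.cfg"
```

Then activate the new environment and install Ray, torch, and IPython inside it, so all three resolve from the same place.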