Unable to schedule actor even with available resources

What is the problem?

Hi,

I cannot figure out why I cannot run any Ray example, such as the ones on the main page of Tune. This seems related to #6007, but no workaround is provided there.

Here is the type of error I get:

== Status ==
Memory usage on this node: 4.8/31.3 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/2 CPUs, 0/1 GPUs, 0.0/16.11 GiB heap, 0.0/5.57 GiB objects
Result logdir: /home/ccastera/ray_results/train_mnist
Number of trials: 3 (1 RUNNING, 2 PENDING)
+----------------------+----------+-------+------+
| Trial name           | status   | loc   | lr   |
|----------------------+----------+-------+------|
| train_mnist_3d3297d6 | RUNNING  |       |      |
| train_mnist_3d3297d7 | PENDING  |       |      |
| train_mnist_3d3297d8 | PENDING  |       |      |
+----------------------+----------+-------+------+


2020-01-13 18:15:09,447	WARNING worker.py:1062 -- The actor or task with ID ffffffffffffffff45b95b1c0100 is pending and cannot currently be scheduled. It requires {CPU: 1.000000} for execution and {CPU: 1.000000} for placement, but this node only has remaining {node:172.22.225.108: 1.000000}, {CPU: 2.000000}, {memory: 16.113281 GiB}, {GPU: 1.000000}, {object_store_memory: 5.566406 GiB}. In total there are 0 pending tasks and 2 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.

Python 3.6, ray 0.9.0.dev0 (same behavior with 0.8.0), torch 1.3.1

Reproduction

The MNIST example is enough to reproduce the error:

import torch.optim as optim
from ray import tune
from ray.tune.examples.mnist_pytorch import get_data_loaders, ConvNet, train, test


def train_mnist(config):
    train_loader, test_loader = get_data_loaders()
    model = ConvNet()
    optimizer = optim.SGD(model.parameters(), lr=config["lr"])
    for i in range(10):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)
        tune.track.log(mean_accuracy=acc)


analysis = tune.run(
    train_mnist, config={"lr": tune.grid_search([0.001, 0.01, 0.1])})

print("Best config: ", analysis.get_best_config(metric="mean_accuracy"))

# Get a dataframe for analyzing trial results.
df = analysis.dataframe()

Thank you very much in advance.

Top GitHub Comments

camcastera commented, Jan 14, 2020 (1 reaction)

I found the solution.

The issue came from the fact that I had installed ray inside a virtual environment, which worked, as you can see below:

which ray
/home/Documents/PytorchEnv/bin/ray

But even with the environment activated, IPython was still resolving to the system-wide installation:

which ipython
/usr/local/bin/ipython

Reinstalling IPython inside the virtual environment fixed it for me.
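
For anyone who wants to check for the same mismatch from inside Python rather than with `which`, here is a minimal sketch. The helper name `describe_environment` is hypothetical (it is not part of Ray or IPython); it only uses the standard library to report which interpreter is running and where `ray` would be imported from. Run it in the same IPython session or script that shows the scheduling warning.

```python
import importlib.util
import sys


def describe_environment(module_name="ray"):
    """Report the running interpreter and where `module_name` would load from.

    Hypothetical helper for illustration; it does not import the module itself.
    """
    spec = importlib.util.find_spec(module_name)
    # Older `virtualenv` releases set sys.real_prefix; `venv` and newer
    # `virtualenv` releases make sys.prefix differ from sys.base_prefix.
    in_virtualenv = hasattr(sys, "real_prefix") or (
        sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
        "module_origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `executable` points outside the virtual environment (for example under /usr/local) while `which ray` points inside it, the session is using the system interpreter, which matches the situation described above.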

Thanks for your help.

stefanbschneider commented, Oct 14, 2020 (0 reactions)

How is this related to IPython? Are you running it from a notebook?

I have the same problem, but I’m not running RLlib from a notebook; I call it from my own code, which I run from the command line.

Solved: It does seem to have something to do with the virtual environment. I completely deleted my virtualenv and created a new one, installing everything anew, and that solved the problem. Just uninstalling and reinstalling ray within the old virtualenv didn’t help.

Not sure why.
