Unable to schedule actor even with available resources

What is the problem?

Hi,

I cannot figure out why I cannot run any Ray example, such as the ones on the main page of Tune. This seems related to #6007, but no workaround is provided there.

Here is the type of error I get:

== Status ==
Memory usage on this node: 4.8/31.3 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/2 CPUs, 0/1 GPUs, 0.0/16.11 GiB heap, 0.0/5.57 GiB objects
Result logdir: /home/ccastera/ray_results/train_mnist
Number of trials: 3 (1 RUNNING, 2 PENDING)
+----------------------+----------+-------+------+
| Trial name           | status   | loc   | lr   |
|----------------------+----------+-------+------|
| train_mnist_3d3297d6 | RUNNING  |       |      |
| train_mnist_3d3297d7 | PENDING  |       |      |
| train_mnist_3d3297d8 | PENDING  |       |      |
+----------------------+----------+-------+------+


2020-01-13 18:15:09,447	WARNING worker.py:1062 -- The actor or task with ID ffffffffffffffff45b95b1c0100 is pending and cannot currently be scheduled. It requires {CPU: 1.000000} for execution and {CPU: 1.000000} for placement, but this node only has remaining {node:172.22.225.108: 1.000000}, {CPU: 2.000000}, {memory: 16.113281 GiB}, {GPU: 1.000000}, {object_store_memory: 5.566406 GiB}. In total there are 0 pending tasks and 2 pending actors on this node. This is likely due to all cluster resources being claimed by actors. To resolve the issue, consider creating fewer actors or increase the resources available to this Ray cluster. You can ignore this message if this Ray cluster is expected to auto-scale.

Python 3.6, ray 0.9.0.dev0 (same behavior with 0.8.0), torch 1.3.1

Reproduction

The MNIST example is enough to reproduce the error:

import torch.optim as optim
from ray import tune
from ray.tune.examples.mnist_pytorch import get_data_loaders, ConvNet, train, test


def train_mnist(config):
    train_loader, test_loader = get_data_loaders()
    model = ConvNet()
    optimizer = optim.SGD(model.parameters(), lr=config["lr"])
    for i in range(10):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)
        tune.track.log(mean_accuracy=acc)


analysis = tune.run(
    train_mnist, config={"lr": tune.grid_search([0.001, 0.01, 0.1])})

print("Best config: ", analysis.get_best_config(metric="mean_accuracy"))

# Get a dataframe for analyzing trial results.
df = analysis.dataframe()

Thank you very much in advance.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

camcastera commented, Jan 14, 2020 (1 reaction)

I found the solution.

The issue comes from the fact that I installed ray inside a virtual environment, which worked, as you can see below.

which ray
/home/Documents/PytorchEnv/bin/ray

But even with the environment activated, IPython was still the system-wide installation.

which ipython
/usr/local/bin/ipython

Reinstalling IPython inside the virtual environment fixed it for me.
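
The mismatch described above can also be checked from inside whichever interpreter is actually running (python, ipython, or a notebook kernel). The following is a minimal sketch, not part of the original report: the helper name describe_environment is hypothetical, and it only uses the standard library to show which executable is running and where the ray package would be imported from.

```python
import importlib.util
import sys


def describe_environment(module_name="ray"):
    """Report the running interpreter and where `module_name` would be imported from.

    Hypothetical helper for illustration; it does not import the module itself.
    """
    spec = importlib.util.find_spec(module_name)
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # make sys.prefix differ from sys.base_prefix.
    in_virtualenv = hasattr(sys, "real_prefix") or (
        sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    )
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the executable points outside the virtual environment (for example under /usr/local/bin) while ray was installed inside it, the session is using the system interpreter, which matches the situation described in this comment.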

Thanks for your help.

stefanbschneider commented, Oct 14, 2020 (0 reactions)

How is this related to ipython? Are you running it from a notebook?

I have the same problem, but I’m not running RLlib from a notebook; I call it inside my code, which I run from the command line.

Solved: It does seem to have something to do with the virtual environment. I completely deleted my virtualenv and created a new one, installing everything anew, and that solved the problem. Just uninstalling and reinstalling ray within the old virtualenv didn’t help.

Not sure why.
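
For readers who want to script the "delete and recreate" step, here is a rough sketch using Python's standard venv module. This is an assumption on my part: the comment above used virtualenv and does not show the commands, and the helper name recreate_env and its arguments are hypothetical.

```python
import venv
from pathlib import Path


def recreate_env(env_dir, with_pip=True):
    """Wipe any existing environment at `env_dir` and build a fresh one.

    Hypothetical helper; clear=True removes the old directory contents
    before the new environment is created.
    """
    builder = venv.EnvBuilder(clear=True, with_pip=with_pip)
    builder.create(env_dir)
    # pyvenv.cfg is written at the root of every environment venv creates.
    return Path(env_dir) / "pyvenv.cfg"
```

After recreating the environment, ray and any front end such as IPython would need to be installed with the new environment's own interpreter (for example by running its python with -m pip install), so that both resolve to the same location.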
