ray.wait() doesn't return methods completed by dead actors as ready

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Ray installed from (source or binary): source
  • Ray version: 0.6.1
  • Python version: 3.6.6

Describe the problem

  1. Launch an actor on another node.
  2. Call x = actor.ping.remote().
  3. Kill the node containing the actor.
  4. Call ray.wait([x], timeout=0). x never becomes ready, no matter how much later ray.wait is called.

The expected behavior is that x becomes ready and stores an exception. This is a problem when using ray.wait() to implement heartbeats for actors spread across multiple nodes, for example in distributed SGD.
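A possible way to keep heartbeats usable while the ping of a dead actor never shows up as ready is to pair the non-blocking ray.wait() poll with a deadline kept by the caller. The sketch below is only an illustration under assumptions, not the fix adopted by the Ray maintainers: the helper name classify_pings, the pending dictionary layout, and the 10-second default deadline are all hypothetical. The wait function is passed in as a parameter so the logic can be exercised without a cluster.

```python
import time


def classify_pings(pending, wait_fn, deadline_s=10.0, now=None):
    """Split outstanding heartbeat pings into answered, suspect, and waiting.

    pending maps a ping handle (for example the value returned by
    actor.ping.remote()) to a tuple (actor_name, sent_at), where sent_at
    comes from time.monotonic(). This layout is a hypothetical convention.
    wait_fn is expected to behave like ray.wait: it returns a pair
    (ready, not_ready) of handle lists.
    """
    if not pending:
        return [], [], []
    now = time.monotonic() if now is None else now

    # Non-blocking poll, as in the report: timeout=0 returns immediately.
    ready, _ = wait_fn(list(pending), num_returns=len(pending), timeout=0)
    ready_set = set(ready)

    answered, suspect, waiting = [], [], []
    for handle, (name, sent_at) in pending.items():
        if handle in ready_set:
            # The result may still hold an exception; fetching it is up to the caller.
            answered.append(name)
        elif now - sent_at > deadline_s:
            # Never reported ready and past the deadline: treat the actor as possibly dead.
            suspect.append(name)
        else:
            waiting.append(name)
    return answered, suspect, waiting
```

With Ray, ray.wait itself would be passed as wait_fn. The deadline only masks the symptom described above: a ping that is merely slow looks the same as one sent to a dead actor, so deadline_s has to be chosen well above the normal ping latency.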

Source code / logs

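import time

import ray
from ray.test.cluster_utils import Cluster

# Start a local test cluster whose head node has no CPUs,
# so the actor below must be placed on the added node.
cluster = Cluster(True, True, head_node_args={"num_cpus": 0})
node = cluster.add_node()

@ray.remote(num_cpus=1)
class Foo:
    def ping(self):
        pass

f = Foo.remote()

# Confirm the actor is up before killing its node.
print("pinging")
ray.get(f.ping.remote())

# Submit a second ping, then remove the node that hosts the actor.
x = f.ping.remote()

print("removing node")
cluster.remove_node(node)
print("done removing node")

# x is never reported as ready, no matter how long we keep polling.
for i in range(100):
    print(i, ray.wait([x], timeout=1))
    time.sleep(1)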

CC @stephanie-wang

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 15 (14 by maintainers)

Top GitHub Comments

pschafhalter commented, Jan 8, 2019 (1 reaction)

Yes, this only happens when ray.wait is called with the timeout, which makes it non-blocking. If ray.wait blocks on the ObjectID, the behavior is as expected.
