Actor cannot be restored after killing a node.
How to reproduce:
- Start a Ray cluster with one node (called nodeA).
- Run an actor (called actorA, which is checkpointable) in the cluster.
- Connect a new node (called nodeB) to the cluster.
- Kill nodeA.
After that, actorA cannot be restored on nodeB, because the actor_registry_ of nodeB’s NodeManager contains no state info for actorA.
A feasible solution is to load all actor info from the GCS into actor_registry_ when a node connects to the cluster.
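The eager-loading idea can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyGcs`, `ToyNodeManager`, `ActorInfo`, `get_all_actors`, and `connect` are hypothetical names made up for illustration, not Ray's actual classes or APIs (the real NodeManager is part of Ray's C++ core). It only shows that a node joining late starts with an empty registry and picks up actorA's entry once it copies the GCS actor table on connect.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ActorInfo:
    """Hypothetical stand-in for the actor state kept in the GCS."""
    actor_id: str
    node_id: str
    state: str


@dataclass
class ToyGcs:
    """Hypothetical in-memory model of the GCS actor table."""
    actor_table: Dict[str, ActorInfo] = field(default_factory=dict)

    def get_all_actors(self) -> Dict[str, ActorInfo]:
        # Return a copy so callers cannot mutate the table by accident.
        return dict(self.actor_table)


class ToyNodeManager:
    """Hypothetical model of a node manager; not Ray's real class."""

    def __init__(self, node_id: str, gcs: ToyGcs) -> None:
        self.node_id = node_id
        self.gcs = gcs
        self.actor_registry: Dict[str, ActorInfo] = {}

    def connect(self) -> None:
        # Eagerly load every known actor into the local registry on connect.
        self.actor_registry.update(self.gcs.get_all_actors())


gcs = ToyGcs()
# actorA was created while only nodeA was in the cluster.
gcs.actor_table["actorA"] = ActorInfo("actorA", "nodeA", "ALIVE")

node_b = ToyNodeManager("nodeB", gcs)
registry_before_connect = dict(node_b.actor_registry)  # empty: nodeB joined late
node_b.connect()
```

The trade-off in this sketch is that every joining node copies the whole actor table, including actors it may never need to know about.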
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 3
- Comments: 12 (12 by maintainers)
Top GitHub Comments
One way to fix this would be to have nodeB look up actorA in the GCS if it cannot find an entry in its local actor_registry_. We do this already for a similar scenario, where nodeB wants to submit a task to actorA but doesn’t have its location yet (code).

This is already fixed.
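The lookup-on-miss approach suggested in the first comment can be sketched as a read-through cache. As a loud caveat, `ToyGcs`, `ToyNodeManager`, `lookup_actor`, and `get_actor_info` are hypothetical names invented for this illustration. They do not correspond to Ray's real implementation, and the sketch does not claim to show how the issue was actually fixed.

```python
from typing import Dict, Optional


class ToyGcs:
    """Hypothetical in-memory model of the GCS actor table."""

    def __init__(self) -> None:
        self.actor_table: Dict[str, dict] = {}

    def lookup_actor(self, actor_id: str) -> Optional[dict]:
        return self.actor_table.get(actor_id)


class ToyNodeManager:
    """Hypothetical node manager that falls back to the GCS on a local miss."""

    def __init__(self, gcs: ToyGcs) -> None:
        self.gcs = gcs
        self.actor_registry: Dict[str, dict] = {}
        self.gcs_lookups = 0  # counts how often the fallback path is taken

    def get_actor_info(self, actor_id: str) -> Optional[dict]:
        info = self.actor_registry.get(actor_id)
        if info is None:
            # Local miss: ask the GCS and cache the answer if there is one.
            self.gcs_lookups += 1
            info = self.gcs.lookup_actor(actor_id)
            if info is not None:
                self.actor_registry[actor_id] = info
        return info


gcs = ToyGcs()
gcs.actor_table["actorA"] = {"node_id": "nodeA", "state": "ALIVE"}

node_b = ToyNodeManager(gcs)
first = node_b.get_actor_info("actorA")    # miss, served from the GCS
second = node_b.get_actor_info("actorA")   # hit, served from the local registry
missing = node_b.get_actor_info("actorZ")  # unknown everywhere
```

Compared with loading the whole actor table on connect, this variant only fetches entries a node actually asks for, at the cost of one extra round trip on the first access.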