[job submission] Support running supervisor actors in worker nodes
Search before asking
- I searched the issues and found no similar feature request.
Description
In some real-world use cases, we deploy the dashboard on a separate node that does not run a Raylet process. We do this to make the dashboard highly available. However, job submission currently calls ray.init in the dashboard to ensure that the supervisor actor can be launched. This assumes that the dashboard and the Raylet are colocated on one node.
So, can we use Ray Client in the dashboard?
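To make the question concrete, here is a minimal sketch of the two connection modes being compared. It is not the dashboard's actual code: the helper names client_address and connect are hypothetical, and it assumes the Ray Client server listens on its default port 10001 on the head node.

```python
from typing import Optional


def client_address(head_host: str, port: int = 10001) -> str:
    """Build a Ray Client URL (hypothetical helper; 10001 is the default client port)."""
    return f"ray://{head_host}:{port}"


def connect(head_host: Optional[str] = None):
    """Connect to the cluster, either through a local Raylet or through Ray Client."""
    import ray  # imported lazily so the helper above works without Ray installed

    if head_host is None:
        # Current behaviour described in this issue: attach to the cluster
        # through a Raylet running on the same node as the dashboard.
        return ray.init(address="auto")
    # Proposed alternative: no local Raylet needed, the driver talks to the
    # Ray Client server on the head node instead.
    return ray.init(address=client_address(head_host))
```

With a local Raylet, connect() is enough. From a node without a Raylet, connect("head-node-hostname") would go through Ray Client instead.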
Use case
No response
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Issue Analytics
- State:
- Created a year ago
- Comments:12 (12 by maintainers)
It seems we still have more than half a month before June. I will lead a discussion at Ant next week first and then come back to sync with you.
I think this is backwards: the ray client server uses a hacky implementation to create the driver process that circumvents the standard process scheduling & runtime_env ref counting path. Instead it should do the same thing that the job submission server does: schedule a regular actor. This would unify how we do process management and environment setup across the board.
To solve the issue of ray.init-ing to the raylet, maybe we should have the actor calls happen in the per-node agent instead of the main dashboard process?
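As a rough illustration of "schedule a regular actor", the sketch below wraps a plain class as a Ray actor and runs a job entrypoint inside it. It is a simplified stand-in, not the job submission server's real implementation: the JobSupervisor class here and the launch_supervisor helper are hypothetical, and the code assumes it runs in a process that can reach a raylet (for example the per-node agent).

```python
import subprocess


class JobSupervisor:
    """Simplified, hypothetical supervisor: runs one entrypoint command."""

    def __init__(self, entrypoint: str):
        self.entrypoint = entrypoint

    def run(self) -> int:
        # Run the user's entrypoint in a subprocess and report its exit code.
        return subprocess.run(self.entrypoint, shell=True).returncode


def launch_supervisor(entrypoint: str) -> int:
    """Schedule the supervisor as a regular actor and wait for the job to end."""
    import ray  # imported lazily; requires a reachable Ray cluster

    ray.init(address="auto", ignore_reinit_error=True)
    # Wrapping the class with ray.remote sends it through the normal actor
    # scheduling path, so the raylet decides which worker node hosts it.
    supervisor_cls = ray.remote(num_cpus=0)(JobSupervisor)
    supervisor = supervisor_cls.remote(entrypoint)
    return ray.get(supervisor.run.remote())
```

Calling launch_supervisor("python my_script.py") from a node with a raylet would start the actor on whichever node the scheduler picks. The script name is only an example.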