What is the proper way to submit a job to the cluster?
If I have a local cluster brought up with the ray up command, what is the proper way to start, e.g., an IMPALA job on the cluster?
- ssh into the head node and run python my_script.py, or
- run ray submit cluster.yaml --tmux --start --stop my_script.py?
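Both options ultimately run the script on the head node. As a rough sketch, the two invocations can be written as argument lists suitable for subprocess.run; the helper name submission_commands and the host ubuntu@10.0.0.1 are hypothetical placeholders, and the ray submit flags are the ones from the question.

```python
import shlex


def submission_commands(cluster_yaml, script, head_host):
    """Build the two ways of launching `script` on a `ray up` cluster.

    `head_host` (e.g. "ubuntu@10.0.0.1") is a hypothetical placeholder;
    the argument lists are only constructed here, not executed.
    """
    return {
        # Option 1: log into the head node and run the script there.
        "ssh": ["ssh", head_host, "python " + shlex.quote(script)],
        # Option 2: let the Ray cluster CLI do it, using the flags from the question.
        "ray_submit": [
            "ray", "submit", cluster_yaml,
            "--tmux", "--start", "--stop", script,
        ],
    }


if __name__ == "__main__":
    cmds = submission_commands("cluster.yaml", "my_script.py", "ubuntu@10.0.0.1")
    for name, argv in cmds.items():
        print(name, "->", shlex.join(argv))
```

Either list can be passed to subprocess.run(argv, check=True) from a local machine that has SSH access to the head node.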
It is not really clear whether running a Python script on any node with ray.init(redis_address=...) is sufficient to distribute the load across the cluster. For some reason, my head node always takes all of the work, even though my cluster has 2 worker nodes.
For example, the docs (https://ray.readthedocs.io/en/latest/using-ray-on-a-cluster.html) say that you can run code on the cluster as follows:
import ray
ray.init(redis_address="<redis-address>")
import time

@ray.remote
def f():
    time.sleep(0.01)
    return ray.services.get_node_ip_address()

# Get a list of the IP addresses of the nodes that have joined the cluster.
set(ray.get([f.remote() for _ in range(1000)]))
This works. But how do I apply it to the RLlib library with the IMPALA or APPO algorithms?
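One plausible reason the head node ends up doing all the work is that Ray tends to place tasks and actors on the local node while it still has free resources, so an RLlib job whose rollout workers all fit on the head node may never use the other nodes. The 1000 short tasks in the docs example do not fit on one node at once, which may be why that example spreads out. A minimal sketch, assuming the legacy dict-style RLlib config keys (num_workers, num_cpus_per_worker, num_cpus_for_driver, num_gpus) used with tune.run("IMPALA", config=...) in Ray releases of that era; the helper impala_config_for_cluster is hypothetical. It requests enough rollout workers to cover the CPUs of the whole cluster, given a dict shaped like the output of ray.cluster_resources():

```python
def impala_config_for_cluster(cluster_resources, cpus_per_worker=1, driver_cpus=1):
    """Size an IMPALA config so rollout workers can span the whole cluster.

    `cluster_resources` is a dict shaped like ray.cluster_resources(),
    e.g. {"CPU": 24.0, "GPU": 1.0}. The key names follow the legacy
    dict-style RLlib config (an assumption; newer RLlib renamed some keys).
    """
    total_cpus = int(cluster_resources.get("CPU", 0))
    # Reserve CPUs for the driver (learner) process; the rest go to rollout workers.
    spare_cpus = max(total_cpus - driver_cpus, 0)
    return {
        "num_workers": spare_cpus // cpus_per_worker,
        "num_cpus_per_worker": cpus_per_worker,
        "num_cpus_for_driver": driver_cpus,
        "num_gpus": int(cluster_resources.get("GPU", 0)),
    }


if __name__ == "__main__":
    # Assumed layout: head node plus 2 workers, 8 CPUs each, one GPU in total.
    print(impala_config_for_cluster({"CPU": 24.0, "GPU": 1.0}))
```

On a real cluster you would call ray.init(redis_address=...) first, pass ray.cluster_resources() to the helper, add an "env" entry, and hand the result to tune.run("IMPALA", config=config). With the assumed 24 CPUs the sketch requests 23 rollout workers, more than fit on an 8-CPU head node, so Ray has to place some of them on the worker nodes.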
Issue created 4 years ago; 6 comments (3 by maintainers).
Top GitHub Comments
If you’re using rllib train, you can pass in --redis-address.
If you’re using a Python script and you want to use the entire cluster, note that RLlib never autodetects your cluster (because you can have multiple Ray clusters at a time). Add ray.init(redis_address="<redis_address>") to your script, as @kivo360 mentioned.
ray submit [script] is just a wrapper around ssh user@ip_address python [script].
Hope that helps.
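For the rllib train route, the command line might look like the sketch below. The --run and --env flags belong to the rllib train CLI of that era and --redis-address is the flag mentioned above; the helper rllib_train_command, the CartPole-v0 environment, and the address 10.0.0.1:6379 are illustrative assumptions.

```python
import shlex


def rllib_train_command(redis_address, algorithm="IMPALA", env="CartPole-v0"):
    """Build an `rllib train` invocation that joins an existing cluster.

    Without --redis-address, RLlib does not look for the cluster and
    starts a fresh local Ray instance, so only one machine is used.
    """
    return [
        "rllib", "train",
        "--run", algorithm,
        "--env", env,
        # Connect to the cluster started by `ray up`.
        "--redis-address", redis_address,
    ]


if __name__ == "__main__":
    # Hypothetical head-node address; use the one reported when the cluster starts.
    print(shlex.join(rllib_train_command("10.0.0.1:6379")))
```

Run on the head node (for example via ssh or inside the tmux session that ray submit opens), this lets the IMPALA or APPO workers be scheduled across all nodes.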
@edoakes, should we post references to ray job submit here, now that it is the preferred way of doing things? This is now a top Google result for “ray job submit python”.
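For reference, with the Ray Jobs CLI in Ray 2.x the submission looks roughly like ray job submit --address http://<head-node>:8265 --working-dir . -- python my_script.py, where 8265 is the Ray dashboard's default port and everything after -- is the entrypoint run on the cluster. The Python SDK equivalent is ray.job_submission.JobSubmissionClient(address).submit_job(entrypoint=..., runtime_env={"working_dir": "."}). The helper below only assembles the CLI command; its name ray_job_submit_command and the host 10.0.0.1 are hypothetical.

```python
import shlex


def ray_job_submit_command(head_host, script, working_dir=".", dashboard_port=8265):
    """Assemble a `ray job submit` command for an existing cluster.

    The job goes to the job server behind the Ray dashboard (port 8265 by
    default); --working-dir uploads the local directory so the script is
    available on the cluster.
    """
    return [
        "ray", "job", "submit",
        "--address", f"http://{head_host}:{dashboard_port}",
        "--working-dir", working_dir,
        # Everything after "--" is the entrypoint executed on the cluster.
        "--", "python", script,
    ]


if __name__ == "__main__":
    print(shlex.join(ray_job_submit_command("10.0.0.1", "my_script.py")))
```

Unlike ray submit, this does not require SSH access to the head node, only network access to the dashboard port.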