What is the proper way to submit a job to the cluster?

If I have a local cluster brought up with the ray up command, what is the proper way to start, e.g., an IMPALA job on the cluster?

  1. ssh into head node and run python my_script.py or
  2. ray submit cluster.yaml --tmux --start --stop my_script.py?

It is not really clear whether running a Python script on any node with ray.init(redis_address) is sufficient to distribute the load across the cluster. For some reason my head node always takes all of the work, even though my cluster has 2 workers.

For example, the docs (https://ray.readthedocs.io/en/latest/using-ray-on-a-cluster.html) say that you can run code on the cluster as follows:

import ray
ray.init(redis_address="<redis-address>")

import time

@ray.remote
def f():
    time.sleep(0.01)
    return ray.services.get_node_ip_address()

# Get a list of the IP addresses of the nodes that have joined the cluster.
set(ray.get([f.remote() for _ in range(1000)]))

This works. But how do I apply it to the RLlib library with the IMPALA or APPO algorithms?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

3 reactions
richardliaw commented, Aug 2, 2019

If you’re using rllib train, you can pass in --redis-address.

If you’re using a Python script and you want to use the entire cluster, note that rllib never autodetects your cluster (because you can have multiple Ray clusters at a time). Add ray.init(redis_address="<redis_address>") to your script, as @kivo360 mentioned.
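A minimal sketch of what such a script might look like, not a definitive implementation. It assumes a Ray release from the time of this issue, in which ray.init accepts redis_address (later releases take address instead) and tune.run accepts the registered algorithm name "IMPALA". The helper names, the CartPole-v0 environment, the worker count, and the stop condition are illustrative assumptions, not part of the original thread.

```python
def impala_config(num_workers):
    """Build a small RLlib config dict; env and values are illustrative."""
    return {
        "env": "CartPole-v0",        # assumed example environment
        "num_workers": num_workers,  # rollout workers that Ray may place on other nodes
    }


def train_on_cluster(redis_address, num_workers=2):
    # Imported lazily so the config helper can be used without Ray installed.
    import ray
    from ray import tune

    # Connect to the already-running cluster instead of starting a local Ray instance.
    ray.init(redis_address=redis_address)
    tune.run(
        "IMPALA",
        config=impala_config(num_workers),
        stop={"timesteps_total": 100000},  # assumed stop condition
    )
```

Calling train_on_cluster("<redis-address>") from a node of the cluster would then run the trainer against that cluster rather than a fresh single-node instance.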

ray submit [script] is just a wrapper around ssh user@ipaddress python [script].

Hope that helps.

1 reaction
worldveil commented, Mar 2, 2022

@edoakes should we post references to ray job submit here, now that it is the preferred way of doing things?

This is a top result on Google now for “ray job submit python”.
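For readers arriving from that search, here is a hedged sketch of job submission through the Python SDK. It assumes a newer Ray release that ships ray.job_submission and a cluster whose dashboard is reachable at the default http://127.0.0.1:8265. The helper names and the working directory are hypothetical and not taken from this thread.

```python
def job_request(script):
    """Keyword arguments for submit_job; the script name is a placeholder."""
    return {
        "entrypoint": "python " + script,       # shell command run on the cluster
        "runtime_env": {"working_dir": "./"},   # assumed: upload the current directory
    }


def submit(script, dashboard_url="http://127.0.0.1:8265"):
    # Imported lazily; requires a Ray release that provides ray.job_submission.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url)
    # submit_job returns the ID of the newly created job.
    return client.submit_job(**job_request(script))
```

With this approach the script itself does not need to be started over ssh on the head node; the job is handed to the cluster through its dashboard address.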
