[ray] Clustering issue

See original GitHub issue

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Ray installed from (source or binary): binary
  • Ray version: 0.6.4
  • Python version: 3.6.8
  • Exact command to reproduce:

Describe the problem

I tried the “manual cluster setup” on GCP instances, but it always fails. I ran the ray start --head --redis-port=6379 command on the head machine, and ran import ray and ray.init(redis_address="10.129.0.7:6379") on the node machine.

I attached the logs below. They show an exception about raylets.

I also tested Ray versions 0.6.3 and 0.7.0 and got the same result. Each machine can communicate with Redis without problems, and all ports are open.

Why can't I set up the cluster?
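
A port that looks open in the firewall rules can still be unreachable in practice, so one quick sanity check is to attempt TCP connections to the head node from the worker. Below is a minimal sketch that uses only the standard library. check_ports is a hypothetical helper and is not part of Ray, and the list of ports to test is an assumption. In the head log below, the Redis shard started on a randomly chosen port (32675), so the ports worth testing depend on the flags passed to ray start.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Hypothetical helper (not part of Ray): try a TCP connection to each port.

    Returns a dict mapping each port to True if the connection succeeded.
    """
    results = {}
    for port in ports:
        try:
            # create_connection completes a full TCP handshake, so True means
            # something is listening and nothing in between dropped the traffic.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused connections, timeouts and unreachable hosts all land here.
            results[port] = False
    return results
```

For example, running check_ports("10.129.0.7", [6379]) on the worker tests the primary Redis port from the log. A True result shows only that this one direction works. It does not cover connections from the head back to the worker.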

Source code / logs

Log of head node

2019-03-18 01:19:44,763	INFO scripts.py:286 -- Using IP address 10.129.0.7 for this node.
2019-03-18 01:19:44,763	INFO node.py:439 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-03-18_01-19-44_3587/logs.
2019-03-18 01:19:44,866	INFO services.py:364 -- Waiting for redis server at 127.0.0.1:6379 to respond...
2019-03-18 01:19:44,975	INFO services.py:364 -- Waiting for redis server at 127.0.0.1:32675 to respond...
2019-03-18 01:19:44,976	INFO services.py:761 -- Starting Redis shard with 6.32 GB max memory.
2019-03-18 01:19:44,984	INFO services.py:1449 -- Starting the Plasma object store with 9.48 GB memory using /dev/shm.
2019-03-18 01:19:44,991	INFO scripts.py:317 --
Started Ray on this node. You can add additional nodes to the cluster by calling

    ray start --redis-address 10.129.0.7:6379

from the node you wish to add. You can connect a driver to the cluster from Python by running

    import ray
    ray.init(redis_address="10.129.0.7:6379")

If you have trouble connecting from a different machine, check that your firewall is configured properly. If you wish to terminate the processes that have been started, run

    ray stop

Log of node 1

>>> import ray
>>> ray.init(redis_address="10.129.0.7:6379")
2019-03-18 01:21:15,265	WARNING worker.py:1249 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-03-18 01:21:16,267	WARNING worker.py:1249 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-03-18 01:21:17,271	WARNING worker.py:1249 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-03-18 01:21:18,274	WARNING worker.py:1249 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-03-18 01:21:19,276	WARNING worker.py:1249 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jason.park/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/worker.py", line 1499, in init
    redis_address, node_ip_address, redis_password=redis_password)
  File "/home/jason.park/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/worker.py", line 1242, in get_address_info_from_redis
    redis_address, node_ip_address, redis_password=redis_password)
  File "/home/jason.park/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/worker.py", line 1222, in get_address_info_from_redis_helper
    "Redis has started but no raylets have registered yet.")
Exception: Redis has started but no raylets have registered yet.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 15 (1 by maintainers)

Top GitHub Comments

13 reactions
y-xue commented, Apr 1, 2019

I had the same problem when manually setting up the cluster. For me, the problem was that I had not opened enough ports for Ray. According to this comment, multiple ports need to be open.

I solved this problem by opening ports 6379, 6380, 12345, and 12346 on all nodes.

On the head node:

ray start --head --redis-port=6379 --redis-shard-ports=6380 \
--node-manager-port=12345 --object-manager-port=12346

On the other nodes:

ray start --redis-address=<head-node-ip>:6379 \
--node-manager-port=12345 --object-manager-port=12346

Now I can connect a driver to the cluster on both head node and the other nodes:

ray.init(redis_address="<head-node-ip>:6379")
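
Since the head and worker commands above must agree on the same fixed ports, and the same ports must be open in the firewall, it can help to generate everything from one place. Here is a minimal sketch, assuming Python 3.6.1 or later (the report uses 3.6.8). RayPorts, head_command, worker_command and driver_address are hypothetical names. The flags are exactly the ones used in the commands above and may differ in other Ray versions.

```python
from typing import List, NamedTuple


class RayPorts(NamedTuple):
    """Hypothetical container for the fixed ports used in the commands above."""
    redis_port: int = 6379
    redis_shard_port: int = 6380
    node_manager_port: int = 12345
    object_manager_port: int = 12346

    def firewall_ports(self) -> List[int]:
        # Every port in this layout must be open on all nodes.
        return [self.redis_port, self.redis_shard_port,
                self.node_manager_port, self.object_manager_port]


def head_command(ports: RayPorts) -> str:
    # Same flags as the head-node command in the comment.
    return ("ray start --head "
            f"--redis-port={ports.redis_port} "
            f"--redis-shard-ports={ports.redis_shard_port} "
            f"--node-manager-port={ports.node_manager_port} "
            f"--object-manager-port={ports.object_manager_port}")


def worker_command(head_ip: str, ports: RayPorts) -> str:
    # Workers point at the head's Redis port and reuse the same manager ports.
    return (f"ray start --redis-address={head_ip}:{ports.redis_port} "
            f"--node-manager-port={ports.node_manager_port} "
            f"--object-manager-port={ports.object_manager_port}")


def driver_address(head_ip: str, ports: RayPorts) -> str:
    # Value to pass as redis_address to ray.init().
    return f"{head_ip}:{ports.redis_port}"
```

For example, head_command(RayPorts()) reproduces the head-node command from the comment. Keeping the ports in one place avoids a mismatch between the firewall rules and the ray start flags. The trade-off is that the ports are fixed, so they must not collide with other services on the machines.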

2 reactions
rkruegs123 commented, Nov 22, 2019

So you call ‘ray start’ from the head node itself?

I am having trouble connecting to my cluster via Python from my local machine. I am trying to (1) start the cluster from my local machine with ray up or ray start, which succeeds, and then (2) call ray.init(redis_address='<ip>:<port>').

I am confident the cluster starts because I can run the Python script with ray submit config.yaml script.py, which I understand copies the script to the head node. However, shouldn't it also be possible to connect to the cluster from my local machine and make remote calls on it?

Has anyone else experienced this? Could the above responders kindly provide some more specifics on where they are starting the cluster from, where they are running their python scripts from, etc?


