Expose Ray head to public IP from within Docker
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
- Ray installed from (source or binary): binary
- Ray version: 0.7.4
- Python version: 3.6.4
- Exact command to reproduce:
ray start --head --node-ip-address=PUBLIC_IP
I’m trying to connect, from outside the cluster, to a Ray cluster that I’ve started in Kubernetes. I’ve exposed the head node service externally using a LoadBalancer or NodePort configuration, and I connect to it with:
ray start --address=SERVICE_ADDR:SERVICE_PORT --node-manager-port=12345 --object-manager-port=12346
This completes with no errors. Then, in Python, I run
ray.init(address="SERVICE_ADDR:SERVICE_PORT")
and get the following error:
2019-11-07 19:22:27,681 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-11-07 19:22:28,687 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-11-07 19:22:29,692 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-11-07 19:22:30,697 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-11-07 19:22:31,702 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/runpy.py", line 109, in _get_module_details
__import__(pkg_name)
File "/home/ubuntu/projects/random_ai/rl/ppo/perf_example.py", line 73, in <module>
ray.init(address="34.220.71.121:31518", ignore_reinit_error=True)
File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/worker.py", line 1547, in init
ray_params, head=False, shutdown_at_exit=False, connect_only=True)
File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/node.py", line 122, in __init__
redis_password=self.redis_password)
File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/services.py", line 167, in get_address_info_from_redis
redis_address, node_ip_address, redis_password=redis_password)
File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/services.py", line 151, in get_address_info_from_redis_helper
"Redis has started but no raylets have registered yet.")
Exception: Redis has started but no raylets have registered yet.
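Before digging into Ray itself, it can help to confirm which of the relevant addresses the client machine can reach at all. The following is a minimal diagnostic sketch, not part of the original report: it only tests plain TCP connectivity (it says nothing about whether Ray's processes registered correctly), and the helper name is_reachable is hypothetical.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves `host` and tries each resolved address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False
```

For example, is_reachable("34.220.71.121", 31518) probes the service address and port from the traceback above; probing any other head-node ports you expect the client to use in the same way shows which of them are actually exposed outside the cluster.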
I understand that this issue is caused by not setting --node-ip-address to the IP address from which the head is accessed. So, when starting the head, I tried setting --node-ip-address to the external IP of the machine that the Docker container runs on (as reported by AWS). But then running ray start --head --node-ip-address=PUBLIC_IP fails with this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 539, in connect
sock = self._connect()
File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 596, in _connect
raise err
File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 584, in _connect
sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/ray", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/ray/scripts/scripts.py", line 787, in main
return cli()
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/ray/scripts/scripts.py", line 317, in start
node = ray.node.Node(ray_params, head=True, shutdown_at_exit=False)
File "/usr/local/lib/python3.6/dist-packages/ray/node.py", line 145, in __init__
self.start_head_processes()
File "/usr/local/lib/python3.6/dist-packages/ray/node.py", line 513, in start_head_processes
self.start_redis()
File "/usr/local/lib/python3.6/dist-packages/ray/node.py", line 368, in start_redis
include_java=self._ray_params.include_java)
File "/usr/local/lib/python3.6/dist-packages/ray/services.py", line 621, in start_redis
primary_redis_client.set("NumRedisShards", str(num_redis_shards))
File "/usr/local/lib/python3.6/dist-packages/redis/client.py", line 1519, in set
return self.execute_command('SET', *pieces)
File "/usr/local/lib/python3.6/dist-packages/redis/client.py", line 836, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 1073, in get_connection
connection.connect()
File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 544, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 34.220.71.121:6379. Connection refused.
Running the same command outside the Docker container works, so this seems to be a Docker-related issue. Any ideas on how I can get this working? It’s not clear to me why Ray needs to know the IP it is being accessed at; if it didn’t, --node-ip-address would be unnecessary and much of this would be simpler.
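A quick way to compare the two environments (a diagnostic sketch, not something proposed in the thread) is to check whether PUBLIC_IP is an address of the current network namespace. Binding a socket to an address only succeeds if that address is assigned to a local interface, so running the check both inside and outside the container shows which addresses each environment actually owns. Traffic sent to any other address has to be routed or port-forwarded to reach a process there (for example via Docker's -p/--publish option or host networking). The helper name is_local_address is hypothetical.

```python
import errno
import socket


def is_local_address(ip: str) -> bool:
    """Return True if `ip` is assigned to an interface in this network namespace."""
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 lets the OS pick any free port; only the address matters here.
            sock.bind((ip, 0))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                # The address is not assigned to this host or container.
                return False
            raise
    return True
```

For example, compare the result of is_local_address("34.220.71.121") inside the container with the result on the host itself.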
Issue created 4 years ago; 7 comments (3 by maintainers).
Top GitHub Comments
@edoakes I do run ray start before running the script (as mentioned above); the error I’m getting happens downstream of that.

The use case is to leverage a remote cluster for compute while doing interactive (i.e., Jupyter-notebook-style) development. It would be great to work on your normal development machine, in your normal development environment, and still be able to seamlessly push heavy work off to a full cluster.
This is probably more doable in my situation than in most: my development machine and the cluster are both in AWS, so network performance should be good, and code syncs through a shared file system.
I’ve also since discovered that, with some AWS networking configuration, I can probably put the machines on the same local network and bypass this issue.
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you’d still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray’s public slack channel.
Thanks again for opening the issue!