Expose Ray head to public IP from within Docker

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • Ray installed from (source or binary): binary
  • Ray version: 0.7.4
  • Python version: 3.6.4
  • Exact command to reproduce: ray start --head --node-ip-address=PUBLIC_IP

I’m trying to connect, from outside the cluster, to a Ray cluster that I’ve started in Kubernetes. I’ve exposed the head node service externally (using either a LoadBalancer or a NodePort configuration) and connect to it with ray start --address=SERVICE_ADDR:SERVICE_PORT --node-manager-port=12345 --object-manager-port=12346, which completes with no errors. Then, in Python, I call ray.init(address="SERVICE_ADDR:SERVICE_PORT") and get this error:

2019-11-07 19:22:27,681 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-11-07 19:22:28,687 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-11-07 19:22:29,692 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-11-07 19:22:30,697 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2019-11-07 19:22:31,702 WARNING services.py:174 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
Traceback (most recent call last):
  File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/home/ubuntu/projects/random_ai/rl/ppo/perf_example.py", line 73, in <module>
    ray.init(address="34.220.71.121:31518", ignore_reinit_error=True)
  File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/worker.py", line 1547, in init
    ray_params, head=False, shutdown_at_exit=False, connect_only=True)
  File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/node.py", line 122, in __init__
    redis_password=self.redis_password)
  File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/services.py", line 167, in get_address_info_from_redis
    redis_address, node_ip_address, redis_password=redis_password)
  File "/home/ubuntu/.pyenv/versions/3.6.8/lib/python3.6/site-packages/ray/services.py", line 151, in get_address_info_from_redis_helper
    "Redis has started but no raylets have registered yet.")
Exception: Redis has started but no raylets have registered yet.
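
One way to narrow this down is to confirm, from the machine running the driver, that the exposed ports accept TCP connections at all. The sketch below is an illustration only and not part of Ray: is_port_open and check_ports are hypothetical helper names, and the host and port in the usage comment are placeholders. It uses only the Python standard library.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port in ports to whether it accepted a TCP connection."""
    return {port: is_port_open(host, port, timeout) for port in ports}


# Example usage (placeholders, substitute your own service address and port):
# print(check_ports("SERVICE_ADDR", [SERVICE_PORT]))
```

A port that accepts connections only rules out basic reachability problems. It does not show that a raylet has registered with Redis under an address the driver can reach.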

I understand that this issue is due to not setting --node-ip-address to the IP address that you’re accessing the head from. So I tried setting --node-ip-address to the external IP of the machine that the Docker container is running on (as reported by AWS) when starting the head. But then I get this error (ray start --head --node-ip-address=PUBLIC_IP):

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 539, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 596, in _connect
    raise err
  File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 584, in _connect
    sock.connect(socket_address)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/ray", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/ray/scripts/scripts.py", line 787, in main
    return cli()
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ray/scripts/scripts.py", line 317, in start
    node = ray.node.Node(ray_params, head=True, shutdown_at_exit=False)
  File "/usr/local/lib/python3.6/dist-packages/ray/node.py", line 145, in __init__
    self.start_head_processes()
  File "/usr/local/lib/python3.6/dist-packages/ray/node.py", line 513, in start_head_processes
    self.start_redis()
  File "/usr/local/lib/python3.6/dist-packages/ray/node.py", line 368, in start_redis
    include_java=self._ray_params.include_java)
  File "/usr/local/lib/python3.6/dist-packages/ray/services.py", line 621, in start_redis
    primary_redis_client.set("NumRedisShards", str(num_redis_shards))
  File "/usr/local/lib/python3.6/dist-packages/redis/client.py", line 1519, in set
    return self.execute_command('SET', *pieces)
  File "/usr/local/lib/python3.6/dist-packages/redis/client.py", line 836, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 1073, in get_connection
    connection.connect()
  File "/usr/local/lib/python3.6/dist-packages/redis/connection.py", line 544, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 34.220.71.121:6379. Connection refused.

I tried the same command from outside the Docker container and it works, so it seems to be a Docker-related issue. Any ideas on how I can get this working? It’s not clear to me why Ray should care which IP it’s being accessed at. If it didn’t, --node-ip-address would be unnecessary and a lot of this would be simpler.
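
One thing worth checking from inside the container is which local address it would actually use for outbound traffic, and how that compares with PUBLIC_IP. The following is a minimal sketch, assuming a Linux container with a default route. outbound_ip is a hypothetical helper, not a Ray API, and 8.8.8.8:53 is an arbitrary routable target (connecting a UDP socket only selects a route and sends no packets).

```python
import socket


def outbound_ip(target_host="8.8.8.8", target_port=53):
    """Return the local IP address the OS would use to reach target_host."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # For UDP, connect() just records the peer and picks a source address.
        s.connect((target_host, target_port))
        return s.getsockname()[0]


# Example usage: run this inside the container and again on the host,
# then compare both results with PUBLIC_IP.
# print(outbound_ip())
```

On Docker's default bridge network the result inside the container is typically a private bridge address, which differs from the address the host reports.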

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
zplizzi commented, Nov 12, 2019

@edoakes I do run ray start before running the script (as mentioned above) - the error I’m getting happens downstream of that.

The use case is to leverage a remote cluster for compute support while doing interactive (i.e. Jupyter-notebook-style) development. It would be pretty sweet to be able to work on your normal development machine, in your normal development environment, and yet be able to seamlessly push heavy work off to a full cluster.

This probably is more doable in my situation than in most - my development machine and cluster are both in AWS so network performance should be good, and code syncs through a shared file system.

Also I’ve since discovered that I can do some AWS networking magic and probably put the machines on the same local network, bypassing this issue.

0 reactions
stale[bot] commented, Nov 25, 2020

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you’d still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray’s public slack channel.

Thanks again for opening the issue!
