Ray 1.2.0 fails to connect to Ray cluster on K8s (running master)

Working through the instructions given on https://docs.ray.io/en/master/cluster/kubernetes.html, I’m running into the following issue on this step:

Then open a new shell and try out a sample program:

$ python ray/doc/kubernetes/example_scripts/run_local_example.py
The program in this example uses ray.util.connect("127.0.0.1:10001") to connect to the Ray cluster.

Traceback (most recent call last):
  File "ray/doc/kubernetes/example_scripts/run_local_example.py", line 57, in <module>
    ray.util.connect(f"127.0.0.1:{LOCAL_PORT}")
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/util/client_connect.py", line 26, in connect
    conn_str, secure=secure, metadata=metadata, connection_retries=3)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/util/client/__init__.py", line 57, in connect
    connection_retries=connection_retries)
  File "/home/stus/miniconda3/envs/fctk/lib/python3.7/site-packages/ray/util/client/worker.py", line 120, in __init__
    raise ConnectionError("ray client connection timeout")
ConnectionError: ray client connection timeout

I’ve confirmed that the service is available:

$ microk8s.kubectl -n ray get services
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
example-cluster-ray-head   ClusterIP   10.152.183.99   <none>        10001/TCP,8265/TCP,8000/TCP   16m
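The same check can be scripted. The sketch below is a minimal example. It assumes the default plain-text column layout shown above and that no column value contains whitespace. The helper name service_ports is hypothetical and is not part of kubectl or Ray. It confirms that the head service lists 10001/TCP, the Ray client port the example script connects to.

```python
from typing import List

# Output copied from the `kubectl -n ray get services` call above.
SAMPLE = """\
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
example-cluster-ray-head   ClusterIP   10.152.183.99   <none>        10001/TCP,8265/TCP,8000/TCP   16m
"""


def service_ports(kubectl_output: str, service_name: str) -> List[str]:
    """Return the PORT(S) entries listed for `service_name`.

    Assumes whitespace-separated columns with no spaces inside values,
    which holds for the default `kubectl get services` output above.
    """
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    ports_col = header.index("PORT(S)")  # locate the column by its header
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == service_name:
            return fields[ports_col].split(",")
    raise KeyError(f"service {service_name!r} not found")


ports = service_ports(SAMPLE, "example-cluster-ray-head")
print("10001/TCP" in ports)
```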

And I’ve enabled port forwarding:

$ microk8s.kubectl -n ray port-forward service/example-cluster-ray-head 10001:10001
Forwarding from 127.0.0.1:10001 -> 10001
Forwarding from [::1]:10001 -> 10001
Handling connection for 10001
Handling connection for 10001
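The "Handling connection for 10001" lines suggest that TCP connections did reach the forwarder. If so, the timeout probably happens above the TCP layer. A plain socket probe is one way to separate the two cases. This is a minimal sketch that uses only the Python standard library, and tcp_port_open is a hypothetical helper name.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    This only shows that something (here, the kubectl port-forward) is
    listening. It says nothing about whether the Ray client protocol on
    the other side is compatible with the local client.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and socket timeouts
        return False


if __name__ == "__main__":
    # 10001 is the local port forwarded by the kubectl command above.
    print(tcp_port_open("127.0.0.1", 10001))
```

If this returns True while ray.util.connect still times out, the problem is more likely a protocol or version mismatch than networking.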

I ran all of this from the current master (commit cd89f0dc55ae98231aa08e9a0e1c80409e75acf1).

Any help would be appreciated. Thanks.

Issue Analytics

Created 3 years ago
Reactions: 2
Comments: 13 (6 by maintainers)

Top GitHub Comments

1 reaction

tbabej commented, Apr 1, 2021

Not a Ray maintainer, but for what it’s worth, the namespaces have different network interfaces, so if the service is exposed through a different namespace, port forwarding does not work. That’s pure K8s, though, nothing to do with Ray AFAIK.
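Whatever the underlying cause, port-forward has to be pointed at the namespace where the Service lives, because kubectl otherwise uses the current context's namespace. The sketch below builds the command with an explicit -n flag. port_forward_cmd is a hypothetical helper, and the example arguments reproduce the command from the original report.

```python
import shlex
from typing import List


def port_forward_cmd(namespace: str, service: str, local_port: int,
                     remote_port: int, kubectl: str = "kubectl") -> List[str]:
    """Build a `kubectl port-forward` argv that names the namespace explicitly.

    Without -n, kubectl looks the Service up in the current context's
    namespace, which may not be the one the Ray head service runs in.
    """
    return [kubectl, "-n", namespace, "port-forward",
            f"service/{service}", f"{local_port}:{remote_port}"]


cmd = port_forward_cmd("ray", "example-cluster-ray-head", 10001, 10001,
                       kubectl="microk8s.kubectl")
# Quote each argument for display; shlex.join needs Python 3.8+.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run unchanged, which avoids shell-quoting issues.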

With respect to this issue, however, I don’t think we should close it. It is certainly unintended behaviour that the older version of the client (1.2.0) cannot connect to a newer server (what will become 1.3.0) and fails silently at that. At the very least the docs need to be updated to point this out, but hopefully the server is modified to be compatible with the older client. Hence we should reopen, and maybe rephrase the title to “Ray 1.2.0 fails to connect to Ray cluster on K8s (running master)”
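One way to avoid the silent timeout the comment describes is to compare versions up front and fail with a readable message. The sketch below is hypothetical and is not Ray's actual client/server handshake. It assumes that matching major.minor versions is the compatibility rule, and it leaves out how the server's version would be obtained. major_minor and check_client_server_versions are illustrative names.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse 'X.Y.Z' (or 'X.Y.Z.devN') into (X, Y)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_client_server_versions(client: str, server: str) -> None:
    """Raise early, with a readable message, if major.minor differ.

    Hypothetical policy: any major/minor mismatch counts as incompatible,
    rather than letting the connection attempt time out.
    """
    if major_minor(client) != major_minor(server):
        raise RuntimeError(
            f"Ray client {client} and server {server} differ in major/minor "
            "version; install matching versions on both sides."
        )


check_client_server_versions("1.2.0", "1.2.0")  # same version: no error
try:
    # "1.3.0" stands in for the server built from master in this report.
    check_client_server_versions("1.2.0", "1.3.0")
except RuntimeError as err:
    print(err)
```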

1 reaction

ssiegel95 commented, Mar 23, 2021

Any fix planned for the section that doesn’t work?
