
[Ray general] Tons of irrelevant messages are sent to nodes very frequently after starting Ray on a cluster.

See original GitHub issue

What is your question?

When I run ray start --head --redis-port=6379 on the head node (IP: 10.188) and then run ray start --address='192.168.10.188:6379' --redis-password='5241590000000000' on two other nodes (IPs: 10.94 and 10.181), I capture packets like those in the screenshot below:

(screenshot of captured packets omitted)
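For reference, the start sequence described above, written out as commands (all values are taken verbatim from the report):

```shell
# On the head node (192.168.10.188):
ray start --head --redis-port=6379

# On each of the two worker nodes, pointing at the head's Redis instance:
ray start --address='192.168.10.188:6379' --redis-password='5241590000000000'
```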

My first question is: why does the head node (10.188) send packets to 10.94 that include information about 10.181? The same thing happens with packets sent to 10.181, which include information about 10.94.

My second question is: what sends these messages, and why does the head node send them to all nodes at such short intervals even when no task is running? What do these messages mean?

(second screenshot omitted)

I guess this may be a polling mechanism, but why does it need to run so frequently?

In my test, I connected 500 private nodes to one head node, and they consumed a lot of network bandwidth. As noted above, tons of seemingly irrelevant messages were sent to the nodes very frequently.
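A rough back-of-envelope sketch of why this scales badly. The ~100 ms default heartbeat period used here is an assumption, as is the broadcast model; check the defaults and gossip behavior for your Ray version:

```shell
# Assumed default raylet heartbeat period in ms -- verify for your version.
PERIOD_MS=100
NODES=500

# If the head re-broadcasts cluster state to every node once per period,
# it emits NODES messages per period, i.e. NODES * (1000 / PERIOD_MS)
# messages per second, and each message carries info about all NODES
# nodes -- so total bytes on the wire grow roughly as O(NODES^2).
echo "$(( NODES * 1000 / PERIOD_MS )) messages/sec from the head"
```

Under these assumptions the head sends 5,000 messages per second, which is consistent with the bandwidth problem described above.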

Any help is really appreciated.

Ray version and other system information (Python version, TensorFlow version, OS): Ray 0.8.4, Python 3.6.9

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

2 reactions
zewenli98 commented, Apr 30, 2020

COOL!!! It works! Thank u so much!

1 reaction
stephanie-wang commented, Apr 30, 2020

Yes, you can do that! It’s a bit ugly, but you just have to pass in a flag like this to the ray start command (make sure to do it on both the head and worker nodes):

ray start --internal-config="{\"raylet_heartbeat_timeout_milliseconds\":1000}"
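Putting the comment above together with the original startup commands, the full restart sequence might look like this. The address and password are the ones from the question, and the 1000 ms interval is the example value from the comment; tune it to your own needs:

```shell
# Head node: raise the raylet heartbeat interval to 1000 ms.
ray start --head --redis-port=6379 \
    --internal-config="{\"raylet_heartbeat_timeout_milliseconds\":1000}"

# Every worker node: pass the same internal config when joining.
ray start --address='192.168.10.188:6379' \
    --redis-password='5241590000000000' \
    --internal-config="{\"raylet_heartbeat_timeout_milliseconds\":1000}"
```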