[Optimization] Split GcsNodeInfo into Basic/Full GcsNodeInfo
Search before asking
- I have searched the issues and found no similar feature request.
Description
Recently we found that GetAllNodeInfo is a bottleneck in the GCS. When an actor or raylet fails over, it subscribes to node changes, and that subscription sends a GetAllNodeInfo request.
After analyzing this, we found that CoreWorker and Raylet use only a few of the fields, so we don't need to return or publish all fields to them.
So we split GcsNodeInfo into BasicGcsNodeInfo and FullGcsNodeInfo. We've tested it and it gives a 25% performance boost.
I want to contribute this optimization, but since it is a large-scale code change (about 800 lines), I want to confirm with you first in this issue.
Main change
1. protobuf
before:
message GcsNodeInfo {
  enum GcsNodeState {
    ALIVE = 0;
    DEAD = 1;
  }
  bytes node_id = 1;
  string node_manager_address = 2;
  string raylet_socket_name = 3;
  string object_store_socket_name = 4;
  int32 node_manager_port = 5;
  int32 object_manager_port = 6;
  GcsNodeState state = 7;
  string node_manager_hostname = 8;
  int32 metrics_export_port = 9;
  double start_time = 10;
  double terminate_time = 11;
  int32 pid = 12;
  int32 brpc_port = 13;
  int64 timestamp = 14;
  string shape_group = 16;
  string pod_name = 17;
  map<string, double> resources_total = 21;
}
after:
message BasicGcsNodeInfo {
  enum GcsNodeState {
    ALIVE = 0;
    DEAD = 1;
  }
  bytes node_id = 1;
  string node_manager_address = 2;
  int32 node_manager_port = 5;
  int32 object_manager_port = 6;
  GcsNodeState state = 7;
  string node_manager_hostname = 8;
  int64 timestamp = 14;
}
message FullGcsNodeInfo {
  BasicGcsNodeInfo basic_gcs_node_info = 22;
  string raylet_socket_name = 3;
  string object_store_socket_name = 4;
  int32 metrics_export_port = 9;
  int32 pid = 12;
  int32 brpc_port = 13;
  double start_time = 10;
  double terminate_time = 11;
  string shape_group = 16;
  string pod_name = 17;
  map<string, double> resources_total = 21;
}
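To illustrate why the basic fields are enough for most subscribers, here is a minimal C++ sketch that uses plain structs as hypothetical stand-ins for the generated `BasicGcsNodeInfo`/`FullGcsNodeInfo` classes (abridged; `NodeState`, `BasicNodeInfo`, `FullNodeInfo`, and `AliveNodeAddresses` are illustrative names, not Ray's API). A consumer that only needs to reach live raylets reads `node_id`, `state`, `node_manager_address`, and `node_manager_port`, all of which live in the basic message.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the generated protobuf classes (abridged).
enum class NodeState { kAlive = 0, kDead = 1 };

struct BasicNodeInfo {
  std::string node_id;
  std::string node_manager_address;
  int32_t node_manager_port = 0;
  int32_t object_manager_port = 0;
  NodeState state = NodeState::kAlive;
  std::string node_manager_hostname;
  int64_t timestamp = 0;
};

struct FullNodeInfo {
  BasicNodeInfo basic;  // mirrors `basic_gcs_node_info = 22`
  std::string raylet_socket_name;
  std::string object_store_socket_name;
  int32_t metrics_export_port = 0;
  int32_t pid = 0;
  double start_time = 0;
  double terminate_time = 0;
  std::map<std::string, double> resources_total;
};

// Builds node_id -> "address:port" for alive nodes, reading basic fields only.
std::map<std::string, std::string> AliveNodeAddresses(
    const std::vector<BasicNodeInfo> &nodes) {
  std::map<std::string, std::string> out;
  for (const auto &node : nodes) {
    if (node.state != NodeState::kAlive) continue;  // skip dead nodes
    out[node.node_id] =
        node.node_manager_address + ":" + std::to_string(node.node_manager_port);
  }
  return out;
}
```

A consumer holding a full record can pass `full.basic` to the same function, so code that only cares about reachability never depends on the heavier fields.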
2. GCS RPC handlers and accessors will have two versions of GetAllNodeInfo
// handlers:
void HandleGetAllBasicNodeInfo(const rpc::GetAllBasicNodeInfoRequest &request,
                               rpc::GetAllBasicNodeInfoReply *reply,
                               rpc::SendReplyCallback send_reply_callback) override;
void HandleGetAllFullNodeInfo(const rpc::GetAllFullNodeInfoRequest &request,
                              rpc::GetAllFullNodeInfoReply *reply,
                              rpc::SendReplyCallback send_reply_callback) override;
// accessors:
class BasicNodeInfoAccessor {
  // ...
};
class FullNodeInfoAccessor {
  // ...
};
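A minimal sketch of how the two handlers could share one node table, again using hypothetical plain-struct stand-ins (`NodeTable`, `BasicReply`, `FullReply`, and the `std::function` callback are assumptions, not Ray's generated types). The table keeps one full record per node; the basic handler returns pointers to the embedded basic sub-message, so neither reply copies node data, in the same non-owning spirit as the current `UnsafeArenaAddAllocated` handler.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the generated messages (abridged).
struct BasicNodeInfo {
  std::string node_id;
  std::string node_manager_address;
  int32_t node_manager_port = 0;
};

struct FullNodeInfo {
  BasicNodeInfo basic;
  std::string raylet_socket_name;
  int32_t pid = 0;
};

// Replies hold non-owning pointers; the node table must outlive them.
struct BasicReply {
  std::vector<const BasicNodeInfo *> node_info_list;
};
struct FullReply {
  std::vector<const FullNodeInfo *> node_info_list;
};

class NodeTable {
 public:
  void AddNode(FullNodeInfo info) {
    std::string id = info.basic.node_id;
    nodes_[id] = std::make_shared<FullNodeInfo>(std::move(info));
  }

  // For CoreWorker/Raylet: expose only the embedded basic part of each node.
  void HandleGetAllBasicNodeInfo(BasicReply *reply,
                                 const std::function<void()> &send_reply) const {
    for (const auto &entry : nodes_) {
      reply->node_info_list.push_back(&entry.second->basic);
    }
    send_reply();
  }

  // For callers that need every field.
  void HandleGetAllFullNodeInfo(FullReply *reply,
                                const std::function<void()> &send_reply) const {
    for (const auto &entry : nodes_) {
      reply->node_info_list.push_back(entry.second.get());
    }
    send_reply();
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<FullNodeInfo>> nodes_;
};
```

Because both replies point into the same records, a basic reply entry is literally the `basic` member of the corresponding full record; no second copy of the node table is needed.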
Plus all related code, which amounts to 800+ modified lines.
Some Other Topics
Why didn't we simply keep GcsNodeInfo as it is and mask some fields?
First, masking makes the protocol harder to understand. Second, it introduces additional memory copies.
In the current implementation, we use an arena to avoid copying. But to mask some fields of the reply, we would need to create a new reply first and copy a bunch of int values and string pointers into it, which breaks this optimization.
Ultimately this is a limitation of the protobuf Arena: it has no feature that lets us mask some fields of an arena-allocated message.
void GcsNodeManager::HandleGetAllNodeInfo(const rpc::GetAllNodeInfoRequest &request,
                                          rpc::GetAllNodeInfoReply *reply,
                                          rpc::SendReplyCallback send_reply_callback) {
  // UnsafeArenaAddAllocated is safe here because entry.second outlives reply.
  // The reply is sent when send_reply_callback is called; after that, reply is
  // no longer used, while entry remains valid.
  for (const auto &entry : alive_nodes_) {
    reply->mutable_node_info_list()->UnsafeArenaAddAllocated(entry.second.get());
  }
  for (const auto &entry : dead_nodes_) {
    reply->mutable_node_info_list()->UnsafeArenaAddAllocated(entry.second.get());
  }
  GCS_RPC_SEND_REPLY(send_reply_callback, reply, Status::OK());
  ++counts_[CountType::GET_ALL_NODE_INFO_REQUEST];
}
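The copy argument against masking can be made concrete with a small sketch that counts string copies. `CountedString`, `Basic`, `Full`, `MaskedReply`, and `SplitReply` are hypothetical names for illustration, and real protobuf messages and arenas are not modeled. Masking a full record means building a fresh message per node and copying the kept fields into it, while the split layout lets the reply reference the existing basic sub-message.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A string wrapper that counts how many times it is copied (illustration only).
struct CountedString {
  static std::size_t copies;  // total copies made so far
  std::string value;

  CountedString() = default;
  CountedString(const CountedString &other) : value(other.value) { ++copies; }
  CountedString &operator=(const CountedString &other) {
    value = other.value;
    ++copies;
    return *this;
  }
};
std::size_t CountedString::copies = 0;

struct Basic {
  CountedString node_id;
  CountedString node_manager_address;
  int port = 0;
};

struct Full {
  Basic basic;
  CountedString raylet_socket_name;
  CountedString object_store_socket_name;
};

// Masking: build a new message per node and copy the kept fields into it.
std::vector<Basic> MaskedReply(const std::vector<Full> &nodes) {
  std::vector<Basic> reply;
  reply.reserve(nodes.size());  // no reallocation, so no extra element copies
  for (const auto &node : nodes) {
    reply.emplace_back();  // default-construct in place: no copies yet
    Basic &masked = reply.back();
    masked.node_id = node.basic.node_id;                            // copy
    masked.node_manager_address = node.basic.node_manager_address;  // copy
    masked.port = node.basic.port;
  }
  return reply;
}

// Split messages: the reply just points at each node's existing basic part.
std::vector<const Basic *> SplitReply(const std::vector<Full> &nodes) {
  std::vector<const Basic *> reply;
  reply.reserve(nodes.size());
  for (const auto &node : nodes) reply.push_back(&node.basic);
  return reply;
}
```

With three nodes, `SplitReply` performs no string copies, while `MaskedReply` performs two per node; the gap grows with both the node count and the number of kept string fields.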
Created 2 years ago; 5 comments, all by maintainers.
I have one question: should workers just ask the raylet for node info (GetNodeInfo) instead? Would that fix this problem?
The long-term goal is that workers shouldn't talk to the GCS directly. Some info can be cached in the raylet and served from there; for the rest, the raylet will redirect requests to the GCS.
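A minimal sketch of the raylet-as-proxy idea, assuming the raylet keeps a local node-info cache and forwards misses to the GCS. `NodeAddress`, `RayletNodeInfoProxy`, and the `fetch_from_gcs` callback are hypothetical, and how invalidation would actually be triggered (for example, by node-change notifications) is left out.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cached record; a real proxy would cache basic node info.
struct NodeAddress {
  std::string ip;
  int port = 0;
};

class RayletNodeInfoProxy {
 public:
  // Stand-in for an RPC to the GCS; returns nullopt for unknown nodes.
  using GcsFetch =
      std::function<std::optional<NodeAddress>(const std::string &)>;

  explicit RayletNodeInfoProxy(GcsFetch fetch_from_gcs)
      : fetch_from_gcs_(std::move(fetch_from_gcs)) {}

  // Answers from the local cache; on a miss, forwards the request to the GCS
  // and caches a successful result.
  std::optional<NodeAddress> Get(const std::string &node_id) {
    auto it = cache_.find(node_id);
    if (it != cache_.end()) return it->second;
    auto fetched = fetch_from_gcs_(node_id);
    if (fetched) cache_.emplace(node_id, *fetched);
    return fetched;
  }

  // Drops a cached entry so the next Get goes back to the GCS.
  void Invalidate(const std::string &node_id) { cache_.erase(node_id); }

 private:
  GcsFetch fetch_from_gcs_;
  std::unordered_map<std::string, NodeAddress> cache_;
};
```

Repeated lookups of the same node from many workers then reach the GCS once per raylet rather than once per worker, which is the load reduction the proxy approach is after.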
cc @scv119 as well.
@iycheng @scv119 Yep, I agree with you. It doesn't seem worth doing this optimization once the raylet acts as a proxy.
Let's go straight to the raylet proxy solution. Closing this issue, thank you for your review!
@scv119 it’s CPU.