Question: node grpc versus grpc-js performance
My team just updated our internal tools to use grpc-js in lieu of grpc. A primary motivation is that we want to be able to use later versions of Node and stop depending on a deprecated library.
After a couple of weeks, we've identified that the node 12 + grpc to node 14 + grpc-js upgrade has resulted in significant performance degradation for one of our endpoints. This endpoint makes a protobuf network call for a repeated type that returns 100+ objects. Locally we're seeing a 3x latency increase between the two versions of the endpoint, but we haven't debugged the issue beyond swapping the libraries so far, so there's a chance the issue is specific to our app.
Is this a known tradeoff between the two libraries, potentially for example that serialization takes longer with the raw JS version? Are there recommended optimizations or configurations that I might not be following that would be contributing to our problem? I would really prefer to keep grpc-js and try to optimize code rather than doing a rollback if possible.
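For anyone investigating similar slowdowns: grpc-js accepts channel options at client construction that can matter for large responses. The option names below are real gRPC/grpc-js channel options, but the values and the assumption that they apply to this endpoint are mine, not from the thread.

```javascript
// Sketch: channel options that can affect large-response performance in
// @grpc/grpc-js. The values are illustrative assumptions, not tuned numbers.
const channelOptions = {
  // grpc-js caps buffered memory per HTTP/2 session (default 10 MB);
  // raising it can help when one response carries 100+ sizable objects.
  'grpc-node.max_session_memory': 64,
  // Keep the HTTP/2 connection warm between calls.
  'grpc.keepalive_time_ms': 120000,
};

// With @grpc/grpc-js, the options go in the third constructor argument, e.g.:
//   const client = new proto.my.pkg.MyService(
//       'my-server:443', grpc.credentials.createSsl(), channelOptions);

module.exports = { channelOptions };
```

Whether these help depends on where the time is actually going, so profile before and after changing them.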
Issue Analytics
- Created 2 years ago
- Reactions: 3
- Comments: 21 (9 by maintainers)
Top GitHub Comments
I spent the last day debugging this, and here are my findings so far. For background, I have a Go-based gRPC server running in the cloud, and previously I was using a Node-based grpc client, also running in the cloud, to make requests to this server.
Upgrading from the legacy grpc to grpc-js resulted in a 4-5x performance degradation on my gRPC invocations. Initially I suspected data serialization to be the issue due to our large response bodies, but running various profilers suggests the http2 connections are sitting in async time much longer. I confirmed that data serialization wasn't the problem by running both the Node client and the Go server locally (without TLS; not sure if that makes a difference here), and each call dropped to around 100ms, which is dramatically lower than the typical call time.
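For anyone reproducing this kind of comparison, a small probe around a unary call makes the client-side latency easy to log. This is a generic sketch; the wrapped method name in the usage comment is a placeholder, not from the thread.

```javascript
// Generic latency probe for a callback-style unary gRPC method.
// `fn` is any function with the (request, callback) shape that both the
// legacy grpc and grpc-js client stubs expose.
function timeUnaryCall(fn, request) {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    fn(request, (err, response) => {
      // Nanoseconds to milliseconds.
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      if (err) return reject(err);
      resolve({ elapsedMs, response });
    });
  });
}

// Against a real client this would look like:
//   const { elapsedMs } = await timeUnaryCall(
//       client.listServices.bind(client), {});
```

Running the same probe against both library versions on the same endpoint gives a like-for-like number to compare.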
Are there any connection settings that I should be looking into to improve the connection responsiveness? It seems odd to have connections hanging so much longer than with the legacy package.
Just in case this info might be helpful, I ran my client with GRPC_TRACE=call_stream and GRPC_VERBOSITY=DEBUG in both a remote-server and a local-server scenario, and the data frames received from the remote server are much smaller. Not sure if this is a symptom of the behavior I mentioned or could help root-cause some potential improvements. Thanks!
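For reference, the tracing setup mentioned above amounts to setting two environment variables when launching the Node client (client.js here is a placeholder for your entry point):

```shell
# Enable grpc-js call-stream tracing at debug verbosity.
GRPC_TRACE=call_stream GRPC_VERBOSITY=DEBUG node client.js
```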
Update: current solution: If you reference my protobuf definitions above, we were adding k8s object annotations (
map<string, string> annotations
) into therepeated WeK8sGenericService services
. We included these for developer velocity purposes, so that lower-level services wouldn’t need to have extra protobuf field definitions updated every time our higher-level graphql server wanted something from the annotations.When I removed this field and replaced it with one-to-one field definitions for what we actually were pulling, the call time dropped dramatically on local testing to around 100-200ms (Note that all other protobuf calls we benchmarked for grpc-js against grpc legacy were comparable, so we were only concerned about this endpoint).
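The shape of that change, with hypothetical field names standing in for what we actually pull from the annotations, looks roughly like:

```proto
// Before: the open-ended annotations map was forwarded wholesale, e.g.
//   map<string, string> annotations = 2;
// After: one-to-one fields for only the values the graphql layer consumes.
// Field names below are hypothetical illustrations.
message WeK8sGenericService {
  string name = 1;
  string owner_team = 2;      // previously read from annotations
  string deploy_channel = 3;  // previously read from annotations
}
```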
Notes: Why did this help? Normally I would assume that the amount of information returned in the annotations block was simply dwarfing everything else. However, when I wrote the results out to a file, the file size didn't seem proportional to the performance gained by excluding the field. I'd have to run more tests to be sure, though.
I also tested just using a raw string instead of a map type, and that didn’t improve performance at all.
Thus, I’m not sure exactly why this was causing performance problems for my use-case. I would also say that legacy grpc was impacted as well, since it was slow without these changes, just not as slow as our grpc-js setup.
For now, we're happy just reducing our message size with these changes and leaving it there, even though it's a bit of an inconvenience to have to update field definitions in multiple places every time we want to expose something from annotations to our graphql server.
If you want me to investigate further, I could pull some hard metrics comparing with and without the annotations block for grpc legacy and grpc-js, but for now, I’ll regard this issue as resolved. Thanks for your help!