
BDP PINGs are sent much more frequently than necessary

See original GitHub issue

What version of gRPC-Java are you using?

1.37.0

What is your environment?

5.4.0-74-generic #83-Ubuntu SMP Sat May 8 02:35:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

What did you expect to see?

The client does not flood the server with PING frames when autoTuneFlowControl is enabled (the default).

What did you see instead?

Connection closed on the server with:

    io.netty.handler.codec.http2.Http2Exception: Maximum number 10000 of outstanding control frames reached
        at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:108)
        at io.netty.handler.codec.http2.Http2ControlFrameLimitEncoder.handleOutstandingControlFrames(Http2ControlFrameLimitEncoder.java:96)
        at io.netty.handler.codec.http2.Http2ControlFrameLimitEncoder.writePing(Http2ControlFrameLimitEncoder.java:69)
        at io.netty.handler.codec.http2.Http2FrameCodec.write(Http2FrameCodec.java:333)

Steps to reproduce the bug

The client makes request-response calls continuously, such that there is a constant number of outstanding requests.

The server is a third-party gRPC implementation based on Netty.

It only acks received PING frames and does not send its own PING frames (ack=false). The acked frames' content is 1234.

Client and server are on the same host.

Eventually (after several seconds) the connection is closed by the server with:

    io.netty.handler.codec.http2.Http2Exception: Maximum number 10000 of outstanding control frames reached
        at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:108)
        at io.netty.handler.codec.http2.Http2ControlFrameLimitEncoder.handleOutstandingControlFrames(Http2ControlFrameLimitEncoder.java:96)
        at io.netty.handler.codec.http2.Http2ControlFrameLimitEncoder.writePing(Http2ControlFrameLimitEncoder.java:69)
        at io.netty.handler.codec.http2.Http2FrameCodec.write(Http2FrameCodec.java:333)
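For concreteness, here is a rough sketch of the kind of client loop described above. Assumptions: EchoGrpc, EchoRequest, and EchoReply are hypothetical generated stubs standing in for whatever service was actually used, and the concurrency of 64 is arbitrary; the point is only to keep a constant number of calls in flight on a grpc-netty client left at its defaults (autoTuneFlowControl on), against a server on the same host.

    import io.grpc.ManagedChannel;
    import io.grpc.netty.NettyChannelBuilder;
    import io.grpc.stub.StreamObserver;
    import java.util.concurrent.Semaphore;

    // Repro sketch only: stub types are hypothetical placeholders for the real service.
    public final class ConstantLoadClient {
        private static final int OUTSTANDING = 64; // assumed number of concurrent calls

        public static void main(String[] args) throws InterruptedException {
            ManagedChannel channel = NettyChannelBuilder.forAddress("localhost", 50051)
                    .usePlaintext()
                    .build(); // defaults in 1.37.0: autoTuneFlowControl enabled, so BDP PINGs are sent
            EchoGrpc.EchoStub stub = EchoGrpc.newStub(channel); // hypothetical generated stub
            Semaphore inFlight = new Semaphore(OUTSTANDING);
            while (true) {
                inFlight.acquire(); // block until one of the outstanding calls completes
                stub.echo(EchoRequest.getDefaultInstance(), new StreamObserver<EchoReply>() {
                    @Override public void onNext(EchoReply reply) { }
                    @Override public void onError(Throwable t) { inFlight.release(); }
                    @Override public void onCompleted() { inFlight.release(); }
                });
            }
        }
    }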

There is a workaround, NettyChannelBuilder.flowControlWindow(int), which happens to disable autoTuneFlowControl.
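A minimal sketch of that workaround, assuming a grpc-netty client; the 1 MiB window value is an arbitrary assumption, not a recommendation from this issue:

    import io.grpc.ManagedChannel;
    import io.grpc.netty.NettyChannelBuilder;

    // Fixed-window client channel: per the report above, setting flowControlWindow(int)
    // happens to disable autoTuneFlowControl, so the client stops sending BDP PINGs.
    public final class NoBdpPingChannel {
        public static ManagedChannel create(String host, int port) {
            return NettyChannelBuilder.forAddress(host, port)
                    .flowControlWindow(1024 * 1024) // fixed window instead of BDP-based auto-tuning
                    .usePlaintext()                 // assumption: plaintext, as for a same-host setup
                    .build();
        }
    }

The trade-off is that the window is no longer adjusted to the path's bandwidth-delay product, so a fixed value has to be chosen by hand.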

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
mostroverkhov commented, Jun 17, 2021

@voidzcy It is painful to get grpc-java into a compilable state on a local machine, so I composed a grpc-only project where this behavior is trivially reproduced as well. It is a property of the Netty-based grpc-java client, as the server only does what the HTTP/2 spec mandates: it acks received PINGs.

With the above example, NettyServerHandler.onPingRead is called at a frequency comparable to that of inbound requests, but the connection is not torn down because of the grpc-java library's overly aggressive buffer flushing (that's why Http2ControlFrameLimitEncoder does not kick in).

With autoTuneFlowControl enabled, it seems the rate of PINGs grows with a) the number of requests [1], [2] and b) decreasing PING round-trip time [3].

I think the algorithm needs to be adjusted for the high-RPS, low-latency scenario.
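A back-of-envelope illustration of that observation, using assumed RTT values rather than grpc-java's actual algorithm: if a BDP PING is started roughly once per round trip while data keeps arriving, the PING rate approaches 1/RTT, which on loopback can plausibly let acks pile up toward the 10000 outstanding-control-frame limit seen in the stack trace above.

    // Illustration only: the RTT values and the "one PING per round trip" model are assumptions.
    public final class BdpPingRate {
        public static void main(String[] args) {
            // Assumed round-trip times: WAN, LAN, and same-host loopback.
            double[] rttSeconds = {0.050, 0.001, 0.000050};
            for (double rtt : rttSeconds) {
                // Roughly one BDP PING per round trip while data keeps flowing.
                double pingsPerSecond = 1.0 / rtt;
                System.out.printf("RTT %.6f s -> up to %.0f BDP PINGs/s%n", rtt, pingsPerSecond);
            }
        }
    }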

0 reactions
davidkilliansc commented, Aug 2, 2022

I came across this issue when doing some work to improve throughput / reduce cost. In our setup, both incoming and outgoing gRPC requests proxy through a local HTTP/2 sidecar over loopback or a domain socket. As a result, I'd expect the RTT to be very small and the maximum number of BDP PINGs per second to be very high.

I disabled autoTuneFlowControl for one service and saw a 3-4% reduction in the number of machines needed to handle the service's throughput, which is a non-trivial cost reduction for us. When I compared production profiles to those from before the change, the largest reduction in CPU was in reading and writing file descriptors (not surprising). I saw an unexpected increase in CPU spent in Native.eventFDWrite stemming from AbstractStream$TransportState.requestMessagesFromDeframer. I suspect that increase is due to the IO event loop being less likely to already be running at the moment the application thread requests messages (since the IO event loop is now doing less work as a result of reducing the PINGs).

Another observation is that most of the gain from disabling autoTuneFlowControl actually came from reduced CPU usage in the local H2 proxy (the thing responding to all those PING frames), rather than from reduced CPU usage of the grpc-java application process.
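The comment does not say whether autoTuneFlowControl was disabled on the client side, the server side, or both. For completeness, here is a server-side sketch assuming grpc-netty's NettyServerBuilder, where flowControlWindow(int) is expected to have the same fixed-window effect; the port, window size, and service variable are placeholders:

    import io.grpc.BindableService;
    import io.grpc.Server;
    import io.grpc.netty.NettyServerBuilder;
    import java.io.IOException;

    // Sketch only: values are assumptions, and the fixed-window side effect on the
    // server builder is inferred from the client-side behavior described in this issue.
    public final class FixedWindowServer {
        public static Server start(BindableService service) throws IOException {
            return NettyServerBuilder.forPort(50051)
                    .flowControlWindow(1024 * 1024) // fixed window instead of BDP-based auto-tuning
                    .addService(service)
                    .build()
                    .start();
        }
    }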

Read more comments on GitHub >

Top Results From Across the Web

  • Understanding Network Latency - Scaleway's Blog
    In this blog post, we attempt to demystify the topic of network latency. We take a look at the relationship between latency, bandwidth...
  • How to understand serialization delay and BDP?
    Serialization delay is the time that it takes to serialize a packet, meaning how long time it takes to physically put the packet...
  • Impact of Bandwidth Delay Product on TCP Throughput
    The sender is not allowed to send more than the Advertised Window number of bytes unless another ACK (with new Advertised Window) is...
  • CCIE 400-101: Network Principles - Latency, Windowing, BDP ...
    Latency is most often used to describe the delay between the time that data is requested and the time when it arrives (also...
  • gRPC-Go performance Improvements
    The idea is simple and powerful: every time a receiver gets a data frame it sends out a BDP ping (a ping with...
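The last entry, the gRPC-Go performance post, outlines the BDP-ping idea behind autoTuneFlowControl. Below is a purely illustrative sketch of that sampling loop, not grpc-java's or grpc-go's actual code; the initial window, growth threshold, and cap are assumptions.

    // Illustrative BDP sampler: ping when data arrives, count bytes until the ack
    // returns, and grow the flow-control window when the sample approaches it.
    final class BdpSamplerSketch {
        private boolean pingOutstanding;
        private long bytesSincePing;
        private long pingSentNanos;
        private int window = 64 * 1024; // assumed initial window

        /** Called for every inbound DATA frame. Returns true if a BDP PING should be sent. */
        boolean onData(int bytes) {
            if (!pingOutstanding) {
                pingOutstanding = true;
                bytesSincePing = bytes;
                pingSentNanos = System.nanoTime();
                return true; // caller writes a PING frame here
            }
            bytesSincePing += bytes;
            return false;
        }

        /** Called when the PING ack arrives; grows the window if the BDP sample warrants it. */
        void onPingAck() {
            long rttNanos = System.nanoTime() - pingSentNanos; // one RTT sample
            // bytesSincePing approximates the bandwidth-delay product over that RTT.
            if (bytesSincePing >= window * 2 / 3) {
                window = (int) Math.min(2 * bytesSincePing, 8 * 1024 * 1024); // cap is an assumption
                // caller would issue WINDOW_UPDATE / SETTINGS changes here
            }
            pingOutstanding = false;
        }
    }

Under constant load and a tiny RTT, a loop like this re-arms immediately after every ack, which matches the high PING rates reported in this issue.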
