Last packets discarded in edge-triggered epoll(again?)
Expected behavior
Hi, after a short break I am again having trouble with the last packets being discarded. It seriously looks like a Netty bug (again?), possibly a regression introduced when I updated Netty across about a year of 4.1 releases, but I am unable to tell for sure.
I am using edge-triggered epoll and running a custom proxy application for a game (still the same one I was targeting with my Netty bugfixes a while ago).
So the setup is client <-> proxy <-> server.
The problem is that when the client is kicked this way:
channel.writeAndFlush(kickPacket).addListener(Listeners.CLOSE);
About 10% of the time the client does not receive the packet and only sees a connection reset. The only workaround I have found is to call writeAndFlush and then call channel.close() 50 ms later…
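The delayed-close workaround can be sketched outside Netty with plain JDK blocking sockets. This is only an illustration of the idea (write, then close a little later), not the proxy's actual code: the class name, method name, and loopback setup are hypothetical, and the 50 ms delay is simply the value mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DelayedCloseSketch {
    // Assumption: the 50 ms delay quoted in the report.
    static final long CLOSE_DELAY_MS = 50;

    /** Writes a kick message, closes the sending socket later, returns what the client read. */
    static String kickAndCloseLater(String kickMessage) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort());
             Socket proxySide = listener.accept()) {

            // Plain-socket stand-in for writeAndFlush(kickPacket).
            proxySide.getOutputStream().write(kickMessage.getBytes(StandardCharsets.UTF_8));
            proxySide.getOutputStream().flush();

            // Close after a delay instead of immediately after the write.
            timer.schedule(() -> {
                try {
                    proxySide.close();
                } catch (IOException ignored) {
                    // Nothing useful to do in this sketch.
                }
            }, CLOSE_DELAY_MS, TimeUnit.MILLISECONDS);

            // The client reads until it sees end-of-stream from the delayed close.
            InputStream in = client.getInputStream();
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                received.write(buf, 0, n);
            }
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(kickAndCloseLater("kick: banned"));
    }
}
```

A fixed delay only hides the race rather than explaining it, which is why it is a workaround and not a fix.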
I seriously have no idea what could be the problem here, but I remember fixing bugs related to this problem a year or two ago.
Another side of the problem is that I recently started seeing it on the “server” side as well: when the server sends the proxy a kick packet with the disconnect reason, the packet is not received before “channelInactive” is called.
The client<->proxy connection uses autoread: true, while the proxy<->server connection uses autoread: false to avoid possible backpressure problems, but both show a similar problem. On the proxy<->server connection I call channel.read() once the last batch of proxied packets has been written, which I detect via a ChannelPromise (I attach the promise to an empty dummy packet at the end of the flushed batch, so I get a callback after everything has been written).
But even with autoread: false, everything should be read from the socket before channelInactive is handled, unless I am the one calling channel.disconnect(), right?
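At the plain TCP level, that expectation holds for a graceful close: bytes queued before the peer's FIN are still readable before end-of-stream is reported. The sketch below shows this with JDK blocking sockets only. It says nothing about what Netty's epoll transport does internally, and the class name, method name, and the 100 ms pause are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DrainBeforeInactiveSketch {
    /** Returns the events a late reader observes after the peer wrote and closed. */
    static List<String> eventsSeenByReader(String lastPacket) throws Exception {
        List<String> events = new ArrayList<>();
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket reader = new Socket(listener.getInetAddress(), listener.getLocalPort());
             Socket sender = listener.accept()) {

            // The peer writes its last packet and closes gracefully right away.
            sender.getOutputStream().write(lastPacket.getBytes(StandardCharsets.UTF_8));
            sender.getOutputStream().flush();
            sender.close();

            // The reader is not reading while the close arrives,
            // loosely comparable to a channel with autoread: false.
            Thread.sleep(100);

            // Drain everything that is still buffered, then observe end-of-stream.
            InputStream in = reader.getInputStream();
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                received.write(buf, 0, n);
            }
            events.add("read:" + new String(received.toByteArray(), StandardCharsets.UTF_8));
            events.add("inactive"); // end-of-stream is only seen after the data
        }
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(eventsSeenByReader("kick"));
    }
}
```

If the connection is reset instead of closed gracefully, this ordering is no longer guaranteed, so it is worth checking which of the two the receiving side actually sees.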
Also, the proxy<->client connections use write-buffer watermarks like this:
.childOption(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 1024 * 1024 * 5) // 5 MiB
.childOption(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK, 512 * 1024 * 2) // 1 MiB
Any help on where I should start looking for problems (again)?
Actual behavior
Steps to reproduce
I am not sure I can produce a good reproducer before investigating more.
Minimal yet complete reproducer code (or URL to code)
Netty version
4.1.9-FINAL-Snapshot as of 4 weeks ago(9ee4cc0ada3d8e46b139671dabfbfd35c8be3308) + write squasher (source below)
JVM version (e.g. java -version)
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)
OS version (e.g. uname -a)
Issue Analytics
- State:
- Created 7 years ago
- Comments:11 (4 by maintainers)
Please open another bug with only the remaining problem explained
cc @Scottmitch