Connection reset by peer IOException did not fire
Background
I have a service that uses Netty as a client to continuously collect information from thousands of hosts. I expect Netty to throw an IOException ("connection reset by peer") whenever the connection has been dropped by the destination prior to a write. While this expected behavior does occur in local test cases, it does not in our large-scale environment. In my situation, after the connection is dropped, Netty appears to attempt the write and report success, but the channel is closed immediately afterwards, which leaves my application hanging on the assumption that the request is still in progress. The only way I could identify the gap was to run tcpdump and sample those requests. I have simplified one of them here:
| Relative Time | Source | Destination | Protocol | Info |
|---|---|---|---|---|
| 0.011689 | 10.0.0.1 | 10.0.0.2 | TCP | [SYN] Seq=0 Len=0 |
| 0.012461 | 10.0.0.2 | 10.0.0.1 | TCP | [SYN, ACK] Seq=0 Ack=1 Len=0 |
| 0.012480 | 10.0.0.1 | 10.0.0.2 | TCP | [ACK] Seq=1 Ack=1 Len=0 |
| 1.659652 | 10.0.0.2 | 10.0.0.1 | TCP | [FIN, ACK] Seq=1 Ack=1 Len=0 |
| 1.661622 | 10.0.0.1 | 10.0.0.2 | TCP | [ACK] Seq=1 Ack=2 Len=0 |
| 12.484883 | 10.0.0.1 | 10.0.0.2 | TCP | [PSH, ACK] Seq=1 Ack=2 Len=14 |
| 12.484921 | 10.0.0.1 | 10.0.0.2 | TCP | [FIN, ACK] Seq=15 Ack=2 Len=0 |
| 12.485665 | 10.0.0.2 | 10.0.0.1 | TCP | [RST] Seq=2 Len=0 |
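The capture shows the core of the problem: the peer half-closes with a FIN at ~1.66s, yet the client's write at ~12.48s still "succeeds" locally before the peer answers with an RST. This is plain TCP behavior rather than anything Netty-specific; a minimal sketch with blocking `java.net` sockets (the class name `ResetDemo` and the sleep durations are my own choices for illustration) shows a write after a peer close completing locally, with the failure only surfacing on a later operation:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ResetDemo {
    public static void main(String[] args) throws Exception {
        // Server: accept one connection, then close it immediately (sends FIN).
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Thread t = new Thread(() -> {
            try {
                server.accept().close();
            } catch (IOException ignored) {
            }
        });
        t.start();

        Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
        Thread.sleep(200); // let the server's FIN arrive

        OutputStream out = client.getOutputStream();
        try {
            // The kernel accepts this into the send buffer even though the
            // peer has already sent FIN, so no exception is raised here.
            out.write("hello".getBytes());
            out.flush();
            System.out.println("first write succeeded locally");
        } catch (IOException e) {
            System.out.println("first write failed: " + e.getMessage());
        }

        Thread.sleep(200); // give the peer time to answer the first write with RST
        try {
            out.write("world".getBytes());
            out.flush();
            System.out.println("second write succeeded too");
        } catch (IOException e) {
            // Typically a SocketException ("Broken pipe" / "Connection reset"),
            // i.e. the error only shows up on a *later* operation.
            System.out.println("second write failed: " + e.getClass().getSimpleName());
        }
        client.close();
        server.close();
        t.join();
    }
}
```

The exact exception on the second write is OS- and timing-dependent, but the first write reporting local success mirrors what the tcpdump above shows for the Netty client.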
After this observation, I added a listener to validate my assumption that the channel gets closed before any exception is thrown:
```java
channel.closeFuture().addListener(new ChannelFutureListener() {
    @Override
    public void operationComplete(ChannelFuture future) throws Exception {
        // Do logic to avoid the application hanging for this request.
    }
});
```
At this point I was able to confirm that the channel is closed after the write even though Netty reports the write as successful. I then used this logic to develop a small test application that reproduces the issue described above.
Sample Test Case
I implemented a test scenario that reproduces the situation described above; unfortunately I cannot share our actual code due to confidentiality. I am very new to Netty (~2 weeks), so apologies in advance if I am doing something completely wrong here.
```java
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.handler.codec.string.StringDecoder;
import io.netty.handler.codec.string.StringEncoder;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;

public class Test {

    public static void main(String[] args) throws IOException, ExecutionException, InterruptedException {
        Server server = new Server();
        new Thread(server).start();
        Client client = new Client();
        CompletableFuture<String> future = client.process();
        server.countDownLatch.await();
        client.countDownLatch.countDown();
        future.get();
        System.out.println("Done!");
    }

    public static class Client {
        public CountDownLatch countDownLatch = new CountDownLatch(1);

        public CompletableFuture<String> process() {
            NioEventLoopGroup eventLoopGroup = new NioEventLoopGroup();
            CompletableFuture<String> future = new CompletableFuture<>();
            try {
                ChannelFuture connection = new Bootstrap()
                        .group(eventLoopGroup)
                        .channel(NioSocketChannel.class)
                        .handler(new ChannelInitializer<Channel>() {
                            @Override
                            protected void initChannel(Channel channel) throws Exception {
                                channel.pipeline().addLast(new StringDecoder());
                                channel.pipeline().addLast(new StringEncoder());
                                channel.pipeline().addLast(new SimpleChannelInboundHandler<String>() {
                                    @Override
                                    protected void channelRead0(ChannelHandlerContext ctx, String msg) throws Exception {
                                        /* process response */
                                        future.complete("Done");
                                        ctx.channel().close();
                                    }

                                    @Override
                                    public void exceptionCaught(ChannelHandlerContext ctx, Throwable ex) {
                                        future.completeExceptionally(ex);
                                        ctx.channel().close();
                                    }
                                });
                            }
                        })
                        .connect(new InetSocketAddress("localhost", 5555));
                connection.addListener((ChannelFuture channelFuture) -> {
                    if (channelFuture.isSuccess()) {
                        String message = "Message";
                        try {
                            countDownLatch.await(); /* wait to ensure the connection is dropped by the server first */
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                        channelFuture.channel()
                                .writeAndFlush(message)
                                .addListener((ChannelFuture writeFuture) -> {
                                    if (!writeFuture.isSuccess() || writeFuture.cause() != null) {
                                        future.completeExceptionally(new Exception("Failed to write"));
                                    }
                                })
                                // Close the connection, as Netty closed it after the write.
                                .addListener(ChannelFutureListener.CLOSE);
                    } else {
                        future.completeExceptionally(new Exception("Failed to connect"));
                    }
                });
            } catch (Exception e) {
                future.completeExceptionally(e);
            }
            return future;
        }
    }

    public static class Server implements Runnable {
        private ServerSocket serverSocket;
        public CountDownLatch countDownLatch = new CountDownLatch(1);

        public Server() throws IOException {
            this.serverSocket = new ServerSocket(5555);
        }

        @Override
        public void run() {
            try {
                Socket socket = serverSocket.accept();
                Thread.sleep(500);
                socket.close();
                countDownLatch.countDown();
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
```
Netty version

Netty 4.1.48.Final

JVM version (e.g. `java -version`)

openjdk version "11.0.11" 2021-04-20 LTS
OpenJDK Runtime Environment (build 11.0.11+9-LTS)
OpenJDK 64-Bit Server VM (build 11.0.11+9-LTS, mixed mode)
Issue Analytics
- State:
- Created 2 years ago
- Comments: 9 (6 by maintainers)
Top GitHub Comments
Great advice @NiteshKant, I will definitely give this a try and update this thread if I learn anything new. Thank you for chiming in.
Maybe we can have a separate discussion on changing the write so that it does not only look at the local result on the FD but also waits for the server's ACK. For me as a user, when I use TCP, I expect the library to follow the protocol completely. I'm sure there would be tradeoffs here; I am still trying to learn more about Netty.
If the connection is fully closed, then eventually the read will throw the exception you mention. To be clear, the timing of the write does not matter for the exception on read.
If you just want to listen for channel closure, then you can implement the `channelInactive()` method in your handler, or from the channel you can listen to the `closeFuture()`.
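Following that suggestion, a minimal sketch of a handler overriding `channelInactive()` might look like the following (the class name `CloseAwareHandler` is mine, and I use Netty's `EmbeddedChannel` only as a convenient way to drive the callback):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.embedded.EmbeddedChannel;

public class CloseAwareHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        // The channel is no longer active: fail any request still "in flight"
        // here instead of waiting for an IOException that may never arrive.
        System.out.println("channel closed");
        ctx.fireChannelInactive();
    }

    public static void main(String[] args) {
        // EmbeddedChannel is registered and active on construction;
        // closing it fires channelInactive through the pipeline.
        EmbeddedChannel ch = new EmbeddedChannel(new CloseAwareHandler());
        ch.close();
    }
}
```

Failing the pending `CompletableFuture` from `channelInactive()` would avoid the hang described in the original report, regardless of whether an IOException is ever surfaced.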